High Performance Spark: Best Practices for Scal...

This book bridges the gap between "making it work" and "making it scale". Authors Holden Karau and Rachel Warren, later joined by Adi Polak for the updated edition, provide a deep dive into Spark's internals to help you write code that is not only faster but also more resource-efficient.
The focus is on writing high-performance code using the Spark SQL and Core APIs. The book avoids the "black box" approach by explaining exactly how data is distributed and joined under the hood.
It is aimed at intermediate to advanced Spark users. It is not a beginner’s guide; readers should already be familiar with Spark's basic architecture or have read foundational texts like Learning Spark.

Key Strengths
Unlike many high-level guides, this book explores Spark’s memory management and execution plans, helping you understand why certain configurations fail.
It provides concrete techniques for handling common headaches like key skew, choosing the right join strategy, and optimizing RDD transformations.
While the primary examples are in Scala, the concepts are highly applicable to PySpark users, especially with the second edition's expanded focus on Python-JVM data transfer.

Cons to Consider
If you don't understand the basics of distributed computing, you may find the technical depth overwhelming.

If you’re tired of seeing "Out of Memory" errors or watching your cloud costs skyrocket, this is the definitive manual for "making Spark sing". It is an essential desk reference for anyone serious about production-grade big data pipelines.
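
To make the key-skew problem mentioned in this review more concrete, here is a minimal plain-Python sketch of key salting, a common way to spread one "hot" key across several partitions. It is not Spark code and is not taken from the book. The helper names (`partition_of`, `partition_sizes`, `salt_keys`), the CRC32-based partitioner, and the record counts are all assumptions made for illustration.

```python
from collections import Counter
import zlib


def partition_of(key: str, num_partitions: int) -> int:
    """Deterministic hash partitioner (a hypothetical stand-in, not Spark's own)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_sizes(keys, num_partitions):
    """Count how many records land in each partition."""
    counts = Counter(partition_of(k, num_partitions) for k in keys)
    return [counts.get(p, 0) for p in range(num_partitions)]


def salt_keys(keys, num_salts):
    """Append a round-robin salt so each key is split into up to num_salts keys."""
    return [f"{k}#{i % num_salts}" for i, k in enumerate(keys)]


# Skewed input (made-up numbers): one hot key dominates the data set.
keys = ["hot"] * 9_000 + [f"user{i}" for i in range(1_000)]

unsalted = partition_sizes(keys, num_partitions=8)
salted = partition_sizes(salt_keys(keys, num_salts=16), num_partitions=8)

print("largest partition without salting:", max(unsalted))
print("largest partition with salting:   ", max(salted))
```

Without salting, every record for the hot key hashes to the same partition, so a single task does most of the work. With salting, the hot key is split across several partitions. The trade-off is an extra step: in a real job, results are typically aggregated per salted key first, then the salt is stripped and the partial results are combined.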
@article{wang2021mlfw,
title={MLFW: A Database for Face Recognition on Masked Faces},
author={Wang, Chengrui and Fang, Han and Zhong, Yaoyao and Deng, Weihong},
journal={arXiv preprint arXiv:2109.05804},
year={2021}
}
This database is publicly available. We provide: 1) the original images (250x250), 2) the aligned images (112x112), and 3) the pair list. Baidu Netdisk (code: 328y), Google Drive
We now also provide a list indicating which faces are masked. Google Drive