Optimizing Matrix Factorization in Apache Spark (PySpark)

Kaden Sungbin Cho 2021. 5. 15. 15:55

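In this post, I draw on matrix factorization (MF) optimization work I carried out as a data engineer to summarize the perspective I took when approaching the optimization, along with the related reference material.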

References

[1] Advanced Apache Spark Training - Sameer Farooqui (Databricks)

[2] Tuning Apache Spark for Large-Scale Workloads

[3] SOS: Optimizing Shuffle I/O

[4] Deep Dive: Apache Spark Memory Management

[5] Matrix Computations and Optimization in Apache Spark

[6] Getting The Best Performance With PySpark

[7] Apache Spark @Scale: A 60 TB+ production use case

[8] Implementing Large-Scale Matrix Factorization on Apache Spark

[9] Optimizing Apache Spark SQL Joins

[10] Optimal Strategies for Large-Scale Batch ETL Jobs

[11] Tuning Spark

[12] Tuning Spark application tasks

[13] Troubleshooting and Tuning Spark for Heavy Workloads

[14] Why Your Spark Apps Are Slow Or Failing, Part II: Data Skew and Garbage Collection

[15] Tuning G1 GC for spark jobs

[16] How do I get a cartesian product of a huge dataset?

[17] Scaling Apache Spark at Facebook (https://www.slideshare.net/databricks/scaling-apache-spark-at-facebook)
