[Reading List] Articles on Tools for Data Engineering
Learning Resources, 2020. 11. 23. 18:47
Tools
Albis
Apache Calcite
Apache Hadoop Distributed File System
Apache Hadoop Mapreduce
- MapReduce: Simplified Data Processing on Large Clusters
- Parallel MapReduce: Maximizing Cloud Resource Utilization and Performance Improvement Using Parallel Execution Strategies
- Parallel Data Processing with MapReduce: A Survey
- MRTuner: A Toolkit to Enable Holistic Optimization for MapReduce Jobs
- What are Distributed Execution Engines? Characteristics and types of Distributed Execution Engines
Apache Hadoop Yarn
- Apache Hadoop YARN: Yet Another Resource Negotiator
- Mitigating YARN Container Overhead with Input Splits
Apache Hive
- Hive - A Warehousing Solution Over a Map-Reduce Framework
- Improving the performance of Hadoop Hive by sharing scan and computation tasks
Apache Spark
- Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing
- Matrix Computations and Optimization in Apache Spark
- Pipelined execution of stages in Apache Spark
- Architectural Impact on Performance of In-memory Data Analytics: Apache Spark Case Study
- A Benchmarking Study to Evaluate Apache Spark on Large-Scale Supercomputers
- Performance Comparison of Spark Clusters Configured Conventionally and a Cloud Service
- Fast Data Processing with Spark
Apache Tez
- Apache Tez: A Unifying Framework for Modeling and Building Data Processing Applications
- Analyzing performance of Apache Tez and MapReduce with hadoop multinode cluster on Amazon cloud
Borg
- Large-scale cluster management at Google with Borg
- Borg: the Next Generation
- Borg, Omega, and Kubernetes
CIEL
Crail
Delta Lake
Diff-Index
Dremel
Druid
Espresso
F1
FusionInsight LibrA
Kafka
- Kafka: a Distributed Messaging System for Log Processing
- Building LinkedIn’s Real-time Activity Data Pipeline
Kudu
MemoDyn
- MemoDyn: Exploiting Weakly Consistent Data Structures for Dynamic Parallel Memoization
Mesa
MillWheel
Monarch
Naiad
NAPA
PacificA
Poster Paper
Pregel
Presto
Ray
- Ray: A Distributed Framework for Emerging AI Applications
- Ray: A Distributed Execution Engine for the Machine Learning Ecosystem
Scuba
Spanner
StreamBox
SWIFT
Ubiq
Wrangler
Zanzibar
General Topics
Data center
Data Model