Scaling a Mature Data Pipeline — Managing Overhead
- Over time, teams end up encoding application structure in the data pipeline. Application logic gets coupled with orchestration logic.
- Orchestration complexity causes overhead. This complexity scales with the depth of the data pipeline.
- Decoupling orchestration logic from application logic gives you tools to fight the overhead without compromising the quality of the application.
- When trying to reduce the run time of a data pipeline, analyze the whole pipeline’s execution time, not just the obvious factors like map-reduce computation time.
- Focus on fault tolerance.
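
To make the points above concrete, here is a minimal sketch of what decoupling might look like. It is not the approach from the original post. The stage functions (`extract`, `transform`, `load`) and the `run_pipeline` orchestrator are hypothetical names invented for illustration. The application logic is plain functions. Retries and timing live only in the orchestrator, so the whole run can be measured, not just the compute steps.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

# Application logic: plain functions with no knowledge of scheduling,
# retries, or timing. All three are hypothetical stand-ins.
def extract(_: Any) -> List[int]:
    return [1, 2, 3, 4]

def transform(rows: List[int]) -> List[int]:
    return [r * 2 for r in rows]

def load(rows: List[int]) -> int:
    return sum(rows)

Stage = Tuple[str, Callable[[Any], Any]]

def run_pipeline(stages: List[Stage], max_retries: int = 2) -> Tuple[Any, Dict[str, float]]:
    """Hypothetical orchestrator: runs stages in order, retries a failing
    stage up to max_retries times, and records wall-clock time per stage."""
    timings: Dict[str, float] = {}
    value: Any = None
    for name, fn in stages:
        start = time.perf_counter()
        for attempt in range(max_retries + 1):
            try:
                value = fn(value)
                break
            except Exception:
                # Re-raise only once the retry budget is exhausted.
                if attempt == max_retries:
                    raise
        # Includes time spent on failed attempts for this stage.
        timings[name] = time.perf_counter() - start
    return value, timings

total_start = time.perf_counter()
result, timings = run_pipeline(
    [("extract", extract), ("transform", transform), ("load", load)]
)
total = time.perf_counter() - total_start

# Whatever is not accounted for by the stages is orchestration overhead.
overhead = total - sum(timings.values())
print(result, timings, overhead)
```

In a real pipeline the overhead term would also cover things like job startup and scheduling delays between stages, which is why timing only the compute steps can be misleading.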
Full post here, 11-minute read