Telenav has built a sophisticated machine learning framework using Cascading. Complex analytics algorithms were broken down into a collection of basic data flows and integrations. Developers build upon these reusable process flows to quickly develop new applications.
Cascading is the core component of Trulia’s data processing pipeline for data cleansing, entity resolution, and metadata extraction. This helps Trulia provide higher quality content with the newest listings and the best deals in their customers’ markets.
Cascading provides data scientists at The Climate Corporation a solid foundation to develop advanced machine learning applications in Cascalog that get deployed directly onto Amazon EMR clusters consisting of 2000+ cores. This results in significantly improved productivity with lower operating costs.
Etsy runs over 50 Cascading applications daily to study customer behavior and product sales. Programming in JRuby, Etsy can quickly test and create new applications on its e-commerce site that helps it acquire new customers and sell more products.
Airbnb uses Cascading because it provides developers more control when conducting advanced data analysis workflows, data normalization and cleansing. With Cascading applications are easier to test and developers are more confident that applications will work.
Twitter has invested heavily in making Cascading a key component of their data analytics infrastructure. Cascading enables Twitter engineers to create complex data processing workflows in their favorite programming languages easily as well while providing the scalability to seamlessly handle terabytes and petabytes of data.