OBJECTIVES
- A national aluminum and copper manufacturing company wanted to use its coal more efficiently. The standard laboratory test took several days to estimate gross calorific value (GCV), during which inventory quality degraded and production was delayed.
- We created a unified logistics and transportation data warehouse and built a GCV prediction engine on top of it. The engine was deployed to the client’s servers, enabling immediate decisions and more efficient coal consumption.
SOLUTION
- The primary difficulty lay in pinpointing the key parameters affecting the GCV of coal. Real-world complexities of logistics and operations created multifaceted relationships between coal attributes.
- Significant manual data entry meant inconsistencies abounded, with missing entries and mismatched types. Rigorous cleaning and validation became essential to maintain the integrity of our analysis.
- The client’s tech stack assessment was still ongoing. We therefore had to design a flexible solution that worked within the existing technical constraints without hindering the roadmap for integration and rollout.
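The cleaning and validation described above can be illustrated with a minimal sketch. It is not the project's actual pipeline: the column names (`moisture_pct`, `ash_pct`), the allowed ranges, and the helper names are hypothetical, chosen only to show how a checkpoint might coerce manually entered values and reject rows with missing or implausible entries.

```python
from dataclasses import dataclass, field

# Hypothetical columns and plausible ranges; the real schema is not given here.
RULES = {
    "moisture_pct": (0.0, 100.0),
    "ash_pct": (0.0, 100.0),
}


@dataclass
class ValidationReport:
    clean: list = field(default_factory=list)     # rows that passed every check
    rejected: list = field(default_factory=list)  # (row_index, reason) pairs


def to_float(value):
    """Coerce a manually entered value to float; None if blank or unparseable."""
    if value is None:
        return None
    try:
        return float(str(value).strip().replace(",", ""))
    except ValueError:
        return None


def validate_rows(rows):
    """Checkpoint: keep rows whose required fields are numeric and in range."""
    report = ValidationReport()
    for index, row in enumerate(rows):
        cleaned = {}
        reason = None
        for column, (low, high) in RULES.items():
            value = to_float(row.get(column))
            if value is None:
                reason = f"{column}: missing or non-numeric"
                break
            if not low <= value <= high:
                reason = f"{column}: {value} outside [{low}, {high}]"
                break
            cleaned[column] = value
        if reason is None:
            report.clean.append(cleaned)
        else:
            report.rejected.append((index, reason))
    return report


# Illustrative rows only, not project data.
sample_rows = [
    {"moisture_pct": " 12.5 ", "ash_pct": "30"},  # valid after type coercion
    {"moisture_pct": "", "ash_pct": "28"},        # missing entry
    {"moisture_pct": "11", "ash_pct": "130"},     # out of range
    {"moisture_pct": "n/a", "ash_pct": 25},       # mismatched type
]
report = validate_rows(sample_rows)
```

Recording a reason for every rejected row, rather than silently dropping it, keeps the manual-entry problems visible so they can be corrected at the source.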
BENEFITS
- We performed intensive exploratory data analysis (EDA), combining domain theory with analytics, to identify the major parameters affecting GCV.
- We designed a robust data pipeline with validation checkpoints that catch missing values and mismatched entries, which was necessary given the high percentage of manual entries.
- After several iterations, we deployed an XGBoost-based GCV prediction model that reached 95% accuracy, with the Boruta algorithm used for feature selection.
- We designed a modular architecture for data flow and calculation to ensure swift redeployment if the tech stack changes.
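The modular design in the last point can be sketched as below. This is an assumption about how such an architecture might look, not the deployed code: `Predictor`, `LinearStub`, `GcvPipeline`, the feature names, and the coefficients are all hypothetical, and the stub model only stands in for the trained XGBoost model so the example runs without third-party libraries.

```python
from typing import Callable, Dict, List, Protocol

Record = Dict[str, float]


class Predictor(Protocol):
    """Any model exposing predict() can be plugged into the pipeline."""

    def predict(self, features: Record) -> float: ...


class LinearStub:
    """Placeholder with made-up coefficients; stands in for the real model."""

    def __init__(self, intercept: float, weights: Record) -> None:
        self.intercept = intercept
        self.weights = weights

    def predict(self, features: Record) -> float:
        return self.intercept + sum(
            weight * features[name] for name, weight in self.weights.items()
        )


class GcvPipeline:
    """Wires loader -> transform -> model; each stage is swappable on its own."""

    def __init__(
        self,
        load: Callable[[], List[Record]],
        transform: Callable[[Record], Record],
        model: Predictor,
    ) -> None:
        self.load = load
        self.transform = transform
        self.model = model

    def run(self) -> List[float]:
        return [self.model.predict(self.transform(row)) for row in self.load()]


def load_from_memory() -> List[Record]:
    # Hypothetical source; a warehouse query or file reader could replace it.
    return [{"moisture_pct": 10.0, "ash_pct": 30.0, "unused": 1.0}]


def select_features(row: Record) -> Record:
    # Keep only the features the model expects.
    return {name: row[name] for name in ("moisture_pct", "ash_pct")}


# Illustrative coefficients only, not fitted values.
model = LinearStub(7000.0, {"moisture_pct": -50.0, "ash_pct": -60.0})
pipeline = GcvPipeline(load_from_memory, select_features, model)
predictions = pipeline.run()
```

Because the pipeline depends only on the loader's return type and the `predict` method, a change in the data source or serving environment would mean replacing one stage rather than rewriting the flow.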