In the ramp-up phase, Faststream Technologies met with the project’s business sponsor and the software development organization to more explicitly define the project’s business and technology objectives. We helped our client translate business objectives into a product specification.
(1) DATA ENGINEERING AT SCALE
We audited our client’s data to better understand their data sources, quality, and resolution. The bulk of the ETL effort involved merging multiple data sources in varying formats from the client’s data lake. We devised a data engineering strategy that sourced terabytes of data—including real-time streaming sensor data, hardware-specific demographic information, human-generated maintenance reports, and external weather data.
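The core of such a merge is joining static attributes on an entity key and time-series sources on the nearest preceding timestamp. A minimal sketch with pandas, where all column names, unit IDs, and values are hypothetical stand-ins for the client's sources:

```python
import pandas as pd

# Streaming sensor readings, arriving at irregular timestamps (illustrative data).
sensors = pd.DataFrame({
    "timestamp": pd.to_datetime(["2021-01-01 00:03", "2021-01-01 01:07", "2021-01-01 02:12"]),
    "unit_id": ["pump-7"] * 3,
    "pressure_psi": [101.2, 99.8, 103.5],
})

# Hardware demographics: one static record per unit.
demographics = pd.DataFrame({
    "unit_id": ["pump-7"],
    "model": ["X200"],
    "install_year": [2015],
})

# Hourly external weather observations.
weather = pd.DataFrame({
    "timestamp": pd.to_datetime(["2021-01-01 00:00", "2021-01-01 01:00", "2021-01-01 02:00"]),
    "temp_c": [4.1, 3.8, 3.5],
})

# Static attributes join on the unit key...
merged = sensors.merge(demographics, on="unit_id", how="left")
# ...while time-series sources join on the latest observation at or before
# each sensor timestamp.
merged = pd.merge_asof(merged.sort_values("timestamp"),
                       weather.sort_values("timestamp"),
                       on="timestamp", direction="backward")
```

`merge_asof` with `direction="backward"` is what keeps the join causal: each sensor row only sees weather that was already observed, which matters again later when constructing training data.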
(2) DATA PREPARATION
After assessing the data and defining prediction targets, we performed feature engineering on the data stream to create appropriate inputs for the time-series forecasting problem. We varied the look-back and prediction-horizon windows, and carefully constructed training and validation data sets so as to avoid data leakage.
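The windowing step above can be sketched as follows; the series, window sizes, and split ratio are illustrative, not the project's actual parameters:

```python
import numpy as np

def make_windows(series, look_back, horizon):
    """Turn a 1-D series into (X, y) pairs: each X row is `look_back`
    consecutive readings, and y is the value `horizon` steps later."""
    X, y = [], []
    for i in range(len(series) - look_back - horizon + 1):
        X.append(series[i : i + look_back])
        y.append(series[i + look_back + horizon - 1])
    return np.array(X), np.array(y)

readings = np.arange(100, dtype=float)  # stand-in for one sensor channel
X, y = make_windows(readings, look_back=5, horizon=3)

# Split chronologically, never randomly: every validation target must occur
# strictly after the last training window, or future values leak into training.
split = int(len(X) * 0.8)
X_train, y_train = X[:split], y[:split]
X_val, y_val = X[split:], y[split:]
```

A random shuffle here would scatter near-duplicate overlapping windows across both sets, inflating validation accuracy; the chronological cut is the leakage guard the text refers to.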
(3) MODEL ENGINEERING
We followed our standard process to evaluate multiple model architectures, from logistic regression to tree-based ensemble techniques to neural networks. We used convolutional neural networks without feature engineering as an accuracy baseline, but settled on two classes of tree-based models (random forests and gradient-boosted trees) because they demonstrated better performance, were easier to tune, and were more interpretable. We worked with the client’s engineers to define expert features (e.g., pressure and temperature ranges), to optimize model accuracy, and to interpret the model’s output (e.g., feature importance).
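A minimal sketch of comparing the two tree-based model classes on synthetic data, using scikit-learn; the feature names, label rule, and thresholds are invented for illustration, not the project's actual expert features:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n = 2000
# Synthetic stand-ins for expert features such as pressure and temperature.
pressure = rng.normal(100, 15, n)
temperature = rng.normal(60, 10, n)
noise = rng.normal(0, 1, n)
X = np.column_stack([pressure, temperature, noise])
# Hypothetical label: a fault when pressure leaves its normal operating range.
y = ((pressure < 80) | (pressure > 120)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

for model in (RandomForestClassifier(random_state=0),
              GradientBoostingClassifier(random_state=0)):
    model.fit(X_tr, y_tr)
    acc = accuracy_score(y_te, model.predict(X_te))
    # feature_importances_ exposes which inputs drive the prediction —
    # the interpretability property cited above.
    print(type(model).__name__, round(acc, 3), model.feature_importances_.round(2))
```

On data like this, both ensembles recover the pressure-range rule and rank `pressure` as the dominant feature, which is the kind of sanity check that makes the model's output interpretable to domain engineers.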
(4) BUSINESS INTEGRATION
After optimizing the hyperparameters for a family of models, we created a prediction job that updated a database with daily predictions.
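A prediction job of this shape can be sketched as below; the table schema, the `predict` stub, and SQLite itself are hypothetical stand-ins for the production model and database:

```python
import sqlite3
from datetime import date

def predict(unit_id):
    """Stand-in for the trained model's failure-probability score."""
    return 0.12 if unit_id == "pump-7" else 0.03

def run_daily_job(conn, unit_ids, run_date):
    """Score every unit and upsert the day's predictions."""
    conn.execute("""CREATE TABLE IF NOT EXISTS predictions (
                        run_date TEXT, unit_id TEXT, failure_prob REAL,
                        PRIMARY KEY (run_date, unit_id))""")
    rows = [(run_date.isoformat(), u, predict(u)) for u in unit_ids]
    # INSERT OR REPLACE keeps the job idempotent if it reruns for the same day.
    conn.executemany("INSERT OR REPLACE INTO predictions VALUES (?, ?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
run_daily_job(conn, ["pump-7", "pump-8"], date(2021, 1, 1))
```

The upsert keyed on (date, unit) is a deliberate choice: a scheduler retry overwrites that day's rows rather than duplicating them, so downstream consumers always see exactly one prediction per unit per day.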