Model Monitoring
Table of Contents
- Model Monitoring
- Data Drift with Evidently Explained
- Data Drift for the House Price Prediction Use Case
- Resources
Introduction
This page walks through the key concepts in model monitoring and shows how to use EvidentlyAI to implement a model monitoring component as part of an MLOps pipeline.
The team chose EvidentlyAI as its model monitoring tool for reasons similar to those behind choosing Airflow as the training pipeline tool: wide community use and ease of setup.
The repository, containing steps for the implementation of the model monitoring component, is linked here.
1. Model Monitoring
Building a machine learning model is just the beginning. Once you deploy that model into the real world, it faces many challenges that can affect its predictive performance (Model Drift) and require continuous monitoring.
Model Drift refers to the decay of the ML model quality over time. Simply put, it is a way of saying “the model quality got worse” or “the model no longer serves its purpose.” Model Drift doesn’t pinpoint a specific cause; it’s just an observation that the model no longer works as well as it used to.
Model decay can have various causes, including data drift, data quality issues, and concept drift. See: What is concept drift in ML, and how to detect and address it.
The two types of drift most widely discussed in industry are:
- Data Drift
- Concept Drift
Note:
- To establish a monitoring strategy, refer to the "Establishing the monitoring strategy" section of Model monitoring for ML in production: a comprehensive guide.
- For model monitoring architectures, refer to the "Model monitoring architectures" section of the same guide.
1.1 Data Drift
Data drift is a change in the statistical properties and characteristics of the input data. It occurs when the data a Machine Learning model encounters in production deviates from the data the model was initially trained on, or from earlier production data.
Further information on what data drift is in ML, and how to detect and handle it, is linked.
Data Drift Detection methods:
- Summary statistics
- Statistical tests
- Distance metrics
- Rule-based checks
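As a rough illustration of three of these method families, the sketch below compares a summary statistic (the mean), a distance metric, and a rule-based range check on two samples of one numeric feature. The Population Stability Index (PSI) is used here only as one common example of a distance metric; it is not named on this page, and the function names, bin count and sample data are all assumptions made for the example.

```python
import math


def psi(reference, current, n_bins=10):
    """Population Stability Index between two numeric samples."""
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / n_bins or 1.0  # fall back to 1.0 if reference is constant

    def bin_shares(values):
        counts = [0] * n_bins
        for v in values:
            # Clamp so values outside the reference range land in the edge bins.
            idx = min(max(int((v - lo) / width), 0), n_bins - 1)
            counts[idx] += 1
        # A small floor avoids log(0) and division by zero for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = bin_shares(reference), bin_shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def out_of_range_share(values, lower, upper):
    """Rule-based check: share of values outside an agreed valid range."""
    return sum(1 for v in values if not lower <= v <= upper) / len(values)


reference = list(range(100))            # stand-in for a training-time feature
current = [v + 30 for v in reference]   # same feature, shifted upwards

print("mean shift:", sum(current) / len(current) - sum(reference) / len(reference))
print("PSI:", round(psi(reference, current), 3))
print("share outside [0, 99]:", out_of_range_share(current, 0, 99))
```

A PSI of 0 means the binned distributions are identical. Thresholds such as 0.1 or 0.25 are often quoted as rules of thumb, but the acceptable level should be agreed per use case.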
1.2 Concept Drift
Concept Drift is a change in the relationship between the input features and the target that the model learned during training. Model Drift is often caused by Concept Drift.
Information on detecting and addressing it is linked.
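A toy sketch of the idea, assuming that concept drift means the relationship between inputs and target has changed while the inputs themselves look the same. The model, the feature (house size) and all numbers are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def predict_price(size):
    """Hypothetical model learned when price was 2 x size."""
    return 2.0 * size


def mean_abs_error(model, xs, ys):
    """Average absolute difference between predictions and actual values."""
    return sum(abs(model(x) - y) for x, y in zip(xs, ys)) / len(xs)


sizes = [50, 80, 100, 120, 150]          # identical inputs in both periods
prices_then = [2.0 * s for s in sizes]   # relationship at training time
prices_now = [3.0 * s for s in sizes]    # relationship after the market moved

error_then = mean_abs_error(predict_price, sizes, prices_then)
error_now = mean_abs_error(predict_price, sizes, prices_now)

print("error at training time:", error_then)
print("error after concept drift:", error_now)
```

Because the input distribution is unchanged, a check on the input features alone would not flag anything here. The degradation only shows up once actual target values are available to compare against predictions.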
1.3 Target Drift
Target Drift occurs when the target of the prediction changes, for example when the distribution of the target variable shifts over time.
2. Data Drift with Evidently Explained
Two datasets are used to evaluate whether Data Drift has occurred:
- Reference data is the dataset that serves as the baseline for comparison. This is typically the dataset that the Machine Learning model was trained on, or a dataset from a previous period that is considered stable and representative of normal conditions.
- Current data is the new or recent dataset that is being evaluated for drift. It has been collected more recently, possibly after the model was deployed or during ongoing operations.
2.1 Steps
1. Load the Reference data as a DataFrame. If the dataset is very large, a subset or a sample drawn from it can be used, depending on the use case.
2. Load the Current data as a DataFrame, again using a subset or a sample if the dataset is very large.
3. Decide whether to evaluate Data Drift for individual features (numerical as well as categorical) or for all the features in the datasets loaded in Step 1 and Step 2.
4. Choose the statistical test used to analyse data drift, such as 'chisquare', 'jensenshannon' or 'wasserstein'. The full list of statistical tests is provided at https://docs.evidentlyai.com/user-guide/customization/options-for-statistical-tests. Statistical tests can also be chosen for individual metrics.
5. Run a Data Drift metric such as DatasetDriftMetric, which provides an overall summary of the drift detected between the Reference data and the Current data.
6. Use DataDriftTable to generate a table that compares drift for each individual feature between the Reference data and the Current data.
7. Save the results of the generated report as a JSON or an HTML file.
8. Evidently also supports creating your own statistical test and using it in the drift detection methods.
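The steps above can be sketched roughly as follows. This assumes the Evidently 0.4.x API, in which Report lives in evidently.report and DatasetDriftMetric and DataDriftTable live in evidently.metrics; later Evidently releases reorganised these imports, so check the installed version. The function names, file names and the layout of the dictionary read by summarise are assumptions for illustration, not part of the linked repository.

```python
def run_drift_report(reference, current,
                     html_path="drift_report.html",
                     json_path="drift_report.json"):
    """Compare two pandas DataFrames and save the drift report (Steps 3-7)."""
    # Imported here so summarise() below can be used without Evidently installed.
    # Assumption: Evidently 0.4.x import paths.
    from evidently.report import Report
    from evidently.metrics import DatasetDriftMetric, DataDriftTable

    report = Report(metrics=[
        DatasetDriftMetric(),                        # Step 5: overall summary
        DataDriftTable(num_stattest="wasserstein",   # Step 4: test for numerical features
                       cat_stattest="chisquare"),    # Step 4: test for categorical features
    ])
    report.run(reference_data=reference, current_data=current)
    report.save_html(html_path)                      # Step 7: HTML output
    report.save_json(json_path)                      # Step 7: JSON output
    return report.as_dict()


def summarise(report_dict):
    """Pull the headline results out of the dictionary returned above.

    Assumption: the dictionary has a "metrics" list whose entries carry
    "metric" (the metric name) and "result" keys, as in Evidently 0.4.x.
    """
    summary = {"dataset_drift": None, "drifted_columns": []}
    for metric in report_dict["metrics"]:
        result = metric["result"]
        if metric["metric"] == "DatasetDriftMetric":
            summary["dataset_drift"] = result["dataset_drift"]
        elif metric["metric"] == "DataDriftTable":
            summary["drifted_columns"] = sorted(
                name for name, col in result["drift_by_columns"].items()
                if col["drift_detected"]
            )
    return summary


# Usage (Steps 1-2; file names are hypothetical):
# import pandas as pd
# reference = pd.read_csv("reference.csv")
# current = pd.read_csv("current.csv")
# print(summarise(run_drift_report(reference, current)))
```

To restrict the analysis to individual features (Step 3), both metrics accept a columns argument in this API version.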
Note:
How much data is needed in the Reference data and the Current data? (Refer to Evidently’s documentation.) The answer depends on:
- The use case
- The decision taken by the data scientists (DS) and the business
- Trial-and-error analysis of the outcomes of Model Monitoring
- Refinement over time: this is an iterative process, so the decision can be revisited based on the outcomes
2.2 High Level Process
In general, an organisation that wants to analyse drift needs to:
- Have data flowing continuously from upstream, e.g. daily, weekly or monthly.
- Hold a detailed discussion, before implementing model monitoring, on how monitoring will be performed for the use case, which drift detection methods will be implemented, and which input features are required for analysis.
- Have a plan for when to schedule Model Monitoring, e.g. daily, weekly or monthly.
- Define the action to take once drift has been detected.
This list is not exhaustive; the requirements change with the use case.
2.3 Key Questions before Retraining
- How much overall Data Drift is acceptable?
- How much Data Drift is acceptable for each individual feature?
- Before retraining with the new data, can data be manually labelled so that the model’s accuracy/error rate can be tested?
A comprehensive evaluation of these questions is linked.
Note:
Whether retraining is required depends on how significant the detected Drift is and on how sensitive the model’s predictions would be if no action were taken.
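One way to make the first two questions concrete is to agree thresholds up front and apply them to the per-feature drift scores. The sketch below is a minimal example under stated assumptions: scores are distance-style values where higher means more drift (for p-value based tests the comparison would be reversed), and the function name, feature names, scores and thresholds are all hypothetical.

```python
def drift_review(drift_scores, feature_thresholds,
                 default_threshold=0.1, max_drifted_share=0.5):
    """Return (needs_action, drifted_features) for one monitoring run.

    drift_scores: feature name -> drift score (higher = more drift).
    feature_thresholds: feature name -> acceptable score for that feature.
    max_drifted_share: acceptable share of drifted features overall.
    """
    drifted = sorted(
        name for name, score in drift_scores.items()
        if score > feature_thresholds.get(name, default_threshold)
    )
    share = len(drifted) / len(drift_scores)
    return share >= max_drifted_share, drifted


scores = {"area": 0.30, "bedrooms": 0.05, "location": 0.20}
thresholds = {"area": 0.10, "location": 0.25}  # "bedrooms" uses the default

needs_action, drifted = drift_review(scores, thresholds)
print(needs_action, drifted)
```

Here only one of three features exceeds its threshold, so the run is not flagged with the default overall share of 0.5. Lowering max_drifted_share makes the check stricter.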
2.4 Possible Action Items based on Data Drift detection
- Update the model: train the model on old data (some of the historical data can be dropped) plus new data (the delta), without changing the model’s algorithm or hyperparameters. Evaluate the performance of the newly created model against the validation and test data.
- Retrain the model: train the model on old data (some of the historical data can be dropped) plus new data (the delta), while experimenting with a new set of hyperparameters or changing the algorithm used to build the model. Evaluate the performance of the newly created model against the validation and test data.
Note:
Both training options above are performed in the local/Dev environment for the Deploy Model solution.
2.5 Testing before Pushing a Newly Trained Model to Production
- Prepare a test dataset, ideally a combination of the old model’s test data and new data (neither model should have seen this data during training). Use the same test dataset to evaluate the performance of both the newly trained model and the existing production model.
- If the existing production model performs poorly on the test dataset (in terms of accuracy/error) and the newly trained model performs better, the newly trained model can become the production model.
- The existing production model is then archived.
An evaluation of the key metrics to assess before transitioning to production is linked.
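The comparison described above can be sketched as follows, assuming a regression use case scored with mean absolute error (MAE). The metric, the min_improvement margin, the function names and the numbers are assumptions for illustration; the metric actually used should match the use case.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def pick_production_model(y_true, production_pred, candidate_pred,
                          min_improvement=0.0):
    """Compare both models on the same held-out test set.

    The candidate is promoted only if its error is lower than the
    production model's error by more than min_improvement.
    """
    production_mae = mean_absolute_error(y_true, production_pred)
    candidate_mae = mean_absolute_error(y_true, candidate_pred)
    promote = production_mae - candidate_mae > min_improvement
    winner = "candidate" if promote else "production"
    return winner, production_mae, candidate_mae


# Shared test set that neither model saw during training (hypothetical values).
actual_prices = [100, 200, 300]
production_predictions = [130, 170, 330]
candidate_predictions = [110, 190, 310]

result = pick_production_model(actual_prices, production_predictions,
                               candidate_predictions)
print(result)
```

Setting min_improvement above zero guards against replacing the production model for a gain that is too small to matter.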
3. Data Drift for the House Price Prediction Use Case
- Make sure the Reference data and the Current data (Section 2.1) are available in S3 or another storage location.
- Manually trigger the Data Drift DAG from Airflow.
- A DS analyses the generated Drift Report and decides whether to Update or Retrain the model manually (as discussed in Section 2.4).
Example implementations of Data Drift are given here and here.
Note: you may also wish to refer to the FAQs section from the links provided above.
Resources
Model Monitoring, Data Drift & Concept Drift
Evidently AI - A complete guide to ML in production
Drift Detection
Data drift parameters - Evidently Documentation
Data drift algorithm - Evidently Documentation
Statistical Tests Explained
Data drift parameters - Evidently Documentation
Which test is the best? We compared 5 methods to detect data drift on large datasets
A Comprehensive Guide on How to Monitor Your Models in Production
Mitigation after Drift Detection