Team MLOps Architecture
The Team's Architecture Mural:
Each numbered icon in the mural corresponds to one of the processes described below.
Components
The following explanations apply to the team’s MLOps architecture:
1. Model Development
Data scientists develop models locally.
2. Models pushed
All models pushed from the local machine are stored in MLflow's artefact store. The store itself is an S3 bucket that holds, for each model, a .pth file, evaluation metrics, hyperparameters (α, β, etc.) and a Uniform Resource Identifier (URI).
Data format: .pth
Data visibility: High
Frequency of data flow: On demand
3. Model commit
The chosen algorithm, its parameters, data, and model development code are finalised, checked into the main branch, and baselined.
Data format: .pth, .csv
4 - 6. Model Registry and Store
As models move through the MLflow store:
Data visibility: High
Frequency of data flow: On demand
4 - The best model is manually chosen for promotion and is initially tagged as None.
5 - Manual compliance checks are made, and the model is then tagged as Staging.
6 - The model is manually tagged as Production based on comparison / A/B testing.
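The manual promotion path in steps 4-6 can be sketched as a small transition check. The stage names (None, Staging, Production) are MLflow's built-in model-registry stages; the helper itself is a hypothetical illustration, not the team's code:

```python
# Sketch of the manual promotion path in steps 4-6 (hypothetical helper;
# the stage names match MLflow's built-in model-registry stages).
ALLOWED_TRANSITIONS = {
    "None": {"Staging"},         # step 5: after manual compliance checks
    "Staging": {"Production"},   # step 6: after comparison / A/B testing
    "Production": {"Archived"},  # retiring a superseded model
}

def can_promote(current_stage: str, target_stage: str) -> bool:
    """Return True if the manual promotion is one the flow above allows."""
    return target_stage in ALLOWED_TRANSITIONS.get(current_stage, set())
```

In MLflow itself, each promotion would be applied with `MlflowClient.transition_model_version_stage`.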
7. Manual push to model server
The model is manually pushed to the model server to serve in production.
Frequency of data flow: On demand
8. Prediction service
Prediction Service is ready to serve users.
Data format: URI
Data visibility: Low
Frequency of data flow: Real-time
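Assuming the URI in step 8 follows MLflow's registry scheme (`models:/<name>/<stage>`), the prediction service's model reference can be sketched as below; the model name is a hypothetical example:

```python
def production_model_uri(model_name: str) -> str:
    """Build an MLflow registry URI for the Production-stage version of a
    registered model (models:/<name>/<stage> is MLflow's URI scheme)."""
    return f"models:/{model_name}/Production"

# e.g. mlflow.pytorch.load_model(production_model_uri("demand-forecaster"))
```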
9. User input and predictions output
As users interact with the prediction service, predictions for their input are returned to them.
Data format: JSON
Data visibility: Med
Frequency of data flow: Real-time
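A minimal sketch of the JSON exchange in step 9; the field names are illustrative, since the actual schema is not documented here:

```python
import json

def handle_request(raw_body: str, model) -> str:
    """Parse a JSON prediction request and return a JSON response that
    echoes the input alongside the model's prediction (field names are
    hypothetical, not the team's actual schema)."""
    payload = json.loads(raw_body)
    features = payload["features"]
    return json.dumps({"features": features, "prediction": model(features)})
```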
10. User interaction data captured
User interaction data is captured and stored in a logs database in S3.
11. Model monitoring for decay
Model monitoring setup checks model and data for drift or decay.
12. Scheduler initiates the training process
13. DAG launches a Docker instance in which training runs
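The scheduled trigger in steps 12-13 can be sketched as a simple due-check. This is a stand-in for the actual scheduler (e.g. an Airflow DAG schedule), and the interval is an assumed value, not the team's real cadence:

```python
from datetime import datetime, timedelta

# Assumed cadence for illustration only, not the team's actual setting.
RETRAIN_INTERVAL = timedelta(days=7)

def training_due(last_run: datetime, now: datetime,
                 manual_retrigger: bool = False) -> bool:
    """The training DAG runs either on schedule or when a data scientist
    manually retriggers retraining (step 21)."""
    return manual_retrigger or (now - last_run) >= RETRAIN_INTERVAL
```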
14. Data retrieved from S3 bucket
The data for training, testing and evaluation is retrieved from an Amazon S3 bucket.
Data format: .csv
Data visibility: Low
Frequency of data flow: Scheduled, or on demand when the training process is manually retriggered
15. Data is preprocessed and transformed
Data format: .csv
Data visibility: Low
Frequency of data flow: Scheduled, or on demand when the training process is manually retriggered
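A minimal sketch of the kind of preprocessing step 15 performs, using the standard csv module; the column names and transformations are hypothetical:

```python
import csv
import io

def preprocess(raw_csv: str) -> list:
    """Read raw CSV rows, drop incomplete records, and cast numeric
    fields (the 'amount' column is illustrative only)."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        if not all(row.values()):
            continue  # drop rows with missing fields
        row["amount"] = float(row["amount"])
        rows.append(row)
    return rows
```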
16. Features stored
The output of preprocessing is stored in the feature store.
Data format: .csv.dvc
Data visibility: Low
Frequency of data flow: Scheduled, or on demand when the training process is manually retriggered
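The .csv.dvc files in the feature store are DVC pointer files: small text files recording a hash of the actual .csv, which lives in remote storage. A sketch of the core idea only; the real pointer format is YAML written by `dvc add`, not this helper:

```python
import hashlib

def pointer_for(csv_bytes: bytes) -> str:
    """Mimic the core idea of a .dvc pointer: version a content hash
    instead of the data itself (DVC uses MD5 hashes by default)."""
    return hashlib.md5(csv_bytes).hexdigest()
```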
17. Feature store fetch
Data from the feature store is fetched for training and evaluation.
Data format: .csv.dvc
Data visibility: High
Frequency of data flow: Scheduled, or on demand when the training process is manually retriggered
18. Model created
The model is created and trained, then evaluated against test data using the chosen metrics.
Data format: .csv
Data visibility: Low
Frequency of data flow: Based on scheduler
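Step 18's evaluation against test data can be sketched with a simple accuracy metric; the metrics the team actually logs are not specified here:

```python
def accuracy(predictions: list, labels: list) -> float:
    """Fraction of test-set predictions that match their labels
    (one illustrative evaluation metric among many)."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)
```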
19. Model version storing
Each model version is stored in the database within the model store
Data format: .pth
Data visibility: High
Frequency of data flow: Based on scheduler
20. Model monitor pull
The model monitor periodically pulls new data from S3 and inspects it for signs of drift or decay.
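One common drift check a monitor like the one in steps 11 and 20 might run is the Population Stability Index (PSI) between training-time and live feature distributions. A sketch under that assumption; the team's actual drift test is not documented here:

```python
import math

def psi(expected: list, actual: list) -> float:
    """Population Stability Index between two binned distributions
    (each list holds bin proportions summing to 1). Values above
    roughly 0.2 are commonly read as significant drift."""
    eps = 1e-6  # guard against empty bins
    return sum(
        (a - e) * math.log((a + eps) / (e + eps))
        for e, a in zip(expected, actual)
    )
```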
21. Potential retrigger
Upon identifying model decay, Data Scientists manually retrigger the retraining process.
22. Industry Data
Simulated industry data is supplied to the pipeline on demand.