
Training Pipeline

Table of Contents

  1. Introduction
  2. Evaluation
    Apache Airflow
    Prefect
  3. Implementation
  4. Resources


1. Introduction

This page summarises the team’s evaluation of two tools for implementing the training pipeline: Apache Airflow and Prefect. It also links to the repository for each sub-component of the team’s training pipeline. Each repository includes instructions for setting it up.

The team chose Airflow as its DAG implementation software because it is:

By contrast, running Prefect with its official Helm chart requires additional configuration to be set up (see Welcome to Prefect).

Note: Kubeflow does not have an official Helm chart.


2. Evaluation

Apache Airflow


Airflow is an open-source workflow orchestration tool for orchestrating distributed applications. It schedules jobs across different servers or nodes using DAGs (Directed Acyclic Graphs). The DAG is Airflow’s core concept. It collects Tasks together and organises them with dependencies and relationships that define how they should run.
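The sketch below makes the DAG idea concrete. It uses only Python’s standard-library `graphlib`, not Airflow’s API. The task names (`extract`, `validate`, `train`, `evaluate`) and the helpers `run_order` and `is_acyclic` are hypothetical. Each task maps to the set of tasks it depends on. A topological sort then yields an execution order that respects every dependency, which is the constraint a scheduler has to honour when it runs a DAG. A cycle makes the graph invalid.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
PIPELINE: dict[str, set[str]] = {
    "extract": set(),
    "validate": {"extract"},
    "train": {"validate"},
    "evaluate": {"train"},
}


def run_order(dag: dict[str, set[str]]) -> list[str]:
    """Return an execution order that respects every dependency edge.

    Raises graphlib.CycleError if the graph contains a cycle.
    """
    return list(TopologicalSorter(dag).static_order())


def is_acyclic(dag: dict[str, set[str]]) -> bool:
    """Check the 'acyclic' property that makes a task graph a valid DAG."""
    try:
        run_order(dag)
    except CycleError:
        return False
    return True


if __name__ == "__main__":
    print(run_order(PIPELINE))  # ['extract', 'validate', 'train', 'evaluate']
    print(is_acyclic({"a": {"b"}, "b": {"a"}}))  # False: a <-> b is a cycle
```

In Airflow itself, the equivalent dependencies would be declared between tasks inside a DAG definition, for example with the `>>` operator. Airflow’s scheduler then decides when and where each task runs.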



Prefect


Prefect aims to reduce negative engineering, meaning the defensive code written to handle failures, by building a DAG structure with an emphasis on enabling positive engineering. It provides an orchestration layer for the current data stack.
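The sketch below illustrates the kind of negative engineering an orchestrator aims to absorb, using a hand-written retry wrapper in plain Python. It is not Prefect’s API, and `with_retries`, `flaky_load` and the retry count are hypothetical. Prefect tasks can be configured with retries instead, so that pipeline code can focus on the positive logic.

```python
import functools
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(attempts: int, delay_s: float = 0.0) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """Hypothetical decorator: re-run a function up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    def decorator(fn: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(fn)
        def wrapper(*args, **kwargs) -> T:
            for attempt in range(1, attempts + 1):
                try:
                    return fn(*args, **kwargs)
                except Exception:
                    # Defensive code: give up only once the retry budget is spent.
                    if attempt == attempts:
                        raise
                    time.sleep(delay_s)
            raise AssertionError("unreachable: loop always returns or raises")

        return wrapper

    return decorator


calls = {"n": 0}


@with_retries(attempts=3)
def flaky_load() -> str:
    """Hypothetical task that fails twice, e.g. on a transient network error."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "loaded"


if __name__ == "__main__":
    print(flaky_load(), "after", calls["n"], "attempts")  # loaded after 3 attempts
```

Every pipeline step would need wrappers like this, plus logging, timeouts and alerting. That defensive code is what Prefect’s emphasis on reducing negative engineering is meant to take off the team’s hands.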




3. Implementation

Each link below points to our repository for one process. Each repository includes a README explaining how to set that process up:

➡️ Testing DAGs on Local Kind Cluster

➡️ Data Ingestion DAG

➡️ Model Training DAG

➡️ Drift Monitoring DAG


4. Resources

  1. Airflow

  2. Prefect

  3. Helm - What is a Helm chart

  4. What is a DAG