MLflow for the platform

MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. It provides tools to track experiments, package code into reproducible runs, and share and deploy models. The platform's JupyterHub environment comes with MLflow pre-installed, so users can track and manage their machine learning experiments. A shared MLflow instance is available to all users at the following URL:

http://idoml.mlflow.{IDOML_DOMAIN}

This shared instance stores its experiment data on the MinIO server, which is also accessible to all users.
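
The URL pattern above can be turned into a small helper for clients that run outside the preconfigured JupyterHub environment. This is a minimal sketch, not part of the platform: the function names `shared_tracking_uri` and `connect_to_shared_instance` are hypothetical, and the domain value must be supplied by the user.

```python
def shared_tracking_uri(domain: str) -> str:
    """Build the shared MLflow URL from the platform domain (IDOML_DOMAIN)."""
    domain = domain.strip().strip("/")
    if not domain:
        raise ValueError("domain must not be empty")
    return f"http://idoml.mlflow.{domain}"


def connect_to_shared_instance(domain: str) -> str:
    """Point the MLflow client at the shared instance and return its URL."""
    # Imported lazily so the URL helper above works without MLflow installed.
    import mlflow

    uri = shared_tracking_uri(domain)
    mlflow.set_tracking_uri(uri)
    return uri
```

Inside JupyterHub this step should not be needed, because the environment is already configured to use the shared instance.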

Tracking Experiments

The JupyterHub environment is preconfigured to access the shared MLflow instance. Users can start tracking their experiments by creating a new experiment and logging its parameters, metrics, and artifacts. Here is an example of how to start tracking an experiment in MLflow:

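import mlflow

# Select the experiment (it is created automatically if it does not exist yet)
mlflow.set_experiment(MLFLOW_EXP_NAME)

# Everything logged inside this block is attached to the same run
with mlflow.start_run(run_name=RUN_NAME):
    mlflow.log_param(PARAM, PARAM_VALUE)
    mlflow.log_metric(METRIC, METRIC_VALUE)

    # Store the trained scikit-learn model as a run artifact
    mlflow.sklearn.log_model(MODEL, artifact_path='PATH/TO/MODEL')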
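
When a run has many parameters and metrics, logging them one call at a time gets tedious. The sketch below shows one possible way to log them in bulk with `mlflow.log_params` and `mlflow.log_metrics`. The helpers `flatten_params` and `log_training_run` are hypothetical names introduced for illustration, and the dotted key naming is an assumption rather than a platform convention.

```python
from typing import Any, Dict, Mapping


def flatten_params(config: Mapping[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten a nested config into the flat key/value form MLflow params use."""
    flat: Dict[str, Any] = {}
    for key, value in config.items():
        name = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, Mapping):
            # Recurse into nested sections, joining key names with dots.
            flat.update(flatten_params(value, name))
        else:
            flat[name] = value
    return flat


def log_training_run(
    experiment: str,
    run_name: str,
    config: Mapping[str, Any],
    metrics: Mapping[str, float],
) -> str:
    """Log all params and metrics of one run and return the run ID."""
    # Imported lazily so flatten_params can be used without MLflow installed.
    import mlflow

    mlflow.set_experiment(experiment)
    with mlflow.start_run(run_name=run_name) as run:
        mlflow.log_params(flatten_params(config))
        mlflow.log_metrics(dict(metrics))
        return run.info.run_id
```

A call such as `log_training_run("demo", "baseline", {"model": {"max_depth": 3}}, {"accuracy": 0.9})` would record the parameter `model.max_depth` and the metric `accuracy` under a single run.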

Accessing an external MLflow server

Sometimes users may want to log to an external MLflow server instead of the shared instance. In this case, set the tracking URI of the external server before starting a run. Here is an example of how to access an external MLflow server:

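import mlflow

# Point the client at the external server instead of the shared instance
mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)

# Without set_experiment, the run is recorded in the server's default experiment
with mlflow.start_run(run_name=RUN_NAME):
    mlflow.log_param(PARAM, PARAM_VALUE)
    mlflow.log_metric(METRIC, METRIC_VALUE)

    # Store the trained scikit-learn model as a run artifact
    mlflow.sklearn.log_model(MODEL, artifact_path='PATH/TO/MODEL')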
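
An external server may need more than a tracking URI, for example basic-auth credentials or a custom S3-compatible artifact endpoint. The MLflow client reads such settings from environment variables, so one option is to set them before the first MLflow call. The sketch below assumes the server uses HTTP basic auth and an S3-compatible artifact store. The helper name `configure_external_mlflow` is hypothetical, and which variables are actually required depends on how the external server is set up.

```python
import os
from typing import Dict, Optional


def configure_external_mlflow(
    tracking_uri: str,
    username: Optional[str] = None,
    password: Optional[str] = None,
    s3_endpoint_url: Optional[str] = None,
) -> Dict[str, str]:
    """Export MLflow client settings as environment variables.

    Only values that are provided get set; the applied settings are returned.
    """
    settings = {
        "MLFLOW_TRACKING_URI": tracking_uri,
        "MLFLOW_TRACKING_USERNAME": username,      # HTTP basic auth user
        "MLFLOW_TRACKING_PASSWORD": password,      # HTTP basic auth password
        "MLFLOW_S3_ENDPOINT_URL": s3_endpoint_url,  # S3-compatible artifact store
    }
    applied = {key: value for key, value in settings.items() if value is not None}
    os.environ.update(applied)
    return applied
```

An explicit call to `mlflow.set_tracking_uri` takes precedence over the `MLFLOW_TRACKING_URI` variable, so use one mechanism or the other. Credentials for the artifact store itself (for example `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`) are read by the storage client and would be set the same way.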