Concept
The IDOML platform defines three primary user roles. Each role has its own permissions and responsibilities, chosen to keep interactions with the system efficient, secure, and user-friendly. The roles are:
- System Administrator
- Project Manager
- Data Scientist
The diagram below gives a high-level view of the roles and responsibilities of each user group:
Main Roles
System Administrator
The system administrator is responsible for setting up and maintaining the IDOML platform. This includes installing and configuring the necessary software components, such as JupyterHub, Apache Airflow, and MinIO. The system administrator is also responsible for managing user accounts and permissions, monitoring system performance, and ensuring the security of the platform.
IDOML provides a Docker Compose file that automates the installation and configuration process, making it easy for system administrators to deploy the platform. The Docker Compose file includes all the necessary software components and configurations, so system administrators do not need to manually install and configure each component.
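As a rough illustration of what deploying from a Compose file looks like, the sketch below builds and optionally runs the standard `docker compose up` command from Python. The file name `docker-compose.yml` and the helper names `compose_up_command` and `deploy` are assumptions made for this example; they are not taken from the IDOML distribution, so substitute the actual file shipped with the platform.

```python
import shlex
import subprocess


def compose_up_command(compose_file: str = "docker-compose.yml", detach: bool = True) -> list[str]:
    """Build the `docker compose up` command for the given Compose file.

    The default file name is an assumption, not the documented IDOML path.
    """
    command = ["docker", "compose", "-f", compose_file, "up"]
    if detach:
        command.append("-d")  # run the containers in the background
    return command


def deploy(compose_file: str = "docker-compose.yml") -> None:
    """Start every service defined in the Compose file.

    Raises subprocess.CalledProcessError if Docker exits with an error.
    """
    subprocess.run(compose_up_command(compose_file), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to execute anywhere.
    print(shlex.join(compose_up_command()))
```

Calling `deploy()` on a host with Docker installed would start all services declared in the file in one step, which is the convenience the Compose-based setup is meant to provide.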
Project Manager
The project manager is responsible for managing the machine learning projects on the IDOML platform. This includes creating and managing projects, assigning tasks to data scientists, monitoring project progress, and ensuring that projects are completed on time and within budget. The project manager is also responsible for coordinating with the data scientists to define project requirements, develop project plans, and track project milestones.
Data Scientist
The data scientist is responsible for developing and deploying machine learning models on the IDOML platform. This includes data exploration, model development, model evaluation, and model deployment. The data scientist uses the JupyterHub server to develop and test machine learning models, and the Apache Airflow server to deploy and manage machine learning pipelines.
The IDOML platform provides an Elyra extension to JupyterLab that allows data scientists to create Directed Acyclic Graphs (DAGs) graphically. The DAGs define the workflow of the machine learning pipeline, including data preprocessing, model training, model evaluation, and model deployment. The Elyra extension simplifies the process of creating and managing machine learning pipelines, making it easier for data scientists to develop and deploy machine learning models.
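To make the idea of a DAG concrete, here is a minimal sketch that models the four pipeline stages named above as a dependency graph and derives a valid execution order using Python's standard `graphlib` module. It is not how Elyra or Airflow represent pipelines internally; the stage identifiers and the `execution_order` helper are hypothetical names chosen for illustration.

```python
from graphlib import TopologicalSorter

# Each key is a pipeline stage; its value is the set of stages that must finish first.
# The stages follow the workflow described above; the identifiers are hypothetical.
PIPELINE = {
    "data_preprocessing": set(),
    "model_training": {"data_preprocessing"},
    "model_evaluation": {"model_training"},
    "model_deployment": {"model_evaluation"},
}


def execution_order(dag: dict[str, set[str]]) -> list[str]:
    """Return the stages in an order that respects every dependency.

    Raises graphlib.CycleError if the graph contains a cycle, i.e. is not a DAG.
    """
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    for stage in execution_order(PIPELINE):
        print(stage)
```

Because the graph must be acyclic, a scheduler can always find such an order. A graphical editor like Elyra lets the data scientist draw these dependencies instead of writing them by hand.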
The goal of this platform is to facilitate the development and deployment of machine learning models from experimentation to production.
- The experimentation phase covers data exploration, model development, and model evaluation, all of which are designed to be done on the JupyterHub server.
- The production phase covers model deployment, monitoring, and management, with executions managed by the Apache Airflow server.
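The division of work between the two phases can be summarized as a small lookup. The sketch below is only a restatement of the two bullets above in code; the names `Phase`, `PHASE_ACTIVITIES`, `PHASE_COMPONENT`, and `component_for` are hypothetical and do not correspond to any IDOML API.

```python
from enum import Enum


class Phase(Enum):
    EXPERIMENTATION = "experimentation"
    PRODUCTION = "production"


# Activities that belong to each phase, as listed above.
PHASE_ACTIVITIES = {
    Phase.EXPERIMENTATION: ("data exploration", "model development", "model evaluation"),
    Phase.PRODUCTION: ("model deployment", "monitoring", "management"),
}

# Component on which each phase is designed to run.
PHASE_COMPONENT = {
    Phase.EXPERIMENTATION: "JupyterHub",
    Phase.PRODUCTION: "Apache Airflow",
}


def component_for(activity: str) -> str:
    """Return the component that hosts the given activity (case-insensitive)."""
    for phase, activities in PHASE_ACTIVITIES.items():
        if activity.lower() in activities:
            return PHASE_COMPONENT[phase]
    raise ValueError(f"unknown activity: {activity!r}")
```

For example, `component_for("model evaluation")` resolves to JupyterHub, while `component_for("monitoring")` resolves to Apache Airflow.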
Platform Architecture
The IDOML platform is designed to provide an end-to-end solution for developing and deploying machine learning models. The platform consists of several components that work together to facilitate the machine learning workflow. The main components of the platform are:
- Applications:
  - JupyterHub: Data exploration, model development, and model evaluation
  - Apache Airflow: Model deployment, monitoring, and management
  - MinIO: Data storage and model deployment
  - MLflow: Model tracking and management
- Administration:
  - Keycloak: User authentication and authorization
  - Traefik: Reverse proxy and load balancer
These components are open-source projects and can be deployed as Docker containers, which makes the platform easy to install and configure. The platform is designed to be scalable and flexible, so users can customize the components to meet their specific needs. The platform architecture is shown in the following diagram: