
IDOML worker

Welcome! Please check out the IDOML worker node repository.

A worker is a machine that executes the machine learning pipelines scheduled by the Apache Airflow server. The IDOML server already comes with a default worker node, but you can increase the number of workers by deploying additional worker nodes to suit custom requirements.

Concepts

Requirements for deploying an additional worker node

  1. Hardware Requirements:

    • A Linux server is sufficient to run the worker node. As the worker node is responsible for running the machine learning tasks, the hardware requirements depend on the complexity of the tasks and the size of the data. Please ensure that the server has sufficient resources (including GPU devices) to handle the desired machine learning tasks.
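    To get a rough picture of what a candidate machine offers, a small helper like the one below can be run on it. This is a minimal sketch, not part of the IDOML tooling: the function name resource_report is hypothetical, it assumes a Linux host (it reads /proc/meminfo), and it only counts NVIDIA GPUs, through nvidia-smi when that tool is installed.

    ```shell
    # Hypothetical helper: print CPU, memory and GPU counts for a prospective worker host.
    # Linux-only (reads /proc/meminfo); GPU detection assumes NVIDIA's nvidia-smi.
    resource_report() {
      local cpus mem_kb gpus
      cpus=$(nproc)
      # MemTotal is reported in kB, e.g. "MemTotal:       16318128 kB"
      mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
      if command -v nvidia-smi >/dev/null 2>&1; then
        # `nvidia-smi -L` prints one "GPU <n>: ..." line per device
        gpus=$(nvidia-smi -L 2>/dev/null | grep -c '^GPU')
      else
        gpus=0
      fi
      echo "cpus=${cpus}"
      echo "mem_kb=${mem_kb}"
      echo "gpus=${gpus}"
    }
    ```

    Running resource_report prints three key=value lines that can be compared against the needs of the intended workloads.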
  2. Ensure that the worker node can connect to the IDOML host server:

    The worker node must be able to connect to the IDOML server and, in particular, to the PostgreSQL and Redis services used by Apache Airflow.

    Check the TCP connection from the worker node to the IDOML server

    This command can be used to check the connection between the worker node and the IDOML server:

    nc -zv <IDOML_SERVER_HOST> <PORT>
    
    Check both the PostgreSQL port (default: 5432) and the Redis port (default: 6379) on the IDOML server.
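    To test both ports in one go, the nc check can be wrapped in a loop. This is a minimal sketch, assuming the default PostgreSQL and Redis ports and a netcat variant that supports the -z and -w options; the function name check_idoml_ports is hypothetical.

    ```shell
    # Hypothetical helper: probe the IDOML server's PostgreSQL and Redis ports from this worker.
    # Usage: check_idoml_ports <IDOML_SERVER_HOST> [PORT ...]   (defaults: 5432 6379)
    check_idoml_ports() {
      local host="$1" failed=0 port
      shift
      # Fall back to the default ports when none are given
      [ "$#" -gt 0 ] || set -- 5432 6379
      for port in "$@"; do
        # -z: only probe the port without sending data; -w 5: give up after 5 seconds
        if nc -z -w 5 "$host" "$port" >/dev/null 2>&1; then
          echo "OK      ${host}:${port}"
        else
          echo "FAILED  ${host}:${port}"
          failed=$((failed + 1))
        fi
      done
      # Non-zero exit status if any port was unreachable
      [ "$failed" -eq 0 ]
    }
    ```

    For example, check_idoml_ports <IDOML_SERVER_HOST> should print an OK line per port; pass explicit ports if your server does not use the defaults. Because the function returns a non-zero status when a port is unreachable, it can also guard later steps in a script.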

  3. To deploy the worker node, ensure your system meets the following requirements:

    • Docker: IDOML utilizes Docker for deployment. Refer to the official Docker documentation for installation instructions.

    • Docker-compose: Docker-compose is required for orchestrating the deployment process. Follow the installation instructions provided in the official Docker-compose documentation.

    Info

    Please be sure to create a group named docker and add the current user to this group. This is necessary to avoid permission issues when running Docker commands.
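    The group setup can be scripted along the lines of the Docker post-installation steps for Linux. This is a minimal sketch; the helper names in_group and setup_docker_group are hypothetical, and the setup function requires sudo rights.

    ```shell
    # Hypothetical helper: return 0 if user $1 is a member of group $2 (as reported by `id -nG`).
    in_group() {
      id -nG "$1" | tr ' ' '\n' | grep -qx "$2"
    }

    # Hypothetical helper wrapping the Docker post-installation steps (requires sudo).
    setup_docker_group() {
      # Create the docker group only if it does not exist yet
      getent group docker >/dev/null || sudo groupadd docker
      sudo usermod -aG docker "$(id -un)"
      echo "Log out and back in (or run 'newgrp docker') to apply the new group membership."
    }
    ```

    For example, in_group "$(id -un)" docker || setup_docker_group only modifies the system when the current user is not yet in the group. The new membership only applies to new login sessions.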

  4. Clone the IDOML worker node repository:

    Using HTTPS:

    git clone https://github.com/serval-uni-lu/idoml-worker-node.git
    cd idoml-worker-node

    Or using SSH:

    git clone git@github.com:serval-uni-lu/idoml-worker-node.git
    cd idoml-worker-node
    
  5. Before deploying the worker node, update the .env file with the necessary configuration:

    • Update the IDOML host-specific configuration:

      The configuration related to the IDOML server host is stored in the .env.idoml file. It must match the values used in the server deployment step.

    • Choose an appropriate queue name:

      The queue name identifies the worker node in the Airflow server. Once the worker node is deployed, the Airflow server assigns to it any task that specifies this queue name. Set the queue name in the .env file as follows:

      IDOML_WORKER_NAME=desired_queue_name
      
    • User ID Configuration:

      To ensure proper permissions, the current user ID needs to be passed to the Docker-compose file for Airflow. According to the Airflow official documentation, the user should be in the root group to access the required folders.

      Run the following command to update the .env file:

      echo -e "AIRFLOW_UID=$(id -u)" >> .env
      
    • Docker Group ID Configuration:

      The Docker group ID must be passed to the Docker-compose file for Airflow to enable the Docker operator.

      Run the following command to update the .env file:

      echo -e "DOCKER_GROUP_ID=$(getent group docker | cut -d ':' -f 3)" >> .env
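    Note that the echo ... >> .env commands above append a new line every time they run, so repeating a step leaves duplicate keys in .env. A small helper that replaces an existing key instead of appending is sketched below; the name set_env_var is hypothetical, and it moves the updated key to the end of the file.

    ```shell
    # Hypothetical helper: set KEY=VALUE in an env file, replacing any existing KEY= line.
    # Usage: set_env_var <file> <KEY> <VALUE>
    set_env_var() {
      local file="$1" key="$2" value="$3" tmp
      touch "$file"
      tmp=$(mktemp)
      # Keep every line except existing "KEY=" assignments (grep exits 1 if nothing is left)
      grep -v "^${key}=" "$file" > "$tmp" || true
      # Append the new assignment
      printf '%s=%s\n' "$key" "$value" >> "$tmp"
      # Write back in place so the original file's permissions are preserved
      cat "$tmp" > "$file"
      rm -f "$tmp"
    }
    ```

    For example, set_env_var .env AIRFLOW_UID "$(id -u)" can be run any number of times and leaves exactly one AIRFLOW_UID line; the same applies to IDOML_WORKER_NAME and DOCKER_GROUP_ID.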
      
  6. Configure the DAG repository tracked by the worker node. It must be the same repository that was set up for the IDOML server in the server deployment step:

    • If a public repository is used, please update the .env file with the repository URL and branch name.

      IDOML_GIT_DAG_REPO=https://github.com/{account}/{repo}.git
      IDOML_GIT_DAG_BRANCH=main
      
    • If a private repository is used, connect to it over SSH instead. For instance:

      IDOML_GIT_DAG_REPO=git@github.com:{account}/{repo}.git
      

      Because SSH is used for the private repository, you need to create an SSH key pair and add the public key to the repository's deploy keys. This deploy key does not require write access to the repository. Additionally, the repository host must be added to the known hosts. Follow the steps below:

      • Create an SSH key pair:
      mkdir -p secrets/ssh
      ssh-keygen -t ed25519 -f secrets/ssh/idoml_deploy_key
      
      • Print the public key and add it to the repository's deploy keys:
      cat secrets/ssh/idoml_deploy_key.pub
      
      • Add the repository host (GitHub) to the known hosts:
      ssh-keyscan -t ed25519 github.com >> secrets/ssh/known_hosts
      

      Finally, in the docker-compose.yaml file, uncomment the following environment variables of the git-sync service:

      docker-compose.yaml
      services:
        git-sync:
          environment:
            # GIT_SYNC_SSH: true
            # GIT_SSH_KEY_FILE: "/etc/git-secret/idoml_deploy_key"
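    Before deploying, it can help to confirm that the worker can actually read the configured branch. This is a minimal sketch using git ls-remote; the function name check_dag_repo is hypothetical, and for SSH URLs it assumes the deploy key and known_hosts paths used in the steps above, relative to the project root.

    ```shell
    # Hypothetical helper: verify that the DAG branch is readable from this worker.
    # Usage: check_dag_repo <IDOML_GIT_DAG_REPO> <IDOML_GIT_DAG_BRANCH>
    check_dag_repo() {
      local repo="$1" branch="$2"
      case "$repo" in
        git@*)
          # SSH URL: use only the deploy key and the project's known_hosts file
          GIT_SSH_COMMAND="ssh -i secrets/ssh/idoml_deploy_key -o IdentitiesOnly=yes -o UserKnownHostsFile=secrets/ssh/known_hosts" \
            git ls-remote --exit-code "$repo" "refs/heads/${branch}"
          ;;
        *)
          # HTTPS URL or local path: no SSH configuration needed
          git ls-remote --exit-code "$repo" "refs/heads/${branch}"
          ;;
      esac
    }
    ```

    For example, check_dag_repo git@github.com:{account}/{repo}.git main prints the branch's latest commit hash and returns 0 on success; with --exit-code, git ls-remote returns 2 when the branch does not exist.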
      

Deployment

Once the requirements are met, deploy the worker node by running the following command at the root of the project folder:

docker compose up -d
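A quick pre-flight check can catch a missing variable before the containers start. This is a minimal sketch; the function name check_worker_env is hypothetical, and the list of required keys is an assumption based on the variables set in the steps above (adjust it to your setup).

```shell
# Hypothetical helper: confirm the worker's env file defines the variables set in the steps above.
# Usage: check_worker_env [env_file]   (default: .env)
check_worker_env() {
  local env_file="${1:-.env}" missing=0 key
  for key in IDOML_WORKER_NAME AIRFLOW_UID DOCKER_GROUP_ID IDOML_GIT_DAG_REPO IDOML_GIT_DAG_BRANCH; do
    # Require a non-empty "KEY=value" assignment at the start of a line
    if ! grep -q "^${key}=." "$env_file" 2>/dev/null; then
      echo "missing: ${key}"
      missing=1
    fi
  done
  [ "$missing" -eq 0 ]
}
```

For example, check_worker_env .env && docker compose up -d only starts the deployment when every listed key is present.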

Check the worker node status

Once the worker node is deployed, you can check its status on the Flower dashboard. Flower is a web-based tool for monitoring Celery worker nodes and tasks.

To access the Flower dashboard, open a web browser and navigate to the following URL:

http://flower.{IDOML_DOMAIN}

Here, IDOML_DOMAIN is the domain name of the IDOML server, as defined in the .env file during the server deployment step.
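The dashboard URL can also be derived from that .env file in a script. This is a minimal sketch; flower_url is a hypothetical helper, and it assumes IDOML_DOMAIN is defined on a single IDOML_DOMAIN=value line, optionally quoted.

```shell
# Hypothetical helper: build the Flower dashboard URL from IDOML_DOMAIN in an env file.
# Usage: flower_url <env_file>
flower_url() {
  local domain
  # Take the last IDOML_DOMAIN= line, keep everything after the first "=", drop quotes
  domain=$(grep '^IDOML_DOMAIN=' "$1" | tail -n 1 | cut -d '=' -f 2- | tr -d "\"'")
  [ -n "$domain" ] || return 1
  echo "http://flower.${domain}"
}
```

For example, curl -sS -o /dev/null -w '%{http_code}\n' "$(flower_url .env)" prints the HTTP status code returned by the dashboard. Flower also exposes a REST API (for instance /api/workers) that lists the known workers; if the dashboard is protected by authentication in your deployment, add the credentials to the curl call.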