
Currently I'm working on a data engineering project in which one EC2 instance hosts Airflow and another EC2 instance acts as both the Spark master and a Spark worker. Submitting jobs to Spark succeeds when it is done directly on the host machine of the client node, but fails when it is done from inside a Docker container (I want to dockerize Airflow). The evidence: if I open a spark-shell on the host machine of the client node, it works perfectly; if I open it inside the Docker container, it fails with the error:

Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
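
For context, the check I run on both sides is roughly the following (the master address is a placeholder for my Spark EC2 instance, and 7077 is just the default standalone master port):

spark-shell --master spark://<spark-master-private-ip>:7077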

My docker-compose file is below:

version: '3.3'
x-airflow-common:
  &airflow-common
  build:
    context: ./Docker_Container_Airflow
  environment:
    &airflow-common-env
    AIRFLOW__CORE__EXECUTOR: LocalExecutor
    AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres/airflow
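    # The $... values below are substituted by Compose from the shell environment or a .env file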
    AIRFLOW_CONN_SPARK_CON: $Spark_Con
    AIRFLOW_CONN_MYSQL_CON_CITY: $MySQL_Con_City
    AIRFLOW_CONN_MYSQL_CON_COUNTRY: $MySQL_Con_Country
    AIRFLOW_CONN_MYSQL_CON_GLOBAL: $MySQL_Con_Global
    AIRFLOW_CONN_S3_CON: $S3_Con
  volumes:
    - ./dags:/opt/airflow/dags
    - ./logs:/opt/airflow/logs
  user: "${AIRFLOW_UID:-50000}:${AIRFLOW_GID:-50000}"
  depends_on:
    postgres:
      condition: service_healthy

# The Postgres database stores the Airflow metadata used during the operation of the webserver
services:
  postgres:
    container_name: postgres-airflow
    image: postgres:13
    healthcheck:
      test: [ "CMD", "pg_isready", "-U", "airflow" ]
      interval: 5s
      retries: 5
    restart: always
    ports:
      - "5432:5432"
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow

  scheduler:
    <<: *airflow-common
    container_name: airflow_scheduler
    command: scheduler
    healthcheck:
      test:
        [
          "CMD-SHELL",
          'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"'
        ]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

  webserver:
    <<: *airflow-common
    container_name: airflow_webserver
    command: webserver
    ports:
      - 8080:8080
    environment:
      <<: *airflow-common-env
      _AIRFLOW_DB_UPGRADE: 'true'
      _AIRFLOW_WWW_USER_CREATE: 'true'
      _AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
      _AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}
    healthcheck:
      test:
        [
          "CMD",
          "curl",
          "--fail",
          "http://localhost:8080/health"
        ]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always

I've also tried attaching those services to the host network, but then the Docker containers (airflow_webserver and airflow_scheduler) end up unhealthy.
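
That host-network attempt was roughly the following sketch: I only added network_mode: host to the two Airflow services and dropped their published ports, since Compose ignores port mappings in that mode:

  scheduler:
    <<: *airflow-common
    network_mode: host   # share the EC2 host's network stack instead of the default bridge

  webserver:
    <<: *airflow-common
    network_mode: host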

I've searched Google and other platforms, but I still don't know what exactly causes this. I'd like to understand the reason and how to resolve it.

  • Have a look at [this](https://stackoverflow.com/questions/45489248/running-spark-driver-program-in-docker-container-no-connection-back-from-execu) – o_O Feb 28 '23 at 04:48
