
I'm attempting to set up a quick POC on my Mac laptop (using Docker) to help demonstrate a streaming data ingestion flow using MySQL, Debezium, Kafka and Spark.

The MySQL / Debezium / Kafka environment is set up as follows:

version: '2'
services:
  zookeeper:
    image: quay.io/debezium/zookeeper:${DEBEZIUM_VERSION}
    ports:
     - 2181:2181
     - 2888:2888
     - 3888:3888
  kafka:
    image: quay.io/debezium/kafka:${DEBEZIUM_VERSION}
    ports:
     - 9092:9092
    links:
     - zookeeper
    environment:
     - ZOOKEEPER_CONNECT=zookeeper:2181
  mysql:
    image: quay.io/debezium/example-mysql:${DEBEZIUM_VERSION}
    ports:
     - 3306:3306
    environment:
     - MYSQL_ROOT_PASSWORD=debezium
     - MYSQL_USER=mysqluser
     - MYSQL_PASSWORD=mysqlpw
  connect:
    image: quay.io/debezium/connect:${DEBEZIUM_VERSION}
    ports:
     - 8083:8083
    links:
     - kafka
     - mysql
    environment:
     - BOOTSTRAP_SERVERS=kafka:9092
     - GROUP_ID=1
     - CONFIG_STORAGE_TOPIC=my_connect_configs
     - OFFSET_STORAGE_TOPIC=my_connect_offsets
     - STATUS_STORAGE_TOPIC=my_connect_statuses

This part is up and running. I'm able to connect to MySQL, change some values and see those changes flow through Debezium and Kafka.
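For reference, what arrives on the Kafka topic is not the bare row but a Debezium change-event envelope with `before`/`after` fields, an `op` code, and source metadata; whatever consumes the topic (including the eventual Spark job) has to unwrap it. A minimal sketch in Python (the envelope fields follow the standard Debezium MySQL format; the sample row itself is made up):

```python
import json

# A trimmed Debezium change-event value as it would arrive on the Kafka topic.
# The row data is invented for illustration; "c" means a row was created.
event = json.loads("""
{
  "payload": {
    "before": null,
    "after": {"id": 1001, "first_name": "Sally"},
    "op": "c",
    "source": {"connector": "mysql", "table": "customers"}
  }
}
""")

def unwrap(change_event):
    """Return (operation code, row state after the change) from a Debezium envelope."""
    payload = change_event["payload"]
    return payload["op"], payload["after"]

op, row = unwrap(event)
print(op, row)  # c {'id': 1001, 'first_name': 'Sally'}
```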

I've also set up a stand-alone Spark 3.3 instance using a separate docker-compose as follows:

version: '3'
services:
  spark-master:
    image: docker.io/bitnami/spark:3.3
    environment:
      - SPARK_MODE=master
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    ports:
      - '8080:8080'
  spark-worker:
    image: docker.io/bitnami/spark:3.3
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark-master:7077
      - SPARK_WORKER_MEMORY=2G
      - SPARK_WORKER_CORES=1
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no

This part is also up and running and I am able to log onto the Spark UI.

My question is: what specific configuration or environment changes do I need to make in order to submit a Spark streaming job that reads its data from the Kafka topic, given that Spark and Kafka are each running in their own Docker environment?

Eugene Goldberg
  • Why not simply put all containers in one Docker network? `docker network create`, then add networks to each image? https://docs.docker.com/compose/networking/ Or just use one compose file? – OneCricketeer Jan 22 '23 at 14:42

1 Answer


I guess you can add host.docker.internal as an advertised address to Kafka and expose its port:

version: '2'
services:
  zookeeper:
    image: quay.io/debezium/zookeeper:${DEBEZIUM_VERSION}
    ports:
     - 2181:2181
     - 2888:2888
     - 3888:3888
  kafka:
    image: quay.io/debezium/kafka:${DEBEZIUM_VERSION}
    ports:
     - 9092:9092
    links:
     - zookeeper
    environment:
     - ZOOKEEPER_CONNECT=zookeeper:2181
     - ADVERTISED_HOST_NAME=host.docker.internal
    extra_hosts:                                                                
      - "host.docker.internal:host-gateway"

then add the same `host.docker.internal` mapping to the Spark docker-compose file, and use `host.docker.internal:9092` as the Kafka bootstrap address in your Spark job:

version: '3'
services:
  spark-master:
    image: docker.io/bitnami/spark:3.3
    environment:
      - SPARK_MODE=master
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    ports:
      - '8080:8080'
    extra_hosts:                                                                
      - "host.docker.internal:host-gateway"
  spark-worker:
    image: docker.io/bitnami/spark:3.3
    environment:
      - SPARK_MODE=worker
      - SPARK_MASTER_URL=spark://spark-master:7077
      - SPARK_WORKER_MEMORY=2G
      - SPARK_WORKER_CORES=1
      - SPARK_RPC_AUTHENTICATION_ENABLED=no
      - SPARK_RPC_ENCRYPTION_ENABLED=no
      - SPARK_LOCAL_STORAGE_ENCRYPTION_ENABLED=no
      - SPARK_SSL_ENABLED=no
    extra_hosts:                                                                
      - "host.docker.internal:host-gateway"
meysam
  • While it may work, it's less optimal than a proper Docker bridge network as it has to traverse two/three network interfaces – OneCricketeer Jan 22 '23 at 14:43