I have a dockerized Spark app doing simple streaming. The producer generates random numbers and sends them to Kafka with this code:

```python
producer = KafkaProducer(bootstrap_servers=kafka_brokers, api_version=(0, 10, 1))
while True:
    data = ...  # generate a dict holding a single random number
    producer.send(topic_name, json.dumps(data).encode())
```
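Fleshed out, the producer file looks roughly like this (the broker address, topic name, and payload shape are placeholders for my setup):

```python
import json
import random
import time

def make_payload():
    # Build a JSON-serializable record holding a single random number
    return {"field": random.randint(0, 100)}

def run_producer(brokers, topic):
    from kafka import KafkaProducer  # kafka-python

    producer = KafkaProducer(bootstrap_servers=brokers, api_version=(0, 10, 1))
    while True:
        data = make_payload()
        producer.send(topic, json.dumps(data).encode())
        time.sleep(1)  # throttle so the topic is not flooded

# Usage (needs a reachable broker):
# run_producer(["192.168.99.100:9092"], "numbers")
```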
Then I try to read the data back with a consumer like this:

```python
consumer = KafkaConsumer(topic_name, bootstrap_servers=['192.168.99.100:9092'])
numbers = []
for message in consumer:
    record = json.loads(message.value)
    numbers.append(record['field'])
```
When I run the code it never gets past the `for message in consumer` line. I checked inside Kafka and the messages are all there, but I cannot read them from Python.
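One thing I can try to rule Spark out is a plain kafka-python consumer with `auto_offset_reset='earliest'` (the default is `'latest'`, which only sees messages produced after the consumer starts) and a timeout so the loop cannot block forever. A sketch, assuming the broker address and topic name from above:

```python
import json

def decode_record(raw):
    # Messages were sent as UTF-8 encoded JSON bytes
    return json.loads(raw)

def drain_topic(topic, brokers, timeout_ms=5000):
    """Read everything currently on the topic and return the decoded records."""
    from kafka import KafkaConsumer  # kafka-python

    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=brokers,
        auto_offset_reset="earliest",    # read from the beginning, not just new messages
        consumer_timeout_ms=timeout_ms,  # stop iterating instead of blocking forever
    )
    records = [decode_record(m.value) for m in consumer]
    consumer.close()
    return records

# Usage (needs a reachable broker):
# print(drain_topic("numbers", ["192.168.99.100:9092"]))
```

If this returns the numbers, the problem is in the Spark side rather than in Kafka itself.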
Edit: I am using the bitnami Spark containers and these settings for Kafka and ZooKeeper.
I just have two separate files, one for the producer and one for the consumer. I run the producer file, which sends to Kafka, and then I spark-submit the consumer file, which should just print a list of the received numbers. For that I simply do `spark-submit --master spark://spark:7077 consumer.py`.