
My Kafka cluster version is 0.10.0.0, and I want to use PySpark Streaming to read Kafka data. However, the Spark Streaming + Kafka Integration Guide (http://spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html) has no Python code example. So, can PySpark use spark-streaming-kafka-0-10 to integrate with Kafka?

Thank you in advance for your help!

Ruslan Ostafiichuk
kula

2 Answers


I also use Spark Streaming with a Kafka 0.10.0 cluster. After adding the following line to your Spark configuration, you are good to go.

spark.jars.packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.0
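Equivalently, the same dependency can be pulled in at submit time rather than via a config line. A minimal sketch, assuming your driver script is named `my_stream.py` (the script name is a placeholder, not from the answer):

```shell
# Pull the 0-8 Kafka connector from Maven Central at submit time
spark-submit \
  --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.0.0 \
  my_stream.py
```

Make sure the Scala version suffix (`_2.11`) and the Spark version (`2.0.0`) in the coordinate match the Spark build you are submitting to.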

And here is a sample in Python:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

# Initialize SparkContext
sc = SparkContext(appName="sampleKafka")

# Initialize the streaming context with a 10-second batch interval
batchInterval = 10
ssc = StreamingContext(sc, batchInterval)

# Kafka topic and the number of partitions to consume from it
topic = {"myTopic": 1}

# Set the application's consumer group id
groupId = "myTopic"

# ZooKeeper quorum the receiver connects to
zkQuorum = "zookeeperhostname:2181"

# Create the Kafka stream (receiver-based approach)
kafkaStream = KafkaUtils.createStream(ssc, zkQuorum, groupId, topic)

# Do as you wish with your stream, e.g. kafkaStream.pprint()

# Start the stream and wait for termination
ssc.start()
ssc.awaitTermination()
ozlemg

You can use spark-streaming-kafka-0-8 when your brokers are 0.10 or later: spark-streaming-kafka-0-8 supports newer broker versions, while spark-streaming-kafka-0-10 does not support older broker versions. As of now, spark-streaming-kafka-0-10 is still experimental and has no Python support.
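With spark-streaming-kafka-0-8 against newer brokers, the direct (receiver-less) approach is also available from Python. A minimal sketch, assuming placeholder broker address `kafkabroker:9092` and topic `myTopic` (neither is from the answer); it needs a live Kafka cluster to actually run:

```python
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils

sc = SparkContext(appName="directKafkaSample")
ssc = StreamingContext(sc, 10)  # 10-second batches

# The direct stream talks to the brokers themselves,
# so no ZooKeeper quorum is needed here
directStream = KafkaUtils.createDirectStream(
    ssc,
    topics=["myTopic"],
    kafkaParams={"metadata.broker.list": "kafkabroker:9092"})

directStream.pprint()

ssc.start()
ssc.awaitTermination()
```

Unlike the receiver-based `createStream`, the direct approach tracks offsets itself and maps Kafka partitions one-to-one onto RDD partitions.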

serengeti12