
I am using Spark 2.4.5 and Kafka 2.3.1 on my local machine.

I am able to produce and consume messages on Kafka with the bootstrap server config "localhost:9092".

While trying to set up a reader with the Spark Structured Streaming API, I am getting the following error:

Exception Message: Py4JJavaError: An error occurred while calling o166.load. : org.apache.spark.sql.AnalysisException: Failed to find data source: kafka. Please deploy the application as per the deployment section of "Structured Streaming + Kafka Integration Guide".;

The Spark code I am trying to execute:

df1 = spark.readStream.format("kafka")\
 .option("kafka.bootstrap.servers", "localhost:9092")\
 .option("subscribe", "topic1")\
 .load()

How can I check whether Spark has the "kafka" data source? If not, how do I add it?

Jacek Laskowski

1 Answer


You need to start spark-shell or spark-submit with the --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.5 option to pull the corresponding package onto the classpath. Make sure the Scala suffix (_2.11 here) matches the Scala version of your Spark build (there is also a _2.12 artifact). See the documentation mentioned in the exception.
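For example (your_script.py is a placeholder for your application file):

pyspark --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.5

spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.5 your_script.py

Alternatively, you can pull the package through the spark.jars.packages configuration when building the session. Note this only takes effect if it is set before the SparkSession (and its JVM) is created; it does nothing on an already-running session. A minimal sketch, with a placeholder app name:

from pyspark.sql import SparkSession

# Download the Kafka connector from Maven Central at session startup;
# equivalent to passing --packages on the command line.
spark = SparkSession.builder\
 .appName("kafka-stream")\
 .config("spark.jars.packages", "org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.5")\
 .getOrCreate()

df1 = spark.readStream.format("kafka")\
 .option("kafka.bootstrap.servers", "localhost:9092")\
 .option("subscribe", "topic1")\
 .load()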

Alex Ott