Spark Streaming integration for Kafka. The Direct Stream approach provides simple parallelism, a 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata.
Questions tagged [spark-streaming-kafka]
250 questions
37 votes · 8 answers
How to write spark streaming DF to Kafka topic
I am using Spark Streaming to process data between two Kafka queues but I cannot seem to find a good way to write to Kafka from Spark. I have tried this:
input.foreachRDD(rdd =>
  rdd.foreachPartition(partition =>
    partition.foreach {
      case…

— Chobeat
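For the Structured Streaming API, Spark ships a Kafka sink that handles producer lifecycle itself, which sidesteps the foreachRDD/foreachPartition plumbing the question attempts. A minimal sketch, assuming a broker at localhost:9092 and topic names of my choosing (all placeholders):

```scala
// Sketch: writing a streaming DataFrame to Kafka with the built-in sink
// (Structured Streaming, Spark 2.2+). Broker address, topic names and
// checkpoint path are assumptions, not values from the question.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("kafka-writer").getOrCreate()

val input = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "in-topic")
  .load()

// The Kafka sink expects a string or binary `value` column (and optionally `key`).
val query = input
  .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("topic", "out-topic")
  .option("checkpointLocation", "/tmp/kafka-writer-checkpoint")
  .start()
```

With the DStream API there is no built-in sink, so the usual pattern is indeed foreachPartition with one KafkaProducer created per partition, not per record.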
13 votes · 1 answer
Spark streaming + Kafka vs Just Kafka
Why and when would one choose to use Spark Streaming with Kafka?
Suppose I have a system getting thousands of messages per second through Kafka. I need to apply some real-time analytics to these messages and store the results in a DB.
I have two…

— Sash
12 votes · 6 answers
Kafka Producer - org.apache.kafka.common.serialization.StringSerializer could not be found
I have created a simple Kafka Producer & Consumer. I am using kafka_2.11-0.9.0.0. Here is my Producer code.
public class KafkaProducerTest {
    public static String topicName = "test-topic-2";
    public static void main(String[] args) {
        // TODO…

— Sanjeev
10 votes · 1 answer
Unable to find LoginModule class: org.apache.kafka.common.security.plain.PlainLoginModule
Environment: Spark 2.3.0, Scala 2.11.12, Kafka (whatever the latest version is)
I have a secure Kafka system to which I'm trying to connect my Spark Streaming consumer. Below is my build.sbt file:
name := "kafka-streaming"
version :=…

— Sparker0i
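PlainLoginModule lives in the kafka-clients jar, so this error usually means the JAAS file or the jar is missing on the driver or executors. A sketch of one way to wire it up at submit time, assuming a SASL/PLAIN setup; file names, credentials, and the package versions are placeholders to adapt:

```shell
# jaas.conf (credentials below are placeholders):
#   KafkaClient {
#     org.apache.kafka.common.security.plain.PlainLoginModule required
#     username="alice"
#     password="secret";
#   };
# Point both the driver and the executors at it, and ship the Kafka
# connector (which pulls in kafka-clients) with the job:
spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.0 \
  --files jaas.conf \
  --driver-java-options "-Djava.security.auth.login.config=jaas.conf" \
  --conf "spark.executor.extraJavaOptions=-Djava.security.auth.login.config=jaas.conf" \
  kafka-streaming.jar
```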
8 votes · 1 answer
Pyspark Failed to find data source: kafka
I am working on Kafka streaming and trying to integrate it with Apache Spark. However, while running it I am getting the error below.
This is the command I am using.
df_TR =…

— P Kernel
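"Failed to find data source: kafka" typically means the Kafka SQL connector is not on the classpath, since it is not bundled with Spark itself. One common fix is to pull it in at submit time; the coordinates below are an example and must match your Spark and Scala versions:

```shell
# spark-sql-kafka version must match the Spark version, and the _2.11
# suffix must match the Scala build; both shown here are assumptions.
spark-submit \
  --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.4.0 \
  my_streaming_job.py
```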
8 votes · 2 answers
Spark Structured Streaming with Kafka doesn't honor startingOffset="earliest"
I've set up Spark Structured Streaming (Spark 2.3.2) to read from Kafka (2.0.0). I'm unable to consume from the beginning of the topic if messages entered the topic before the Spark streaming job was started. Is this expected behavior of Spark streaming…

— Daniel Ahn
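A frequent cause here: `startingOffsets` is consulted only when a query starts with no checkpoint; once a checkpoint directory exists, offsets are resumed from it and the option is ignored. A sketch, assuming an existing SparkSession `spark` and placeholder broker/topic names:

```scala
// startingOffsets applies only on the very first run of a query; after a
// checkpoint is written, offsets come from the checkpoint instead, so
// re-running with a stale checkpoint will not re-read from the beginning.
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "events")
  .option("startingOffsets", "earliest") // ignored once a checkpoint exists
  .load()
```

Deleting (or pointing to a fresh) checkpoint location is one way to make the option take effect again, at the cost of reprocessing.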
7 votes · 2 answers
Increase Kafka Streams Consumer Throughput
I have a Spark Streaming application and a Kafka Streams application running side by side, for benchmarking purposes. Both consume from the same input topic and write to different target databases. The input topic has 15 partitions, both spark…

— Guilherme Alcântara
7 votes · 3 answers
Provider org.apache.spark.sql.avro.AvroFileFormat could not be instantiated
Unable to send Avro-format messages to a Kafka topic from a Spark Streaming application. Very little information is available online about Avro Spark Streaming example code. The "to_avro" method doesn't require an Avro schema, so how will it encode to Avro…

— amitwdh
6 votes · 1 answer
Spark structured streaming exactly once - Not achieved - Duplicated events
I am using Spark Structured Streaming to consume events from Kafka and upload them to S3.
Checkpoints are committed on S3:
DataFrameWriter writer = input.writeStream()
    .format("orc")
    …

— Alex Stanovsky
6 votes · 1 answer
How to define Kafka (data source) dependencies for Spark Streaming?
I'm trying to consume a Kafka 0.8 topic using spark-streaming 2.0.0, and I'm trying to identify the required dependencies. I have tried using these dependencies in my build.sbt file:
libraryDependencies += "org.apache.spark" %% "spark-streaming_2.11" %…

— user2359997
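One bug visible even in the truncated snippet: with `%%` sbt appends the Scala suffix itself, so combining `%%` with an explicit `_2.11` in the artifact name double-appends it and the artifact is never found. A build.sbt sketch for the versions named in the question:

```scala
// build.sbt fragment: use either %% with the bare artifact name, or % with
// the full "_2.11" name -- never both. Versions match the question's
// Spark 2.0.0 / Kafka 0.8 setup.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-streaming" % "2.0.0",
  "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.0.0"
)
```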
6 votes · 1 answer
Spark Streaming Kafka backpressure
We have a Spark Streaming application that reads data from a Kafka queue in a receiver, does some transformations, and outputs to HDFS. The batch interval is 1 min, and we have already tuned the backpressure and spark.streaming.receiver.maxRate parameters,…

— YichaoCai
6 votes · 2 answers
Spark Streaming Kafka stream
I'm having some issues while trying to read from Kafka with Spark Streaming.
My code is:
val sparkConf = new SparkConf().setMaster("local[2]").setAppName("KafkaIngestor")
val ssc = new StreamingContext(sparkConf, Seconds(2))
val kafkaParams =…

— besil
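Continuing the snippet's `ssc`, a sketch of a direct stream with the 0.10 integration; the broker address, group id, and topic name are placeholders, not values from the question:

```scala
// Sketch: direct stream via spark-streaming-kafka-0-10, reusing the
// question's StreamingContext `ssc`. Connection details are assumptions.
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",
  "key.deserializer"  -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id" -> "kafka-ingestor",
  "auto.offset.reset" -> "latest"
)

val stream = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  Subscribe[String, String](Seq("my-topic"), kafkaParams)
)
```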
5 votes · 2 answers
How to perform Unit testing on Spark Structured Streaming?
I would like to know about the unit testing side of Spark Structured Streaming. My scenario is: I am getting data from Kafka, consuming it with Spark Structured Streaming, and applying some transformations on top of the data.
I am not sure…

— BigD
5 votes · 2 answers
Spark 2.4.0 Avro Java - cannot resolve method from_avro
I'm trying to run a Spark stream from a Kafka queue containing Avro messages.
As per https://spark.apache.org/docs/latest/sql-data-sources-avro.html I should be able to use from_avro to convert a column value to a Dataset.
However, I'm unable to…

— Maciej C
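Two things commonly trip this up: spark-avro is an external module (added via `--packages org.apache.spark:spark-avro_2.11:2.4.0` or an equivalent build dependency), and in Spark 2.4 `from_avro`/`to_avro` live in a Scala package object, which makes them awkward to call from Java. A Scala sketch, assuming a Kafka-sourced DataFrame `kafkaDf` and a made-up reader schema:

```scala
// Sketch: decoding an Avro `value` column with from_avro (Spark 2.4,
// spark-avro module on the classpath). `kafkaDf` and the schema JSON
// are assumptions for illustration.
import org.apache.spark.sql.avro.from_avro
import org.apache.spark.sql.functions.col

val readerSchema =
  """{"type":"record","name":"Event","fields":[{"name":"id","type":"string"}]}"""

val decoded = kafkaDf.select(from_avro(col("value"), readerSchema).as("event"))
```

Note that `from_avro` does take a schema (unlike `to_avro`, which derives one from the column's Catalyst type).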
5 votes · 2 answers
Extract the time stamp from kafka messages in spark streaming?
Trying to read from a Kafka source. I want to extract the timestamp from the message received to do structured Spark streaming.
Kafka (version 0.10.0.0)
Spark streaming (version 2.0.1)

— shivali
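The Structured Streaming Kafka source already exposes a fixed schema that includes a `timestamp` column alongside `key`, `value`, `topic`, `partition`, and `offset`, so no manual extraction is needed. A sketch, assuming an existing SparkSession `spark` and placeholder broker/topic names:

```scala
// Sketch: selecting the Kafka record timestamp from the source's built-in
// schema. Broker and topic values are assumptions.
val withTs = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "events")
  .load()
  .selectExpr("CAST(value AS STRING) AS value", "timestamp")
```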