Questions tagged [spark-streaming-kafka]

Spark Streaming integration for Kafka. The Direct Stream approach provides simplified parallelism, a 1:1 correspondence between Kafka partitions and Spark partitions, and access to offsets and metadata.

250 questions
37
votes
8 answers

How to write spark streaming DF to Kafka topic

I am using Spark Streaming to process data between two Kafka queues, but I cannot seem to find a good way to write to Kafka from Spark. I have tried this: input.foreachRDD(rdd => rdd.foreachPartition(partition => partition.foreach { case…
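A common pattern for this, sketched below under assumed names (a DStream[String] called input, a local broker, and an output topic "out-topic"), is to open one KafkaProducer per partition inside foreachPartition rather than one per record:

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    input.foreachRDD { rdd =>
      rdd.foreachPartition { partition =>
        // One producer per partition, not per record, to avoid opening a connection per message.
        val props = new Properties()
        props.put("bootstrap.servers", "localhost:9092")
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
        val producer = new KafkaProducer[String, String](props)
        partition.foreach { record =>
          producer.send(new ProducerRecord[String, String]("out-topic", record))
        }
        producer.close()
      }
    }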
13
votes
1 answer

Spark streaming + Kafka vs Just Kafka

Why and when would one choose to use Spark Streaming with Kafka? Suppose I have a system getting a thousand messages per second through Kafka. I need to apply some real-time analytics on these messages and store the result in a DB. I have two…
12
votes
6 answers

Kafka Producer - org.apache.kafka.common.serialization.StringSerializer could not be found

I have created a simple Kafka Producer & Consumer. I am using kafka_2.11-0.9.0.0. Here is my Producer code: public class KafkaProducerTest { public static String topicName = "test-topic-2"; public static void main(String[] args) { // TODO…
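A minimal Scala sketch of the producer configuration, assuming a local broker and the topic name from the question; referencing the serializer with classOf makes the dependency on kafka-clients explicit, and that jar missing from the application classpath is the usual cause of the "could not be found" error:

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
    import org.apache.kafka.common.serialization.StringSerializer

    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    // Resolves to "org.apache.kafka.common.serialization.StringSerializer" at compile time.
    props.put("key.serializer", classOf[StringSerializer].getName)
    props.put("value.serializer", classOf[StringSerializer].getName)

    val producer = new KafkaProducer[String, String](props)
    producer.send(new ProducerRecord[String, String]("test-topic-2", "key", "value"))
    producer.close()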
10
votes
1 answer

Unable to find LoginModule class: org.apache.kafka.common.security.plain.PlainLoginModule

Environment: Spark 2.3.0, Scala 2.11.12, Kafka (Whatever the latest version is) I have a secure Kafka system, to which I'm trying to connect my Spark Streaming Consumer. Below is my build.sbt file: name := "kafka-streaming" version :=…
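A minimal sketch of the consumer-side Kafka params for a SASL/PLAIN cluster (broker address, user name and password are placeholders); the PlainLoginModule class itself ships in org.apache.kafka:kafka-clients, so that artifact must be on the runtime classpath:

    import org.apache.kafka.common.serialization.StringDeserializer

    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker:9093",
      "security.protocol" -> "SASL_PLAINTEXT",
      "sasl.mechanism"    -> "PLAIN",
      // Inline JAAS config; alternatively supply a jaas.conf via java.security.auth.login.config.
      "sasl.jaas.config"  ->
        ("org.apache.kafka.common.security.plain.PlainLoginModule required " +
         "username=\"user\" password=\"secret\";"),
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "secure-consumer"
    )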
8
votes
1 answer

Pyspark Failed to find data source: kafka

I am working on Kafka streaming and trying to integrate it with Apache Spark. However, while running it I am getting the below error. This is the command I am using: df_TR =…
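This error usually means the Kafka connector is not on the classpath; a hedged sketch (Scala here, but the same --packages fix applies to a PySpark job), with broker and topic names as placeholders:

    // Submit with the connector matching your Spark/Scala build, e.g.
    //   spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:<spark-version> ...
    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "input-topic")
      .load()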
8
votes
2 answers

Spark Structured Streaming with Kafka doesn't honor startingOffset="earliest"

I've set up Spark Structured Streaming (Spark 2.3.2) to read from Kafka (2.0.0). I'm unable to consume from the beginning of the topic if messages entered the topic before the Spark streaming job was started. Is this expected behavior of Spark streaming…
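A minimal sketch of the option in question (broker and topic are placeholders); note that startingOffsets is only consulted the first time a query runs, so an existing checkpoint directory or previously committed offsets will take precedence over "earliest":

    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "input-topic")
      .option("startingOffsets", "earliest")  // ignored once a checkpoint for this query exists
      .load()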
7
votes
2 answers

Increase Kafka Streams Consumer Throughput

I have a Spark Streaming application and a Kafka Streams application running side by side, for benchmarking purposes. Both consume from the same input topic and write to different target databases. The input topic has 15 partitions, both spark…
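A sketch of the Kafka Streams settings most often tuned for consumer throughput in this situation; the values are illustrative, not recommendations:

    import java.util.Properties
    import org.apache.kafka.clients.consumer.ConsumerConfig
    import org.apache.kafka.streams.StreamsConfig

    val props = new Properties()
    props.put(StreamsConfig.APPLICATION_ID_CONFIG, "benchmark-app")
    props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092")
    // More stream threads per instance; with 15 input partitions the work can
    // also be spread across multiple instances of the application.
    props.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, "8")
    // Larger polls and fetches trade latency for throughput.
    props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "2000")
    props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, "1048576")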
7
votes
3 answers

Provider org.apache.spark.sql.avro.AvroFileFormat could not be instantiated

Unable to send Avro-format messages to a Kafka topic from a Spark streaming application. Very little example code for Avro with Spark streaming is available online. The "to_avro" method doesn't require an Avro schema, so how will it encode to Avro…
amitwdh
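For reference, a hedged sketch of how to_avro is typically used with the spark-avro package (Spark 2.4, Scala API); it needs no schema argument because the Avro schema is derived from the Catalyst type of the column being encoded. Broker, topic and checkpoint path are placeholders:

    import org.apache.spark.sql.avro._
    import org.apache.spark.sql.functions.struct

    // `input` is assumed to be a streaming DataFrame; every column is packed into a
    // struct and encoded as the Kafka record value.
    val output = input.select(to_avro(struct(input.columns.map(input(_)): _*)).as("value"))

    output.writeStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("topic", "avro-topic")
      .option("checkpointLocation", "/tmp/avro-checkpoint")
      .start()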
6
votes
1 answer

Spark structured streaming exactly once - Not achieved - Duplicated events

I am using Spark Structured Streaming to consume events from Kafka and upload them to S3. Checkpoints are committed on S3: DataFrameWriter writer = input.writeStream() .format("orc") …
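A sketch of the usual mitigation (paths are placeholders): the file sink is only exactly-once when its checkpoint and metadata log sit on a filesystem with atomic rename semantics, so the checkpoint is often kept on HDFS even when the ORC files land on S3:

    val query = input.writeStream
      .format("orc")
      .option("path", "s3a://my-bucket/events/")
      // Checkpoint on an HDFS-compatible store; a checkpoint path without atomic rename
      // (or with eventual consistency) is a common source of replayed, duplicated batches.
      .option("checkpointLocation", "hdfs:///checkpoints/events")
      .start()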
6
votes
1 answer

How to define Kafka (data source) dependencies for Spark Streaming?

I'm trying to consume a Kafka 0.8 topic using Spark Streaming 2.0.0 and am trying to identify the required dependencies. I have tried using these dependencies in my build.sbt file: libraryDependencies += "org.apache.spark" %% "spark-streaming_2.11" %…
user2359997
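A minimal build.sbt sketch for that combination (Spark 2.0.x with a Kafka 0.8 broker); note that with %% the Scala suffix is appended automatically, so the artifact name must not already contain "_2.11":

    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core"                % "2.0.0" % "provided",
      "org.apache.spark" %% "spark-streaming"           % "2.0.0" % "provided",
      "org.apache.spark" %% "spark-streaming-kafka-0-8" % "2.0.0"
    )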
6
votes
1 answer

Spark Streaming Kafka backpressure

We have a Spark Streaming application that reads data from a Kafka queue with a receiver, does some transformations, and outputs to HDFS. The batch interval is 1 min; we have already tuned the backpressure and spark.streaming.receiver.maxRate parameters,…
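For reference, a sketch of the receiver-side rate-limiting settings involved (values are illustrative only):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      .setAppName("KafkaReceiverApp")
      // Adapt the ingestion rate to the observed processing speed.
      .set("spark.streaming.backpressure.enabled", "true")
      // Hard cap, in records per second, for each receiver.
      .set("spark.streaming.receiver.maxRate", "10000")
      // Rate used for the first batches, before the estimator has any feedback.
      .set("spark.streaming.backpressure.initialRate", "5000")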
6
votes
2 answers

Spark Streaming Kafka stream

I'm having some issues while trying to read from Kafka with Spark Streaming. My code is: val sparkConf = new SparkConf().setMaster("local[2]").setAppName("KafkaIngestor") val ssc = new StreamingContext(sparkConf, Seconds(2)) val kafkaParams =…
besil
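A minimal sketch of the direct-stream setup described in the tag excerpt (spark-streaming-kafka-0-10; broker, group id and topic are placeholders), reusing the ssc from the question:

    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "kafka-ingestor",
      "auto.offset.reset"  -> "latest"
    )

    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      PreferConsistent,
      Subscribe[String, String](Seq("test-topic"), kafkaParams)
    )

    stream.map(_.value).print()
    ssc.start()
    ssc.awaitTermination()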
5
votes
2 answers

How to perform Unit testing on Spark Structured Streaming?

I would like to know about the unit testing side of Spark Structured Streaming. My scenario is that I am getting data from Kafka, consuming it using Spark Structured Streaming, and applying some transformations on top of the data. I am not sure…
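One common approach, sketched here under assumptions (a toUpperCase step stands in for the real transformation logic), is to swap the Kafka source for a MemoryStream and assert on a memory sink, so the query logic can be tested without a broker:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.execution.streaming.MemoryStream
    import org.apache.spark.sql.functions.upper

    val spark = SparkSession.builder().master("local[2]").appName("streaming-test").getOrCreate()
    import spark.implicits._
    implicit val sqlContext = spark.sqlContext

    // MemoryStream stands in for the Kafka source.
    val source = MemoryStream[String]
    val transformed = source.toDF().select(upper($"value").as("value"))

    val query = transformed.writeStream
      .format("memory")
      .queryName("result")
      .outputMode("append")
      .start()

    source.addData("a", "b")
    query.processAllAvailable()
    spark.table("result").show()  // in a real test, collect() and assert on the rows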
5
votes
2 answers

Spark 2.4.0 Avro Java - cannot resolve method from_avro

I'm trying to run a Spark stream from a Kafka queue containing Avro messages. As per https://spark.apache.org/docs/latest/sql-data-sources-avro.html I should be able to use from_avro to convert the column value to a Dataset. However, I'm unable to…
Maciej C
5
votes
2 answers

Extract the time stamp from kafka messages in spark streaming?

Trying to read from a Kafka source. I want to extract the timestamp from the messages received to do structured Spark streaming. Kafka (version 0.10.0.0), Spark Streaming (version 2.0.1).
shivali
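For the structured streaming case, the Kafka source already exposes the broker timestamp of each record as a column, so it can simply be selected alongside the payload; a minimal sketch with placeholder broker and topic:

    val df = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "input-topic")
      .load()
      // Available columns include key, value, topic, partition, offset, timestamp, timestampType.
      .selectExpr("CAST(value AS STRING) AS value", "timestamp")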