2

Is it possible to send a Java object as the value in a Kafka topic and how do I consume it in spark?

I'm currently doing the apache-spark tutorial and was wondering if it is possible to send something else than a String. The tutorial has this example

producer.send(new ProducerRecord<String, String>(topic, something_string));

Is it possible to do something like that?

Car car = new Car(brand, year, color);
producer.send(new ProducerRecord<String, Car>(topic, car));

And how do I consume it later in Spark?

At the moment im doing this:

String car = brand + "," + year + "," + color;
producer.send(new ProducerRecord<String, String>(topic, car));

Where I put everything in a String with comma-separation.

Question 2: At the moment I consume it in this way.

Dataset<String> words = df
.selectExpr("CAST (value AS STRING)")
.as(Encoders.STRING());

where I get the String: "brand,year,color"

how do I split it up and put it in separate columns?

Dushyant Tankariya
  • 1,432
  • 3
  • 11
  • 17
kxell2001
  • 131
  • 1
  • 1
  • 9
  • Ideally, you shouldn't be using comma separated values. JSON would be better if you plan on extracting data within Spark – OneCricketeer Jun 26 '19 at 03:59

1 Answers1

0

Your post actually has two questions, you could split them into separate posts. For the 1st question, refer this post; the central concept is that you have to write a custom serializer.

For the second, the concept is still the same in principal, however you have to write a custom deserializer (decoder) this time on Spark side. Refer this Spark documentation, it demonstrates how a stream needs to be created from Kafka. However, please not the 'KafkaUtil' class, refer javadoc. It has methods to create streams with Kafka decoder classes.

Ironluca
  • 3,402
  • 4
  • 25
  • 32