
I am using the JDBC connector to stream data from a MySQL database to a Kafka topic. That works, and I can see the data in the Kafka topic using the Avro console consumer. Now I want to read this data and perform a few simple filtering operations on it. I am planning to use either Spark or a Confluent consumer. The problem with Spark is that I am not able to read the data using JavaInputDStream. I need to read the data from Kafka and deserialize it from Avro to JSON in order to do the filtering. I have not been able to find Java examples to refer to. Can anyone point me to some documentation or sample code?
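For reference, here is roughly what I am attempting, a minimal sketch assuming Spark Structured Streaming with the spark-sql-kafka-0-10 source instead of the older JavaInputDStream API (the broker address and topic name are placeholders for my setup):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class KafkaAvroFilter {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
            .appName("kafka-avro-filter")
            .master("local[*]")
            .getOrCreate();

        // Subscribe to the topic the JDBC connector writes to; the "value"
        // column of this Dataset holds the raw Avro bytes from Kafka.
        Dataset<Row> df = spark
            .readStream()
            .format("kafka")
            .option("kafka.bootstrap.servers", "localhost:9092") // placeholder
            .option("subscribe", "mysql-topic")                  // placeholder
            .load();

        // This is the point where I want to deserialize the Avro
        // payload and apply the filter before writing out.
        df.writeStream()
            .format("console")
            .start()
            .awaitTermination();
    }
}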

Edit: I looked into this: https://spark.apache.org/docs/latest/sql-data-sources-avro.html

I have included the spark-avro Maven dependency in my Java project:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-avro_2.12</artifactId>
    <version>2.4.3</version>
</dependency>
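For the Kafka source used in the sketch above, I believe the matching artifact is also needed (assuming the Scala 2.12 build of Spark 2.4.3, like spark-avro above):

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql-kafka-0-10_2.12</artifactId>
    <version>2.4.3</version>
</dependency>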

I could not find the to_avro and from_avro functions, though. I am trying to follow this example:

Dataset<Row> output = df
    .select(from_avro(col("value"), jsonFormatSchema).as("user"))
    .where("user.favorite_color == \"red\"")
    .select(to_avro(col("user.name")).as("value"));
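One thing I found while digging: in Spark 2.4, from_avro and to_avro appear to live in the Scala package object org.apache.spark.sql.avro, so there is no class to static-import from Java (the Java-friendly org.apache.spark.sql.avro.functions seems to have arrived only in Spark 3.0). If my reading is right, they can still be reached from Java through the synthetic package$ holder that the Scala compiler generates:

import org.apache.spark.sql.Column;
import static org.apache.spark.sql.functions.col;

// Call the Scala package-object functions via the generated package$ class.
Column user = org.apache.spark.sql.avro.package$.MODULE$
    .from_avro(col("value"), jsonFormatSchema);
Column value = org.apache.spark.sql.avro.package$.MODULE$
    .to_avro(col("user.name"));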
Spark cannot natively read the *Confluent Avro* format with `spark-avro`. See various solutions here https://stackoverflow.com/questions/48882723/integrating-spark-structured-streaming-with-the-confluent-schema-registry?rq=1 – OneCricketeer Aug 06 '19 at 22:26
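Edit 2: Following up on the comment above, my understanding from the linked thread is that the Confluent serializer prepends a 5-byte header (one magic byte plus a 4-byte schema-registry id) to the Avro payload, which is why plain from_avro cannot parse it. A minimal sketch of the workaround described there, assuming a single fixed writer schema (the substring trick will break if the schema id ever varies):

import static org.apache.spark.sql.functions.expr;

// Strip the 5-byte Confluent header (magic byte + schema id) from the
// binary "value" column, then decode the remaining bytes as plain Avro.
// SQL substring is 1-indexed, so the payload starts at position 6.
Dataset<Row> decoded = df.select(
    org.apache.spark.sql.avro.package$.MODULE$
        .from_avro(expr("substring(value, 6, length(value) - 5)"), jsonFormatSchema)
        .as("user"));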

0 Answers