I built a DataFrame from a CSV file and published its records to a Kafka topic in Avro format using to_avro(struct("*")). I was able to view the binary data in the Kafka topic.
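For context, the producer side looks roughly like this (the CSV path, broker addresses, and topic name are placeholders, and an existing SparkSession with the spark-avro and spark-sql-kafka packages is assumed):

```python
from pyspark.sql.functions import struct
from pyspark.sql.avro.functions import to_avro

# Build the DataFrame from a CSV file (path is a placeholder)
csv_df = spark.read.option("header", "true").csv("examples/src/main/resources/users.csv")

# Serialize all columns into a single Avro-encoded binary "value" column
avro_df = csv_df.select(to_avro(struct("*")).alias("value"))

# Publish the binary records to the Kafka topic
avro_df.write \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "host1:port1,host2:port2") \
    .option("topic", "topic1") \
    .save()
```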
When I deserialize it with this code:
from pyspark.sql.avro.functions import from_avro

jsonFormatSchema = open("examples/src/main/resources/user.avsc", "r").read()

df = spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "host1:port1,host2:port2") \
    .option("subscribe", "topic1") \
    .load()

# Decode the binary Avro data in the "value" column into a struct
output = df.select(from_avro("value", jsonFormatSchema).alias("user"))
When I fetch the data from output using show() or display(), it throws an exception along the lines of "Spark exception: unable to parse; use mode PERMISSIVE". I had already used PERMISSIVE mode when reading the CSV data, and I applied the same mode option while reading from Kafka. Can anyone help me resolve this issue and parse the Avro data?
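In case it matters, the way I understand the mode is supposed to reach the Avro parser is through the options map that from_avro accepts as its third argument, not through a Kafka reader option (this is a sketch assuming the same df and jsonFormatSchema as above):

```python
from pyspark.sql.avro.functions import from_avro

# Passing "mode": "PERMISSIVE" to from_avro itself makes malformed
# Avro records come back as null instead of failing the streaming query
output = df.select(
    from_avro("value", jsonFormatSchema, {"mode": "PERMISSIVE"}).alias("user")
)
```

Even with this, I still need to understand why the records are considered malformed in the first place, since they were written by to_avro from the same cluster.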