
Q1. Given a dataframe df and a schema myschema, how do I write the dataframe to a Kafka topic in Avro format?

Q2. Is there an optimized way that does not use a UDF?

Most of the available solutions are for Spark > 2.4, which has inbuilt Avro functions to use.

    Does this answer your question? [Spark Dataframe write to kafka topic in avro format?](https://stackoverflow.com/questions/47951668/spark-dataframe-write-to-kafka-topic-in-avro-format) – Giorgos Myrianthous May 19 '20 at 10:17
  • I tried, from there, `eventDF.select( encodeUDF(struct(eventDF.columns.map(column):_*)).alias("value") )` — `struct` and `column` were showing red in color; could you please help me define this select query – supernatural May 19 '20 at 10:29
  • 1) Are you using the Schema Registry? 2) They are red because you never defined/imported them – OneCricketeer May 20 '20 at 19:03
  • This docs page is correct, if you are **not** using Schema Registry https://spark.apache.org/docs/latest/sql-data-sources-avro.html#to_avro-and-from_avro – OneCricketeer May 20 '20 at 19:04

1 Answer


Most of the available solutions are for Spark > 2.4, which has inbuilt Avro functions

That inbuilt function started as an external library (`spark-avro`), but was later merged into the main Spark project. If you are on a version below 2.4, I suggest you ultimately upgrade your Spark cluster, or refer to the docs there.
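For reference, once on Spark >= 2.4 the flow from the linked docs looks roughly like this — a sketch assuming the `spark-avro` package is on the classpath, no Schema Registry, and placeholder broker address and topic name:

```scala
// Sketch: serialize all columns of df into a single Avro "value" column
// and write it to Kafka. Assumes Spark >= 2.4 with the spark-avro package
// added (e.g. via --packages org.apache.spark:spark-avro_2.12:<version>).
import org.apache.spark.sql.avro.functions.to_avro // org.apache.spark.sql.avro._ on 2.4.x
import org.apache.spark.sql.functions.{col, struct}

val avroDF = df.select(
  to_avro(struct(df.columns.map(col): _*)).alias("value")
)

avroDF.write
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker
  .option("topic", "my-topic")                         // placeholder topic
  .save()
```

No UDF is needed here: `to_avro` is a built-in column expression, so Catalyst can optimize it, unlike an opaque `encodeUDF`.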

OneCricketeer