
I have HDFS versions of Avro, Parquet, and text files. Unfortunately, I can't use Spark to convert them to CSV. I saw from an earlier SO question (How to convert HDFS file to csv or tsv) that this doesn't seem to be possible. Is it possible, and if so, how do I do it?

George Chen

1 Answer


This will help you read Avro files (just avoid schema evolution/modifications...). Example.

As for Parquet, you can use parquet-mr; take a look at ParquetReader. Example: ignore the Spark usage there, they only use it to create a Parquet file that is later read with ParquetReader.

Hope it helps

Nir Hedvat
  • Thank you for the help. Quick question: does this handle all the edge cases, i.e., null values? – George Chen Aug 15 '19 at 18:53
  • I think it does; test it. One crucial thing you should take into consideration is that it does not support schema evolution. All files should have the same schema when read. – Nir Hedvat Aug 15 '19 at 19:30
  • Could you also clarify why we need both serialize and deserialize? I thought we would just read the Avro using the file reader and then write to CSV, no? – George Chen Aug 15 '19 at 23:06
  • You don't need both. They just give an example of reading and writing. Use whatever you need accordingly. – Nir Hedvat Aug 16 '19 at 09:05
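On the null-value question in the comments above: once records come back as plain Python dicts, the stdlib `csv` module already writes `None` as an empty field, which is one reasonable way to handle that edge case (a sketch with made-up records):

```python
import csv

# Hypothetical records as a reader might return them; the None mimics
# an Avro "null" value.
records = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": None},
]

with open("out.csv", "w", newline="") as f:
    out = csv.DictWriter(f, fieldnames=["id", "name"])
    out.writeheader()
    out.writerows(records)  # None is emitted as an empty CSV field
```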