I have Avro, Parquet, and text files stored in HDFS. Unfortunately, I can't use Spark to convert them to CSV. An earlier SO question, "How to convert HDFS file to csv or tsv", suggests that this isn't possible. Is it possible, and if so, how do I do it?
-
Please respond with an actual answer. – George Chen Aug 14 '19 at 22:59
-
See https://stackoverflow.com/questions/51215166/convert-parquet-to-csv for parquet to csv. – thebluephantom Aug 14 '19 at 23:51
-
Unfortunately, my codebase doesn't have Python. There are no Scala questions out there on this topic. – George Chen Aug 15 '19 at 00:06
-
Then you have a problem. – thebluephantom Aug 15 '19 at 05:36
-
Good for you. I am not sure I understand the issues but SO to the rescue – thebluephantom Aug 16 '19 at 17:34
-
Thanks man. I'm not sure I understand the solution either, but it seems like it will work. – George Chen Aug 16 '19 at 17:39
-
Good question is 5 – thebluephantom Aug 16 '19 at 18:24
-
Thanks for the upvote man! – George Chen Aug 16 '19 at 19:03
1 Answer
This will help you to read Avro files (just avoid schema evolution/modifications...). Example.
As for Parquet, you can use parquet-mr; take a look at ParquetReader. Example: ignore the Spark usage there; they only use Spark to create a Parquet file that is later read with ParquetReader.
Hope it helps

Nir Hedvat
-
Thank you for the help. Quick question: does this handle all the edge cases, e.g. null values? – George Chen Aug 15 '19 at 18:53
-
I think it does, but test it. One crucial thing to take into consideration is that it does not support schema evolution: all files must have the same schema when read. – Nir Hedvat Aug 15 '19 at 19:30
-
Could you also clarify why we need both serialize and deserialize? I thought we would just read the Avro file using the file reader and then write it to CSV, no? – George Chen Aug 15 '19 at 23:06
-
You don't need both. They just give an example of read and write. Use whatever you need accordingly – Nir Hedvat Aug 16 '19 at 09:05
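
On the null-value question above: whichever reader produces the records, the CSV writing side still has to decide what a null looks like and how to quote awkward values. Below is a minimal sketch using only the JDK. The class name `CsvRow` and its methods are hypothetical helpers, not part of Avro or parquet-mr, and writing nulls as empty fields is an assumption you may want to change. In practice you would pass in the field values of each record you read.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper (not from Avro or parquet-mr): turns one record's
// field values into a single CSV line.
final class CsvRow {

    // Converts one value to a CSV field. Quotes are added only when the
    // value contains a comma, a double quote, or a line break.
    static String escape(Object value) {
        if (value == null) {
            return ""; // assumption: nulls are written as empty fields
        }
        String s = value.toString();
        boolean needsQuoting = s.contains(",") || s.contains("\"")
                || s.contains("\n") || s.contains("\r");
        // Embedded double quotes are doubled inside a quoted field.
        return needsQuoting ? "\"" + s.replace("\"", "\"\"") + "\"" : s;
    }

    // Joins the escaped fields of one record with commas.
    static String toLine(List<?> values) {
        return values.stream().map(CsvRow::escape).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // prints: id,,"a,b","say ""hi"""
        System.out.println(toLine(Arrays.asList("id", null, "a,b", "say \"hi\"")));
    }
}
```

For TSV, the same idea applies with a tab as the separator, although the quoting rule would need to check for tabs instead of commas.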