I have parquet files in my hdfs. I want to convert these parquet files into csv format & copy to local. I tried this:

hadoop fs -text /user/Current_Data/partitioned_key=MEDIA/000000_0  > /home/oozie-coordinator-workflows/quality_report/media.csv

hadoop fs -copyToLocal /user/Current_Data/partitioned_key=MEDIA/000000_0 /home/oozie-coordinator-workflows/quality_report/media1.csv

1 Answer

What you are doing will not work: you are only reading and copying the Parquet data as-is, not converting it to CSV.

You can do the conversion with Spark or Hive/Impala; the Spark approach is explained below.

SPARK:

Read the parquet files:

df = spark.read.parquet("/user/Current_Data/partitioned_key=MEDIA/")
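
Optionally, you can sanity-check what was read before converting it (a quick sketch using standard DataFrame calls):

df.printSchema()   # confirm the columns came through from the Parquet files
df.show(5)         # peek at the first few rows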

Write it back out to HDFS as CSV:

df.write.csv("/home/oozie-coordinator-workflows/quality_report/media1.csv")
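
Note that df.write.csv() produces a directory of part files rather than a single CSV file. Below is a minimal sketch that writes one file with a header row; the output path /tmp/media_csv is just a placeholder:

(df.coalesce(1)                      # force a single output part file
   .write
   .option("header", True)          # keep the column names as the first row
   .mode("overwrite")
   .csv("/tmp/media_csv"))          # placeholder HDFS output directory

Once the CSV is on HDFS, you can bring it down to the local filesystem with hadoop fs -copyToLocal, as in the question.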

Check out more on the above here.

HIVE:

CREATE TABLE test ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS AVRO TBLPROPERTIES ('avro.schema.url'='myHost/myAvroSchema.avsc'); 

CREATE EXTERNAL TABLE parquet_test LIKE test STORED AS PARQUET LOCATION 'hdfs:///user/Current_Data/partitioned_key=MEDIA/';

After you have the table, you can create a CSV file through Beeline/Hive with the command below.

beeline -u 'jdbc:hive2://[databaseaddress]' --outputformat=csv2 -e "select * from parquet_test" > /local/path/toTheFile.csv

Check the two links below for more explanation.

Dynamically create Hive external table with Avro schema on Parquet Data

Export as csv in beeline hive
