I have parquet files in my hdfs. I want to convert these parquet files into csv format & copy to local. I tried this:

hadoop fs -text /user/Current_Data/partitioned_key=MEDIA/000000_0  > /home/oozie-coordinator-workflows/quality_report/media.csv

hadoop fs -copyToLocal /user/Current_Data/partitioned_key=MEDIA/000000_0 /home/oozie-coordinator-workflows/quality_report/media1.csv

1 Answer

What you are doing will not work: you are only reading and copying the Parquet data as-is, not converting it to CSV.

You can do the conversion with Spark or Hive/Impala; the Spark approach is explained below.

SPARK:

Read the parquet files:

df = spark.read.parquet("/user/Current_Data/partitioned_key=MEDIA/")
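
Optionally, you can sanity-check what was read before converting it (a quick sketch using standard DataFrame calls):

df.printSchema()   # confirm the columns came through from the Parquet files
df.show(5)         # peek at the first few rows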

Write it back out to HDFS as CSV:

df.write.csv("/home/oozie-coordinator-workflows/quality_report/media1.csv")
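
Note that df.write.csv() produces a directory of part files rather than a single CSV file. Below is a minimal sketch that writes one file with a header row; the output path /tmp/media_csv is just a placeholder:

(df.coalesce(1)                      # force a single output part file
   .write
   .option("header", True)          # keep the column names as the first row
   .mode("overwrite")
   .csv("/tmp/media_csv"))          # placeholder HDFS output directory

Once the CSV is on HDFS, you can bring it down to the local filesystem with hadoop fs -copyToLocal, as in the question.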

Check out more on the above here.

HIVE:

CREATE TABLE test ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe' STORED AS AVRO TBLPROPERTIES ('avro.schema.url'='myHost/myAvroSchema.avsc'); 

CREATE EXTERNAL TABLE parquet_test LIKE test STORED AS PARQUET LOCATION 'hdfs:///user/Current_Data/partitioned_key=MEDIA/';

After you have the table, you can create a CSV file through Beeline/Hive with the command below.

beeline -u 'jdbc:hive2://[databaseaddress]' --outputformat=csv2 -e "select * from parquet_test" > /local/path/toTheFile.csv

Check the two links below for more explanation.

Dynamically create Hive external table with Avro schema on Parquet Data

Export as csv in beeline hive
