
I want to export data from a database and convert it into Avro + Parquet format. Sqoop supports Avro export but not Parquet. I tried converting the Avro objects to Parquet using Apache Pig, Apache Crunch, etc., but nothing worked.

Apache Pig gives me "Caused by: org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist", even though the input path does exist at that location.

Apache Crunch always throws "java.lang.ClassNotFoundException: Class org.apache.crunch.impl.mr.run.CrunchMapper not found", even though I added it to the Hadoop lib path.

What is the best and easiest way to export data from a database into Parquet format?

Ananth Duari

3 Answers


I use Hive.

Create an external table on the Avro data. Create an empty Parquet table.

And then insert overwrite table PARQUET_TABLE select * from AVRO_TABLE.
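A minimal sketch of those three statements, assuming the Avro files that Sqoop wrote live under a hypothetical HDFS path /user/etl/avro_out with a schema file alongside them, and that you are on Hive 0.13+ (where STORED AS PARQUET is available; older versions need the Parquet SerDe classes spelled out):

    -- External table over the existing Avro files; paths are placeholders
    CREATE EXTERNAL TABLE avro_table
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
    STORED AS
      INPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
      OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
    LOCATION '/user/etl/avro_out'
    TBLPROPERTIES ('avro.schema.url'='hdfs:///user/etl/mytable.avsc');

    -- Empty table with the same columns, stored as Parquet
    -- (WHERE 1 = 0 copies the layout without copying any rows)
    CREATE TABLE parquet_table STORED AS PARQUET
    AS SELECT * FROM avro_table WHERE 1 = 0;

    -- Rewrite the Avro data as Parquet
    INSERT OVERWRITE TABLE parquet_table
    SELECT * FROM avro_table;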

Super easy :)

Gwen Shapira

The most recent Sqoop (1.4.6, I think) supports importing into files containing data in Parquet format, and also importing to Parquet with associated Hive table creation.
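For reference, a hedged sketch of the two invocations (connection string, credentials, table, and target paths are all placeholders):

    # Import straight to Parquet files in HDFS
    sqoop import \
      --connect jdbc:mysql://dbhost/mydb \
      --username myuser -P \
      --table mytable \
      --as-parquetfile \
      --target-dir /user/etl/parquet_out

    # Import to Parquet and create the Hive table in the same run
    sqoop import \
      --connect jdbc:mysql://dbhost/mydb \
      --username myuser -P \
      --table mytable \
      --as-parquetfile \
      --hive-import \
      --hive-table mytable_parquet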

Ted Dunning

I was able to dump a MySQL table into an Avro file using Sqoop 1 and then convert the Avro file into a Parquet file using the avro2parquet conversion tool (https://github.com/tispratik/avro2parquet). Once it was in Parquet, I could upload it to HDFS and create a Hive table on top of it. You need a Parquet plugin in Hive if you are running a version prior to 0.13; Hive supports Parquet natively as of 0.13.
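A hedged sketch of that last step, assuming Hive 0.13+ and a hypothetical HDFS directory holding the converted Parquet file (the path and column list are placeholders for whatever your table actually contains):

    -- Hive 0.13+ reads Parquet natively, so no extra plugin is needed
    CREATE EXTERNAL TABLE mytable_parquet (
      id   INT,
      name STRING
    )
    STORED AS PARQUET
    LOCATION '/user/etl/parquet_out';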

Pratik Khadloya