
What is a simple way to write a POJO to HDFS in Parquet format (using the Java API) by directly creating a Parquet schema for the POJO, without using Avro or MapReduce?

The samples I found were outdated, use deprecated methods, and rely on one of Avro, Spark, or MapReduce.

Devas

1 Answer


Effectively, there are not many samples available for reading/writing Apache Parquet files without the help of an external framework.

The core Parquet library is parquet-column, where you can find some tests that read/write files directly: https://github.com/apache/parquet-mr/blob/master/parquet-column/src/test/java/org/apache/parquet/io/TestColumnIO.java

You then just need to use the same functionality with an HDFS file. You can follow this SO question for that: Accessing files in HDFS using Java
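For reference, opening an HDFS file from plain Java looks roughly like the sketch below (the namenode URI and path are placeholders; adjust them to your cluster):

```java
import java.io.BufferedWriter;
import java.io.OutputStreamWriter;
import java.net.URI;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsAccess {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical namenode address; in practice this usually comes
        // from core-site.xml on the classpath.
        FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);

        Path path = new Path("/tmp/hello.txt");
        // Create (overwrite) the file and write a line to it.
        try (BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(fs.create(path, true), StandardCharsets.UTF_8))) {
            out.write("hello from the HDFS Java API");
        }
        System.out.println("exists: " + fs.exists(path));
    }
}
```

The same `Path` type is what the Parquet writers accept, so once `FileSystem` access works, pointing a Parquet writer at HDFS is just a matter of using an `hdfs://` path.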

UPDATE: to address the deprecated parts of the API: AvroWriteSupport should be replaced by AvroParquetWriter. I checked ParquetWriter; it is not deprecated and can be used safely.
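To make the Avro-free route concrete: parquet-hadoop ships example classes (`ExampleParquetWriter`, `GroupWriteSupport`, `SimpleGroupFactory`) that let you declare a `MessageType` schema directly and write `Group` records, no Avro or MapReduce involved. A minimal sketch, assuming parquet-hadoop and the Hadoop client are on the classpath (the HDFS URI is a placeholder):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.example.data.Group;
import org.apache.parquet.example.data.simple.SimpleGroupFactory;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.example.ExampleParquetWriter;
import org.apache.parquet.hadoop.example.GroupWriteSupport;
import org.apache.parquet.schema.MessageType;
import org.apache.parquet.schema.MessageTypeParser;

public class PlainParquetWrite {
    public static void main(String[] args) throws Exception {
        // Parquet schema declared directly, mirroring a POJO with two fields.
        MessageType schema = MessageTypeParser.parseMessageType(
            "message Person { required binary name (UTF8); required int32 age; }");

        Configuration conf = new Configuration();
        GroupWriteSupport.setSchema(schema, conf);

        // Hypothetical HDFS location; any hdfs:// path works once
        // the cluster config is on the classpath.
        Path file = new Path("hdfs://namenode:8020/tmp/person.parquet");

        try (ParquetWriter<Group> writer = ExampleParquetWriter.builder(file)
                .withConf(conf)
                .withType(schema)
                .build()) {
            SimpleGroupFactory factory = new SimpleGroupFactory(schema);
            Group record = factory.newGroup()
                .append("name", "Alice")
                .append("age", 30);
            writer.write(record);
        }
    }
}
```

Mapping a POJO then amounts to one `append` call per field; for a custom record type you would implement your own `WriteSupport` instead of using the example `Group` classes.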

Regards,

Loïc

loicmathieu
  • Thanks for your help. I think the provided link is a bit complex to understand, maybe because I'm new to it. So I went with an Avro schema, but again there are deprecated classes: 'AvroWriteSupport' and 'ParquetWriter'. What are the alternatives for these classes? Sample code taken from [here](http://blog.cloudera.com/blog/2014/05/how-to-convert-existing-data-into-parquet/) – Devas Aug 29 '16 at 13:26
  • You can use a builder to build the writer object instead of using the constructor. – Deepak Dec 08 '16 at 12:48
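The builder Deepak mentions is the replacement for the deprecated writer constructors. A sketch of the Avro route with `AvroParquetWriter.builder` (the output path and schema are illustrative):

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.fs.Path;
import org.apache.parquet.avro.AvroParquetWriter;
import org.apache.parquet.hadoop.ParquetWriter;
import org.apache.parquet.hadoop.metadata.CompressionCodecName;

public class BuilderExample {
    public static void main(String[] args) throws Exception {
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"Person\",\"fields\":["
            + "{\"name\":\"name\",\"type\":\"string\"},"
            + "{\"name\":\"age\",\"type\":\"int\"}]}");

        // Local path for illustration; an hdfs:// path works the same way.
        Path file = new Path("/tmp/person-avro.parquet");

        // Builder pattern instead of the deprecated constructor.
        try (ParquetWriter<GenericRecord> writer =
                 AvroParquetWriter.<GenericRecord>builder(file)
                     .withSchema(schema)
                     .withCompressionCodec(CompressionCodecName.SNAPPY)
                     .build()) {
            GenericRecord rec = new GenericData.Record(schema);
            rec.put("name", "Alice");
            rec.put("age", 30);
            writer.write(rec);
        }
    }
}
```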