I'm trying to read a local Parquet file, but the only APIs I can find are tightly coupled to Hadoop and require a Hadoop Path as input (even when pointing to a local file).
ParquetReader<GenericRecord> reader = AvroParquetReader.<GenericRecord>builder(file).build();
GenericRecord nextRecord = reader.read();
is the most popular answer to "How to read a parquet file, in a standalone java code?", but it requires a Hadoop Path and has now been deprecated in favor of a mysterious InputFile. The only implementation of InputFile I can find is HadoopInputFile, so again no help.
In Avro this is a simple:
DatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
this.dataFileReader = new DataFileReader<>(file, datumReader);
(where file is a java.io.File). What's the Parquet equivalent?
I am asking for no Hadoop Path dependency in the answers because Hadoop drags in bloat and jar hell, and it seems silly to require it for reading local files.
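To illustrate why requiring Hadoop feels unnecessary here: as far as I understand the format, a plain Parquet file is an ordinary seekable file that begins and ends with the 4-byte magic PAR1, with a little-endian 4-byte footer length just before the trailing magic. The sketch below checks that with nothing but the JDK. ParquetFooterProbe and footerLength are hypothetical names of my own, not part of any Parquet library, and the sketch assumes a file without an encrypted footer. It does not decode the footer or any records.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical helper (not a Parquet library class): probes a local file using only the JDK. */
final class ParquetFooterProbe {
    private static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    /** Returns the footer metadata length in bytes, or -1 if the PAR1 magic is missing. */
    static int footerLength(File file) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file, "r")) {
            long size = raf.length();
            // Smallest possible layout: 4-byte magic + 4-byte footer length + 4-byte magic.
            if (size < 12) {
                return -1;
            }

            // Leading magic.
            byte[] head = new byte[4];
            raf.readFully(head);

            // Last 8 bytes: footer length (little-endian int32) followed by the trailing magic.
            byte[] tail = new byte[8];
            raf.seek(size - 8);
            raf.readFully(tail);

            if (!Arrays.equals(head, MAGIC)
                    || !Arrays.equals(Arrays.copyOfRange(tail, 4, 8), MAGIC)) {
                return -1;
            }
            return ByteBuffer.wrap(tail, 0, 4).order(ByteOrder.LITTLE_ENDIAN).getInt();
        }
    }
}
```

Calling ParquetFooterProbe.footerLength(new File("data.parquet")) (with data.parquet as a placeholder path) only needs a java.io.File, which is the kind of entry point I am hoping the Parquet reader can offer.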
To further explain the backstory, I maintain a small IntelliJ plugin that allows users to drag-and-drop Avro files into a pane for viewing in a table. This plugin is currently 5 MB. If I include the Parquet and Hadoop dependencies, it bloats to over 50 MB and doesn't even work.
POST-ANSWER ADDENDUM
Now that I have it working (thanks to the accepted answer), here is my working solution that avoids all the annoying errors that can be dragged in by depending heavily on the Hadoop Path API: