I'm trying to read a Parquet file bundled as a resource inside a JAR, ideally as a stream.
Does anyone have a working example that doesn't involve writing the resource out as a temporary file first?
Here is the code I'm using to read the files. It works fine in the IDE, before bundling as a JAR:
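    import java.io.IOException;
    import java.net.URISyntaxException;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.parquet.avro.AvroParquetReader;
    import org.apache.parquet.hadoop.ParquetReader;
    import org.apache.parquet.hadoop.util.HadoopInputFile;

    try {
        // Resolves to a file: URI in the IDE, but to a jar: URI once packaged
        Path path = new Path(classLoader.getResource(pattern_id).toURI());
        Configuration conf = new Configuration();
        try (ParquetReader<GenericRecord> r = AvroParquetReader.<GenericRecord>builder(
                HadoopInputFile.fromPath(path, conf))
                .disableCompatibility()
                .build()) {
            patternsFound.add(pattern_id);
            GenericRecord record;
            while ((record = r.read()) != null) {
                // Do some work
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    } catch (NullPointerException | URISyntaxException e) {
        // getResource() returns null when the resource is not found
        e.printStackTrace();
    }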
When running this code from a JAR file, I get this error:
org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "jar"
I figured I could get around this by using:
    InputStream inputFile = classLoader.getResourceAsStream(pattern_id);
but I don't know how to get AvroParquetReader to work with an InputStream.
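Part of the difficulty is that a Parquet file keeps its footer metadata at the end of the file, so the reader needs random access rather than a forward-only stream. One possible direction, sketched below under assumptions, is to buffer the resource fully in memory and expose the two operations a seekable stream needs, getPos() and seek(long). The names ResourceBytes, SeekableBytes and load are hypothetical, the sketch uses only the JDK (readAllBytes needs Java 9+), and it only makes sense for files that fit in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: buffers a classpath resource in memory so it can be
// read with random access, without writing a temporary file.
final class ResourceBytes {

    // ByteArrayInputStream that also reports and changes its read position.
    static final class SeekableBytes extends ByteArrayInputStream {
        SeekableBytes(byte[] data) {
            super(data);
        }

        // Current read offset from the start of the buffer.
        synchronized long getPos() {
            return pos;
        }

        // Move the read offset; seeking to length() positions at end of data.
        synchronized void seek(long newPos) throws IOException {
            if (newPos < 0 || newPos > count) {
                throw new IOException("seek out of range: " + newPos);
            }
            pos = (int) newPos;
        }

        // Total number of buffered bytes.
        synchronized long length() {
            return count;
        }
    }

    // Reads the whole stream into memory and closes it.
    static SeekableBytes load(InputStream in) throws IOException {
        try (InputStream s = in) {
            return new SeekableBytes(s.readAllBytes());
        }
    }
}
```

With something like ResourceBytes.load(classLoader.getResourceAsStream(pattern_id)) in hand, the remaining step would be a small class implementing org.apache.parquet.io.InputFile, whose getLength() returns the buffer length and whose newStream() returns a SeekableInputStream (for example a DelegatingSeekableInputStream subclass) that forwards getPos() and seek(long) to the buffer. That object could then be passed to AvroParquetReader.builder(...) in place of HadoopInputFile.fromPath(path, conf). That wiring is not shown here and is untested against any particular Parquet version.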