Read a fastparquet file using Akka parquet

Asked Jun 05 '19 at 15:10

Active Jun 05 '19 at 15:10

Viewed 267 times

I have one of our Python systems generating Parquet files using Pandas and fastparquet. These are to be read by a Scala system that runs atop Akka streams.

Akka (through Alpakka) provides a source for reading Avro Parquet files. However, when I try to read the file, I get:

java.lang.IllegalArgumentException: INT96 not yet implemented.

The INT96 column is one that the Scala application does not need. My question is whether I can specify a schema that leaves that one column out and reads only the others, considering that the file was generated by fastparquet.

The relevant snippet which generates a source for reading Parquet files is:

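// Imports needed: org.apache.hadoop.fs.Path, org.apache.avro.generic.GenericRecord,
// org.apache.parquet.hadoop.ParquetReader, org.apache.parquet.hadoop.util.HadoopInputFile,
// org.apache.parquet.avro.AvroParquetReader,
// akka.stream.alpakka.avroparquet.scaladsl.AvroParquetSource
.map { result =>
  // Location of the Parquet object on S3
  val path = s"s3a://${result.bucketName}/${result.key}"
  val file = HadoopInputFile.fromPath(new Path(path), hadoopConfig)
  // Reader that decodes each Parquet row into an Avro GenericRecord
  val reader: ParquetReader[GenericRecord] =
    AvroParquetReader
      .builder[GenericRecord](file)
      .withConf(hadoopConfig)
      .build()
  // Wrap the reader in an Akka Streams source
  AvroParquetSource(reader)
}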
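
For reference, the kind of schema projection I have in mind would look roughly like the sketch below. It registers an Avro projection on the Hadoop configuration through `AvroReadSupport.setRequestedProjection`, so that only the listed columns are requested from the file. The field names `id` and `name`, the record name `Projected`, and the helper `withProjection` are hypothetical placeholders, and the field types are assumed to be nullable strings; they would have to match the columns fastparquet actually wrote. I have not confirmed that this avoids the INT96 error.

```scala
import org.apache.avro.{Schema, SchemaBuilder}
import org.apache.hadoop.conf.Configuration
import org.apache.parquet.avro.AvroReadSupport

object ProjectionSketch {
  // Hypothetical projection: lists only the columns the Scala side needs.
  // The names and the nullable-string types are assumptions for illustration.
  val projection: Schema =
    SchemaBuilder
      .record("Projected").namespace("example")
      .fields()
      .optionalString("id")
      .optionalString("name")
      .endRecord()

  // Returns a copy of the given configuration with the projection registered,
  // leaving the original configuration untouched.
  def withProjection(base: Configuration): Configuration = {
    val conf = new Configuration(base)
    AvroReadSupport.setRequestedProjection(conf, projection)
    conf
  }
}
```

The reader above would then be built with `.withConf(ProjectionSketch.withProjection(hadoopConfig))` in place of `.withConf(hadoopConfig)`.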

An SO User

0 Answers