
I am trying to do something similar to this question, but instead of the polars library I would like to use the DataFusion library.

The idea is to go from a Vec of structs like this:

```rust
use serde::Serialize;

#[derive(Serialize)]
struct Test {
    id: u32,
    amount: u32,
}
```

and save them to Parquet files, just like in the question I referenced.

With polars, as the accepted answer there shows, this can be done by serializing the structs to JSON and then building the DataFrame from that JSON. I could not find a similar approach using DataFusion.

Any suggestions will be appreciated.

dade

1 Answer


I think the parquet_derive crate is designed exactly for the use case of writing Rust structs to, and reading them from, Parquet files. DataFusion would be useful if you then wanted to process the resulting data, for example filtering or aggregating it with SQL.
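To make "filtering or aggregating" concrete: a query such as `SELECT SUM(amount) FROM test WHERE id > 1` (the table name `test` is hypothetical) is the kind of SQL you would hand to DataFusion once the Parquet file is registered. The std-only sketch below is not DataFusion. It computes the same result by hand over a slice of `Test` records, just to show what that query means; the function name `sum_amount_where_id_gt` is made up for illustration.

```rust
/// Same fields as the `Test` struct in the question (serde derive omitted
/// so this sketch needs only std).
struct Test {
    id: u32,
    amount: u32,
}

/// Plain-Rust equivalent of `SELECT SUM(amount) FROM test WHERE id > min_id`.
/// Amounts are widened to u64 so adding up many u32 values does not overflow u32.
/// Unlike SQL, which returns NULL when no row matches, this returns 0.
fn sum_amount_where_id_gt(records: &[Test], min_id: u32) -> u64 {
    records
        .iter()
        .filter(|r| r.id > min_id) // the WHERE clause
        .map(|r| u64::from(r.amount)) // the column being aggregated
        .sum() // the SUM aggregate
}
```

DataFusion would evaluate the equivalent query over the Parquet file's columns for you, so this loop is only a mental model of the result, not a replacement for it.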

Here is an example in the docs: https://docs.rs/parquet_derive/30.0.1/parquet_derive/derive.ParquetRecordWriter.html
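Parquet, like Arrow (the in-memory format DataFusion is built on), stores data by column rather than by row, so whichever tool you use, the Vec<Test> rows end up as one vector per field. As far as I can tell, the code that `#[derive(ParquetRecordWriter)]` generates does essentially this for each field before passing the values to that field's column writer. The std-only sketch below shows just the transposition, using hypothetical names (`TestColumns`, `to_columns`). If you instead went through the arrow crate directly (an assumption, not something the docs above describe), you would wrap each vector in `UInt32Array::from(...)`, combine them with `RecordBatch::try_new`, and hand the batch to DataFusion, for example via `SessionContext::read_batch`.

```rust
/// Same fields as the `Test` struct in the question (serde derive omitted
/// so this sketch needs only std).
struct Test {
    id: u32,
    amount: u32,
}

/// Column-oriented view of a slice of `Test` rows: one vector per field,
/// where index i of every vector comes from the same record.
struct TestColumns {
    id: Vec<u32>,
    amount: Vec<u32>,
}

/// Transpose rows into columns in a single pass over the records.
fn to_columns(records: &[Test]) -> TestColumns {
    let mut cols = TestColumns {
        id: Vec::with_capacity(records.len()),
        amount: Vec::with_capacity(records.len()),
    };
    for r in records {
        cols.id.push(r.id);
        cols.amount.push(r.amount);
    }
    cols
}
```

Keeping the fields in separate vectors is what lets a columnar writer emit each column contiguously, which is also why the derive generates column writers in the order the struct fields are declared.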

  • So basically: use the right tool to generate the Parquet files, then use DataFusion to query them. – dade Jan 15 '23 at 13:20