parquet-dotnet has an example I'm trying to work with that looks like this:

    using (Stream fileStream = System.IO.File.OpenRead("c:\\test.parquet"))
    {
        using (var parquetReader = new ParquetReader(fileStream))
        {
            DataField[] dataFields = parquetReader.Schema.GetDataFields();
            for (int i = 0; i < parquetReader.RowGroupCount; i++)
            {
                using (ParquetRowGroupReader groupReader = parquetReader.OpenRowGroupReader(i))
                {
                    DataColumn[] columns = dataFields.Select(groupReader.ReadColumn).ToArray();
                }
            }
        }
    }

My concern is with the `columns` line. Suppose I have data that looks like this, from a table perspective:
| ID | Name  |
|----|-------|
| 1  | Test1 |
| 1  | Test2 |
I want to map this data from the Parquet file to a model that looks exactly like that. The issue is that the data comes out of `columns` looking like this:

    columns[0].Data[0] - 1
    columns[0].Data[1] - 1
    columns[1].Data[0] - Test1
    columns[1].Data[1] - Test2

This might be a little hard to follow, but essentially the `columns` variable is an array with one entry per column, and each entry holds an array of every value in the table for that column. So I'm having a hard time figuring out how to match the value at each array position with the value at the same position in the other columns, so that each row stays together.
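
To make the goal concrete, here is a minimal sketch of the row-wise result being described. It uses plain arrays in place of `columns[0].Data` and `columns[1].Data`, and it assumes a flat schema where every column in a row group has the same number of values in the same row order. The `Row` class and the `ColumnPivot.ToRows` helper are hypothetical names, not part of parquet-dotnet.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model matching the two-column table above.
public class Row
{
    public int Id { get; set; }
    public string Name { get; set; }
}

public static class ColumnPivot
{
    // ids and names stand in for columns[0].Data and columns[1].Data.
    // Index r in each array is assumed to belong to the same row.
    public static List<Row> ToRows(Array ids, Array names)
    {
        if (ids.Length != names.Length)
            throw new ArgumentException("Columns must have the same number of values.");

        var rows = new List<Row>(ids.Length);
        for (int r = 0; r < ids.Length; r++)
        {
            rows.Add(new Row
            {
                Id = Convert.ToInt32(ids.GetValue(r)),
                Name = (string)names.GetValue(r)
            });
        }
        return rows;
    }
}
```

Called as `ColumnPivot.ToRows(new[] { 1, 1 }, new[] { "Test1", "Test2" })`, this yields two `Row` objects, one per table row. Inside the row-group loop, the same call would be made once per row group and the results appended to a single list.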
Also, I can't use the normal deserialization, because the Parquet file has fields with unusual names such as `__$something`, which I can't map to a similarly named property. Any ideas?
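
One possible way around the naming problem is an explicit lookup from the Parquet field name to the property it should fill, sketched below. This is an illustration under assumptions rather than a library feature: `Item`, `FieldMapping`, and the `Something` property are hypothetical, and the columns are passed in as a dictionary of plain arrays keyed by field name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model; Something receives the values of the "__$something" field.
public class Item
{
    public int Id { get; set; }
    public string Something { get; set; }
}

public static class FieldMapping
{
    // Each entry maps a Parquet field name to the code that assigns the matching property.
    private static readonly Dictionary<string, Action<Item, object>> Setters =
        new Dictionary<string, Action<Item, object>>
        {
            ["ID"] = (item, value) => item.Id = Convert.ToInt32(value),
            ["__$something"] = (item, value) => item.Something = (string)value,
        };

    public static List<Item> Build(IReadOnlyDictionary<string, Array> columnsByName, int rowCount)
    {
        var items = new List<Item>(rowCount);
        for (int r = 0; r < rowCount; r++)
            items.Add(new Item());

        foreach (var pair in columnsByName)
        {
            // Fields without a registered setter are skipped.
            if (!Setters.TryGetValue(pair.Key, out var set))
                continue;

            for (int r = 0; r < rowCount; r++)
                set(items[r], pair.Value.GetValue(r));
        }
        return items;
    }
}
```

With the real reader, the dictionary key would presumably come from each column's field name (I believe this is `column.Field.Name`) and the array from `column.Data`, but check that against the version of the library in use.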