There are at least three ways to solve this problem.
You can read the CSV files into an IEnumerable<Dto> and write the Parquet file using either Parquet.Net or ParquetSharp.
The third solution is to use DuckDB.NET to run a SQL statement that reads the CSV and writes it directly to a Parquet file:
COPY (
    SELECT *
    FROM read_csv('flights.csv', delim='|', header=True,
        columns={'FlightDate': 'DATE', 'UniqueCarrier': 'VARCHAR',
                 'OriginCityName': 'VARCHAR', 'DestCityName': 'VARCHAR'})
) TO 'test.parquet' (FORMAT 'parquet', COMPRESSION 'ZSTD', ROW_GROUP_SIZE 100000);
You can run this statement through the DuckDB.NET ADO.NET connector.
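As a minimal sketch, running the COPY statement from C# could look like the following. It assumes the DuckDB.NET.Data.Full NuGet package is referenced, that flights.csv sits in the working directory, and that an in-memory database is enough because the only goal is the file conversion; the file names and column list are simply the ones from the statement above.

```csharp
using DuckDB.NET.Data;

// Assumption: flights.csv exists in the working directory and uses '|' as its delimiter.
const string sql = @"
COPY (
    SELECT *
    FROM read_csv('flights.csv', delim='|', header=True,
        columns={'FlightDate': 'DATE', 'UniqueCarrier': 'VARCHAR',
                 'OriginCityName': 'VARCHAR', 'DestCityName': 'VARCHAR'})
) TO 'test.parquet' (FORMAT 'parquet', COMPRESSION 'ZSTD', ROW_GROUP_SIZE 100000);";

// An in-memory database is used here because nothing needs to be persisted
// apart from the Parquet file that COPY writes.
using var connection = new DuckDBConnection("DataSource=:memory:");
connection.Open();

using var command = connection.CreateCommand();
command.CommandText = sql;

// COPY returns no result set, so ExecuteNonQuery is sufficient.
command.ExecuteNonQuery();
```

With this approach the CSV parsing and the Parquet encoding both happen inside DuckDB, so no DTO class is needed on the .NET side.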
Disclaimer: I am a contributor to the DuckDB.NET project.