CSV file to Parquet using C#

Question

I am new to C#. We have a requirement to generate Parquet files from CSV. Our files are up to 30 GB, so performance matters when generating them.

I could not find any help or suggestions on Google for handling this.

Can someone suggest or share a solution, please (either a console app or a Script Task)?

Does this answer your question? [How to convert a csv file to parquet](https://stackoverflow.com/questions/26124417/how-to-convert-a-csv-file-to-parquet) — A-Tech, Nov 15 '22 at 09:56
If you're programming in C# one assumes you know what NuGet packages are. Have you looked for a NuGet package that can read and write Parquet files? — AlwaysLearning, Nov 15 '22 at 10:14
Why is this tagged SQL Server? Anyway, I typed _C# Parquet Library_ into Google and this was the top link: https://www.nuget.org/packages/Parquet.Net It's inconceivable to me that you could not find this. — Nick.Mc, Nov 15 '22 at 10:18
Does this help? https://stackoverflow.com/questions/60929842/how-to-convert-a-csv-file-to-parquet-using-c-sharp/62181950#62181950 — Cinchoo, Jan 28 '23 at 00:27

score 1 · Accepted Answer · answered Nov 15 '22 at 11:13

You can use this NuGet package, which includes an automatic serializer/deserializer between C# classes and Parquet files. It works by generating MSIL (bytecode) on the fly and is therefore super fast.

Thomas 94


score 0 · Answer 2 · answered Apr 01 '23 at 18:54

I haven't tried this yet, but you could do the conversion with CLI tools and just call those from C# (aka "shell out").

https://github.com/domoritz/arrow-tools
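A minimal sketch of the "shell out" idea, assuming the `csv2parquet` tool from that repository is installed and on the `PATH`, and that it accepts the input CSV path followed by the output Parquet path as positional arguments. The class and method names are made up for illustration.

```csharp
using System;
using System.Diagnostics;

public static class CsvToParquetViaCli
{
    // Builds the process description. "csv2parquet" being on PATH and taking
    // <csv> <parquet> as positional arguments is an assumption of this sketch.
    public static ProcessStartInfo BuildStartInfo(string csvPath, string parquetPath)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = "csv2parquet",
            UseShellExecute = false,      // required for stream redirection
            RedirectStandardError = true, // capture the tool's error output
        };
        // ArgumentList quotes each argument, so paths with spaces are safe.
        startInfo.ArgumentList.Add(csvPath);
        startInfo.ArgumentList.Add(parquetPath);
        return startInfo;
    }

    public static int Main(string[] args)
    {
        if (args.Length != 2)
        {
            Console.Error.WriteLine("usage: <input.csv> <output.parquet>");
            return 2;
        }

        using Process? process = Process.Start(BuildStartInfo(args[0], args[1]));
        if (process is null)
        {
            Console.Error.WriteLine("Could not start csv2parquet.");
            return 1;
        }

        // Read stderr before waiting so a full pipe cannot block the child.
        string errors = process.StandardError.ReadToEnd();
        process.WaitForExit();

        if (process.ExitCode != 0)
        {
            Console.Error.WriteLine(errors);
        }
        return process.ExitCode;
    }
}
```

The conversion then runs entirely in the external tool, so the C# process only has to wait for it and check the exit code.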

yzorg


score 0 · Answer 3 · answered Apr 01 '23 at 19:30

There are at least 3 different solutions to this problem.

You can read the CSV files into an IEnumerable<Dto> and write the Parquet file using either Parquet.Net or ParquetSharp.

The third solution is to use DuckDB.NET to run a SQL statement that converts the CSV directly into a Parquet file.
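A rough sketch of the Parquet.Net route, assuming Parquet.Net 4.x and its `ParquetSerializer.SerializeAsync` API with the `Append` option. The `Flight` class, the file names, the `|` delimiter, the `yyyy-MM-dd` date format and the naive `Split` parsing (no quoted fields) are all assumptions for illustration. Rows are written in batches so a 30 GB file never has to fit in memory; `Enumerable.Chunk` needs .NET 6 or later.

```csharp
using System;
using System.Globalization;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using Parquet.Serialization;

// Hypothetical DTO; one property per CSV column.
public class Flight
{
    public DateTime FlightDate { get; set; }
    public string UniqueCarrier { get; set; } = "";
    public string OriginCityName { get; set; } = "";
    public string DestCityName { get; set; } = "";
}

public static class CsvToParquetWithParquetNet
{
    // Naive parser: assumes '|' never appears inside a field.
    public static Flight ParseLine(string line)
    {
        string[] fields = line.Split('|');
        return new Flight
        {
            FlightDate = DateTime.ParseExact(fields[0], "yyyy-MM-dd", CultureInfo.InvariantCulture),
            UniqueCarrier = fields[1],
            OriginCityName = fields[2],
            DestCityName = fields[3],
        };
    }

    public static async Task Main()
    {
        const int batchSize = 100_000; // rows per write; tune for your memory budget

        // Appending needs a stream that is both readable and seekable.
        using var output = new FileStream("test.parquet", FileMode.Create, FileAccess.ReadWrite);

        bool append = false; // the first batch creates the file, later ones append
        // File.ReadLines streams lazily; Skip(1) drops the header row.
        foreach (string[] lines in File.ReadLines("flights.csv").Skip(1).Chunk(batchSize))
        {
            var batch = lines.Select(ParseLine).ToList();
            await ParquetSerializer.SerializeAsync(
                batch, output, new ParquetSerializerOptions { Append = append });
            append = true;
        }
    }
}
```

For real-world CSV with quoting or embedded delimiters, a dedicated CSV reader would be a safer choice than `Split`.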

COPY (
    SELECT * 
    FROM read_csv('flights.csv', delim='|', header=True, columns={'FlightDate': 'DATE', 'UniqueCarrier': 'VARCHAR', 'OriginCityName': 'VARCHAR', 'DestCityName': 'VARCHAR'})
) TO 'test.parquet' (FORMAT 'parquet', COMPRESSION 'ZSTD', ROW_GROUP_SIZE 100000)

You can run this statement through the DuckDB.NET ADO.NET connector.

Disclaimer: I am a contributor to the DuckDB.NET project.
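For illustration, running the COPY statement above from C# could look roughly like this. It assumes the `DuckDB.NET.Data.Full` NuGet package (which bundles the native DuckDB library) and an in-memory database; the class name and the `BuildCopyStatement` helper are made up for this sketch.

```csharp
using System;
using DuckDB.NET.Data;

public static class CsvToParquetWithDuckDb
{
    // Builds the COPY statement shown above. The paths are embedded in the SQL
    // text, so single quotes are doubled; only pass trusted paths regardless.
    public static string BuildCopyStatement(string csvPath, string parquetPath)
    {
        string csv = csvPath.Replace("'", "''");
        string parquet = parquetPath.Replace("'", "''");
        return
            "COPY (SELECT * FROM read_csv('" + csv + "', delim='|', header=True, " +
            "columns={'FlightDate': 'DATE', 'UniqueCarrier': 'VARCHAR', " +
            "'OriginCityName': 'VARCHAR', 'DestCityName': 'VARCHAR'})) " +
            "TO '" + parquet + "' " +
            "(FORMAT 'parquet', COMPRESSION 'ZSTD', ROW_GROUP_SIZE 100000)";
    }

    public static void Main()
    {
        // An in-memory database is enough: DuckDB reads the CSV and writes
        // the Parquet file itself, so no rows pass through the C# process.
        using var connection = new DuckDBConnection("Data Source=:memory:");
        connection.Open();

        using var command = connection.CreateCommand();
        command.CommandText = BuildCopyStatement("flights.csv", "test.parquet");
        command.ExecuteNonQuery();

        Console.WriteLine("Wrote test.parquet");
    }
}
```

Because the whole conversion happens inside DuckDB, the C# side stays small no matter how large the CSV is.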
