I have a large CSV file of 7GB and inside CSV, there are fields that have line breaks inside text. I am able to split the large CSV using C# using below. But as the string contains line breaks, it starts a new line from there. I cannot replace the linebreak using readline() as it throws out of memory exception as file is huge.
using (StreamReader reader = new StreamReader(inputFilePath))
{
int fileCount = 0;
string line;
while ((line = reader.ReadLine()) != null)
{
if (fileCount % batchSize == 0)
{
string outputFilePath = Path.Combine(outputDirectory, $"output_{fileCount / batchSize}.csv");
using (StreamWriter writer = new StreamWriter(outputFilePath))
{
writer.WriteLine(line);
}
}
else
{
string outputFilePath = Path.Combine(outputDirectory, $"output_{fileCount / batchSize}.csv");
using (StreamWriter writer = new StreamWriter(outputFilePath, true))
{
writer.WriteLine(line);
}
}
fileCount++;
}
}
Above code successfully read large file and splits it. Just that it also takes linebreaks inside the a column and splits it to another line. And as I mentioned above, I cannot replace the line using Readline().Replace as it will throw out of memory exception.
Please advise how to perform both operation at same time.
If not c#, PowerShell would also work.
I used below in PowerShell, but that also had same issue:
Import-Csv -Path "C:\largefile.csv" | Group-Object -Property { [math]::Floor($_.PSObject.Properties.Count / 10000) } | ForEach-Object { $_.Group | Export-Csv -Path "C:\smallfile$($_.Name).csv" -NoTypeInformation }