I have a large CSV file, approximately 9 GB in size. In PowerShell, I need to split it into 10 MB chunks. The catch is that I need to keep the lines intact, so each split file ends at the end of a line and starts at the start of the next line.
Since the file is so large, I need a way of splitting it that doesn't force the code to load the entire file into memory at once (or it will crash).
I've been at this for two weeks and haven't found a workable solution. The worst part is that it takes over a day for the code to reach its memory breaking point, so I can only test, debug, and restart the code a few times per week. Very slow going.
Without fail, every approach I've tried has triggered a memory-related error, such as "Get-Content : Insufficient memory to continue the execution of the program."
I've tried the following methods.
If you have a Powershell-related link to a working version of these approaches, or you know a better approach yourself, please weigh in.
I'm going to refrain from giving you complete copies of all three scripts, 'cause that would get really long, but at least this way you'll know which methods I've already tried and the insufficient-memory errors that keep popping up.
Any help would be appreciated. If I keep pulling out my hair on this issue, I'll be bald before my time!
Previous Attempts (Each one ran for over a day before failing due to a memory error)
- Get-Content (I was under the impression that Get-Content didn't try to load all the data at once, and was therefore the best choice for this.)
Get-Content $sourceAddress |
ForEach-Object {
    $var1, $var2, $var3, ... = $_ -Split ","
    # Push these variables into a new line in a CSV file
    # If you've reached 100,000 lines, export a split file
}
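In case it helps the diagnosis, here's roughly what I think a batched version of this would look like using -ReadCount, which hands the pipeline arrays of lines instead of one line at a time. The 100,000 batch size and the output file names are just placeholders, and I haven't run this against the full 9 GB file:

```powershell
$batch = 0
Get-Content $sourceAddress -ReadCount 100000 |
    ForEach-Object {
        $batch++
        # $_ is an array of up to 100,000 lines; write it out as one split file
        $_ | Set-Content -Path ("split_{0:D4}.csv" -f $batch)
    }
```

As far as I can tell, nothing here should accumulate the whole file, though only the first split keeps the header row, and the parts are sized by line count rather than by 10 MB.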
- Import-Csv (Here, I tried extracting a range of lines at a time)
for ($i = 0; $i -le $MaxSplits; $i++)
{
    (Import-Csv $InCsv)[$Min..$Max] |
        Export-Csv -Path $outfile -NoTypeInformation
}
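If anyone has made the Import-Csv route work, I assume it would have to stream through the pipeline instead of indexing, since the `(...)[$Min..$Max]` form parses the entire file before it can index. Something like the following, which I haven't tested on the full file:

```powershell
# Select-Object -First stops the upstream Import-Csv once it has enough
# rows, so this shouldn't materialize the whole file the way indexing does,
# but each pass still re-reads the file from the beginning.
Import-Csv $InCsv |
    Select-Object -Skip $Min -First ($Max - $Min + 1) |
    Export-Csv -Path $outfile -NoTypeInformation
```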
- StreamReader (ChatGPT says this is the best way. Yes, I'm desperate enough to ask an AI for coding advice.)
$reader = New-Object System.IO.StreamReader($filePath)
$fileCount = 0
$lines = New-Object System.Collections.Generic.List[string]
while (!$reader.EndOfStream) {
    # Loop to retrieve the specified number of lines
    for ($i = 0; $i -lt $linesPerFile; $i++) {
        # Check if the end of the file has been reached
        if ($reader.EndOfStream) { break }
        # If not, add the line to the lines list
        $lines.Add($reader.ReadLine())
    }
    # Write the current batch of lines to the output file
    $fileCount++
    $lines | Set-Content -Path "split_$fileCount.csv"
    $lines.Clear()
}
$reader.Dispose()
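Since my real target is 10 MB rather than a fixed line count, the variant I'm planning to try next keeps the same reader but cuts on a byte budget instead, so each part lands near 10 MB while still breaking only on line boundaries. The output paths, the UTF-8 encoding guess, and the +2 for the newline are my assumptions:

```powershell
$maxBytes = 10MB
$part     = 0
$written  = 0
$writer   = $null
$reader   = [System.IO.StreamReader]::new($filePath)
try {
    while ($null -ne ($line = $reader.ReadLine())) {
        # Start a new part before the first line and whenever the budget
        # is spent, so every cut lands on a line boundary
        if ($null -eq $writer -or $written -ge $maxBytes) {
            if ($writer) { $writer.Dispose() }
            $part++
            $writer  = [System.IO.StreamWriter]::new("part_$part.csv")
            $written = 0
        }
        $writer.WriteLine($line)
        # Approximate the on-disk size: UTF-8 bytes plus CR+LF
        $written += [System.Text.Encoding]::UTF8.GetByteCount($line) + 2
    }
}
finally {
    if ($writer) { $writer.Dispose() }
    $reader.Dispose()
}
```

Only the reader, the writer, and one line at a time should ever be in memory here, but again, I haven't had a chance to run it for the full day it takes to find out.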
Again, any help would be appreciated.