
There are some good responses in this thread to the question of how to use PowerShell to merge multiple CSV files into one, dropping the header row on all but the first file.
Kemiller2002's posted answer works well for me in most cases; however, when the output file exceeds 2GB I start getting out-of-memory exceptions. The following error message gets thrown:

Exception of type 'System.OutOfMemoryException' was thrown.
At xxx.ps1:9 char:20
+            $false {$lines | Select -Skip 1}
+                    ~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : OperationStopped: (:) [], OutOfMemoryException
    + FullyQualifiedErrorId : System.OutOfMemoryException  

This is using PowerShell 5.1. It doesn't seem to be an issue with MaxMemoryPerShellMB (that reports as 2147483647), nor does it appear to be an issue with actual system memory; the last time I ran this there was 33GB of free memory (out of 64GB total) left.
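For reference, the pattern in question, reconstructed from the error trace and that answer (simplified; paths and variable names here are placeholders, not my exact script):

$first = $true
Get-ChildItem -Path $srcfolder -Filter *.csv | ForEach-Object {
    $lines = Get-Content $_.FullName          # reads the entire file into memory
    switch ($first) {
        $true  {$lines; $first = $false}      # keep the header from the first file
        $false {$lines | Select -Skip 1}      # the line the exception points at
    }
} | Set-Content $outputfile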

Now the script keeps running and keeps adding to the file (one of my final files winds up being around 7GB in size), but I can't be certain that it has captured every line from all the files when I see this error popping up.

Any suggestions?

EDIT

I added some output so I could see where the errors occur. I'm appending 11 files, varying in size from around 350MB to 1GB; it is the two files of around 1GB that cause the errors. One has a reported length of 909,050,983 bytes and the other 973,429,260.

EHutton
3 Answers

It's simply nuts that some people do this using the following approach...

Get-Content $SrcFile1, $SrcFile2 | Set-Content $DstFile

Don't do it that way! It's ridiculously slow and invariably leads to memory exception errors. Instead, use the good old file copy from the command processor, e.g.:

cmd /c "copy $($SrcFile1) + $($SrcFile2) $($DstFile)"

Bob

I don't have a large file to test this with, but using .NET methods might be an alternative, as you can work on one row at a time rather than loading the whole thing into memory.

$filepath = "c:\temp"
$outputfile = "c:\temp\output\result.csv"
$encoding = [System.Text.Encoding]::UTF8

$files = Get-ChildItem -Path $filePath -Filter *.csv

$w = New-Object System.IO.StreamWriter($outputfile, $true, $encoding)

$skiprow = $false
foreach ($file in $files)
{
    $r = New-Object System.IO.StreamReader($file.fullname, $encoding)
    while (($line = $r.ReadLine()) -ne $null) 
    {
        if (!$skiprow)
        {
            $w.WriteLine($line)
        }
        $skiprow = $false
    }
    $r.Close()
    $r.Dispose()
    $skiprow = $true
}

$w.close()
$w.Dispose()
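One caveat: if an exception interrupts the loop, the writer is never disposed and the output file stays locked until the session ends. A possible hardening (a sketch along the same lines, with the header-skipping logic omitted for brevity):

$w = $null
try {
    $w = New-Object System.IO.StreamWriter($outputfile, $true, $encoding)
    foreach ($file in $files) {
        $r = $null
        try {
            $r = New-Object System.IO.StreamReader($file.FullName, $encoding)
            # copy line by line; the header-skip check would go in here
            while (($line = $r.ReadLine()) -ne $null) { $w.WriteLine($line) }
        }
        finally {
            if ($r) { $r.Dispose() }    # release the reader even on error
        }
    }
}
finally {
    if ($w) { $w.Dispose() }    # release the writer even on error
}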
David Martin
  • I'll give that a shot tomorrow. – EHutton Jan 02 '18 at 22:29
  • ... but I think that would be very slow. Might be more efficient to simply manually edit out the headers with UltraEdit, then copy the files together, and add back in a header row. DOH! Of course I could script those manual components too - copy the header from the first file into a temp file, strip the headers off all the files, copy them together into a temp body file, then merge the temp header file and the temp body file. – EHutton Jan 02 '18 at 22:41
  • Whoa... actually I just started up your script with the intention of letting it run overnight, and it is proving to be far more efficient than the other script I was using. The other script took around 40 minutes to process the 11 files; here it has only been 10 minutes and your script is 75% done. It chewed through the two big files without hesitation. – EHutton Jan 02 '18 at 22:55
  • Sorry, I would upvote your answer to mark it as the answer, but I don't have enough reputation. – EHutton Jan 02 '18 at 22:56
  • How would I set the encoding options in there? Accents in my source file are all getting messed up. I know there is a way to set the encoding for StreamWriter, but I'm uncertain how to do it in the way you are invoking it. – EHutton Jan 03 '18 at 14:10
  • To use different constructors you can use New-Object as follows: $w = New-Object System.IO.StreamWriter($outputfile, $true, [System.Text.Encoding]::UTF8) $r = New-Object System.IO.StreamReader($_.fullname, [System.Text.Encoding]::UTF8) see here for the different encoding options: https://msdn.microsoft.com/en-us/library/system.text.encoding%28v=vs.110%29.aspx – David Martin Jan 03 '18 at 14:33
  • Awesome, thanks so much David. I had found the different constructors in the MSDN documentation, but I wasn't certain how to map the way you had built the constructor onto the documented form... I wasn't sure if changing the way you were passing the $outputfile parameter to the writer would mess things up or not. – EHutton Jan 03 '18 at 19:00
  • Yup, finally got that to work after much hair pulling trying to get the right codepage used for the original text file. – EHutton Jan 03 '18 at 20:33

A complete answer that evolved from the great point Bob made:

<###########################
user config section
###########################>

# location of files to concatenate
$sourcefolder = "P:\DWH\ntm_v1\uncompressed"

# source file extension
$ext = "*.csv"

# output folder (best to have new folder for safety)
$outfolder = $sourcefolder + "\..\concatenated"

#output file name
$outfilename = "concat.txt"

<###########################
do work
###########################>

# build full path to out file
$concatfile = $outfolder + "\" + $outfilename

#create output folder
md -Force $outfolder

# delete output file if exists
if (Test-Path $concatfile) 
{
  Remove-Item -Confirm $concatfile
}

ForEach ($file in (Get-ChildItem -Path $sourcefolder -Filter $ext)) {
    $param = "type $file >> $concatfile"
    Write-Host "cmd /c $param"
        
    # run concat command
    cmd /c $param;  
}
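
Note this keeps the header row from every file. If, as in the original question, you need to drop the header on all but the first file, a variant using cmd's more +1 (which skips the first line) might look like this; a sketch only, and note that more converts tabs to spaces and can struggle with very large line counts, so test it on your data first:

$first = $true
ForEach ($file in (Get-ChildItem -Path $sourcefolder -Filter $ext)) {
    if ($first) {
        # first file: copy everything, including the header row
        cmd /c "type `"$($file.FullName)`" >> `"$concatfile`""
        $first = $false
    } else {
        # subsequent files: more +1 skips the first (header) line
        cmd /c "more +1 `"$($file.FullName)`" >> `"$concatfile`""
    }
}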
golfalot