
What's the best way of concatenating binary files using PowerShell? I'd prefer a one-liner that is simple to remember and fast to execute.

The best I've come up with is:

gc -Encoding Byte -Path ".\File1.bin",".\File2.bin" | sc -Encoding Byte new.bin

This seems to work ok, but is terribly slow with large files.

– FkYkko

4 Answers


The approach you're taking is the way I would do it in PowerShell. However, you should use the -ReadCount parameter to improve performance. You can also take advantage of positional parameters to shorten this even further:

gc File1.bin,File2.bin -Encoding Byte -Read 512 | sc new.bin -Encoding Byte

Editor's note: In the cross-platform PowerShell (Core) edition (version 6 and up), -AsByteStream must now be used instead of -Encoding Byte; also, the sc alias for the Set-Content cmdlet has been removed.
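
For reference, a rough PowerShell (Core) equivalent of the one-liner above might look like the following (a sketch based on the note above, with the same placeholder file names):

# PowerShell 6+: -AsByteStream replaces -Encoding Byte, and the sc alias
# is gone, so Set-Content must be spelled out.
Get-Content File1.bin, File2.bin -AsByteStream -ReadCount 512 |
    Set-Content new.bin -AsByteStream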

Regarding the use of the -ReadCount parameter, I did a blog post on this a while ago that folks might find useful - Optimizing Performance of Get Content for Large Files.
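
If you want to gauge the effect of -ReadCount on your own files, Measure-Command makes a quick comparison easy (a sketch; big.bin and the output names are placeholders):

Measure-Command { gc big.bin -Encoding Byte | sc out1.bin -Encoding Byte }
Measure-Command { gc big.bin -Encoding Byte -ReadCount 512 | sc out2.bin -Encoding Byte }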

– Keith Hill
  • I just ran this on my example files and the command went from taking 9 minutes to 3 seconds with the inclusion of the -read param. This is on a x25m drive. Nice. You get my accept. – FkYkko Nov 23 '09 at 15:36
  • Just used your one-liner to join a 4.4GB ISO spanned over 23 files. Reassembled the file fine, and took 35 minutes on my laptop using 1024 byte blocks. –  Jul 12 '12 at 21:56
  • I'm guessing this works because the pipe is sending .NET objects to sc? When I tried to pipe binary data to a C program, I noticed that I only got the first 7 bits of each byte, since "|" invoked encoding. – johnnycrash Jul 14 '14 at 21:52
  • No longer works in PowerShell 6/7. Byte is not an accepted encoding. `Get-Content: Cannot process argument transformation on parameter 'Encoding'. 'Byte' is not a supported encoding name. For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method. (Parameter 'name')` – Daniel Lidström Mar 18 '20 at 10:14

It's not PowerShell, but if you have PowerShell you also have the command prompt:

copy /b 1.bin+2.bin 3.bin

As Keith Hill pointed out, if you really need to run it from inside Powershell, you can use:

cmd /c copy /b 1.bin+2.bin 3.bin 
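
If the file names contain spaces, it's safest to hand the whole command line to cmd as a single quoted string, so PowerShell's own parsing doesn't interfere with the quotes and the +; a sketch with made-up names:

cmd /c 'copy /b "part 1.bin"+"part 2.bin" joined.bin'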
– João Angelo
  • copy is an intrinsic command in cmd.exe. You would have to execute cmd /c copy /b 1.bin+2.bin 3.bin – Keith Hill Nov 23 '09 at 15:13
  • Nice simple solution, works on any Windows computer. Upvoted, but accept goes to Keith since I asked for a PS version. Thx – FkYkko Nov 23 '09 at 15:38
  • Note also that `copy` supports wildcards. So `copy /b *.bin out.bin` will concatenate all your bin-files and the output will be very fast (i.e. much faster than with PowerShell). – Davor Josipovic Apr 13 '14 at 12:09
  • Thanks... It's about a billion times faster than the accepted answer ;). I missed the "cmd /c" when trying to run it from PowerShell. Sometimes the old ways are still the best. – thesaint Jan 07 '15 at 15:18

I had a similar problem recently, where I wanted to append two large (2GB) files into a single file (4GB).

I tried adjusting the -ReadCount parameter for Get-Content, but I couldn't get it to improve performance for the large files.

I went with the following solution:

function Join-File (
    [parameter(Position=0,Mandatory=$true,ValueFromPipeline=$true)]
    [string[]] $Path,
    [parameter(Position=1,Mandatory=$true)]
    [string] $Destination
)
{
    write-verbose "Join-File: Open Destination1 $Destination"
    $OutFile = [System.IO.File]::Create($Destination)
    foreach ( $File in $Path ) {
        write-verbose "   Join-File: Open Source $File"
        $InFile = [System.IO.File]::OpenRead($File)
        $InFile.CopyTo($OutFile)
        $InFile.Dispose()
    }
    $OutFile.Dispose()
    write-verbose "Join-File: finished"
} 
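
For example (assuming File1.bin and File2.bin exist in the current directory):

Join-File -Path .\File1.bin,.\File2.bin -Destination .\new.bin -Verbose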

Performance:

  • cmd.exe /c copy file1+file2 file3: around 5 seconds (best)
  • gc file1,file2 | sc file3: around 1,100 seconds (yuck)
  • Join-File file1,file2 file3: around 16 seconds (OK)
– Keith S Garner
  • cmd.exe copy is many times faster than the native PS cmdlets: 1.2MB/s versus >120MB/s. Not surprising considering how Get-Content works, even with the -ReadCount parameter. – Rob Nicholson Oct 23 '17 at 20:13

Performance is very much dependent on the buffer size used, and the defaults are fairly small. When concatenating two 2GB files I'd use a buffer size of about 256KB. Going larger can sometimes fail; with a smaller buffer you'll get less throughput than your drive is capable of.

With gc, that'd be -ReadCount, not simply -Read (PowerShell 5.0):

gc -ReadCount 256KB -Path $infile -Encoding Byte | ...

I also found Add-Content, going file by file, to be better for a lot of small files: when piping even a moderate amount of data (200MB), my computer went OOM, PowerShell froze, and the CPU ran at full load.

However, Add-Content randomly failed a few times for a few hundred files with an error about the destination file being in use, so I added a while loop and a try/catch:

# Empty the destination file first
sc -Path "$path\video.ts" -Value @() -Encoding Byte
$tsfiles | foreach {
    while ($true) {
        try {
            # -ReadCount 0 reads each file in one chunk; fine here since
            # the files are smaller than 256KB
            gc -ReadCount 0 -Path "$path\$_" -Encoding Byte |
                Add-Content -Path "$path\video.ts" -Encoding Byte -ErrorAction Stop
            break
        } catch {
            # Destination still in use: loop around and retry
        }
    }
}

Using a file stream is much faster still. You cannot specify a buffer size with [System.IO.File]::Open, but you can when constructing a [System.IO.FileStream] directly, like so:

# $path = "C:\"
$ins = @("a.ts", "b.ts")
$outfile = "$path\out.mp4"
$out = New-Object -TypeName "System.IO.FileStream" -ArgumentList @(
    $outfile, 
    [System.IO.FileMode]::Create,
    [System.IO.FileAccess]::Write,
    [System.IO.FileShare]::None,
    256KB,
    [System.IO.FileOptions]::None)
try {
    foreach ($in in $ins) {
        $fs = New-Object -TypeName "System.IO.FileStream" -ArgumentList @(
            "$path\$in", 
            [System.IO.FileMode]::Open,
            [System.IO.FileAccess]::Read,
            [System.IO.FileShare]::Read,
            256KB,
            [System.IO.FileOptions]::SequentialScan)
        try {
            $fs.CopyTo($out)
        } finally {
            $fs.Dispose()
        }
    }
} finally {
    $out.Dispose()
}
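
As an aside, Stream.CopyTo also has an overload that takes a buffer size (available since .NET 4), which sets the buffer used by the copy loop itself; that can be a shorter route to a similar effect. A sketch under the same assumptions ($path and the file names are the placeholders used above):

$out = [System.IO.File]::Create("$path\out.mp4")
try {
    foreach ($in in @("a.ts", "b.ts")) {
        $fs = [System.IO.File]::OpenRead("$path\$in")
        try {
            # The second argument is the size of the copy buffer
            $fs.CopyTo($out, 256KB)
        } finally {
            $fs.Dispose()
        }
    }
} finally {
    $out.Dispose()
}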
– Eric