
I've seen the answer elsewhere for text files, but I need to do this for a compressed file.

I've got a 6G binary file which needs to be split into 100M chunks. Am I missing the analog for unix's "head" somewhere?

djsadinoff

3 Answers


Never mind. Here you go:

function split($inFile, $outPrefix, [Int32] $bufSize){

  $stream = [System.IO.File]::OpenRead($inFile)
  $chunkNum = 1
  $barr = New-Object byte[] $bufSize

  # Read returns 0 at end of file, which ends the loop
  while( $bytesRead = $stream.Read($barr, 0, $bufSize)){
    $outFile = "$outPrefix$chunkNum"
    $ostream = [System.IO.File]::OpenWrite($outFile)
    # write only the bytes actually read, so the last chunk isn't padded
    $ostream.Write($barr, 0, $bytesRead)
    $ostream.Close()
    Write-Host "wrote $outFile"
    $chunkNum += 1
  }
  $stream.Close()
}

Assumption: a buffer of $bufSize bytes fits in memory.
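Assuming the function above has been loaded into the session (e.g. dot-sourced), splitting a 6G file into 100M chunks might look like this — the paths are hypothetical, and 100MB is PowerShell's numeric-literal suffix for 104857600:

split "C:\temp\bigfile.bin" "C:\temp\chunk" 100MB

This should produce C:\temp\chunk1, C:\temp\chunk2, and so on, with the last chunk holding whatever bytes remain.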

  • why do we need `$stream.seek`? The Read method automatically sets the current position, right? – Samik Apr 24 '14 at 12:14
  • You're probably right, @Samik. If you can test it to ensure that it works, I'll remove the line of code. – djsadinoff Apr 25 '14 at 07:00
  • Yes, I commented out the three lines involving $curOffset and it worked just as well. As I am using this script to split a text file, I had to add a few lines of code, so that it does not break in the middle of a line. Anyway, thanks for the code. – Samik Apr 26 '14 at 02:20

The answer to the corollary question: How do you put them back together?

function stitch($infilePrefix, $outFile) {

    $ostream = [System.IO.File]::OpenWrite($outFile)
    $chunkNum = 1
    $infileName = "$infilePrefix$chunkNum"

    # keep appending chunks until the next numbered file doesn't exist
    while(Test-Path $infileName) {
        $bytes = [System.IO.File]::ReadAllBytes($infileName)
        $ostream.Write($bytes, 0, $bytes.Count)
        Write-Host "read $infileName"
        $chunkNum += 1
        $infileName = "$infilePrefix$chunkNum"
    }

    $ostream.Close()
}
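A hypothetical round trip with the split function from the accepted answer (paths are examples only):

split "C:\temp\bigfile.bin" "C:\temp\chunk" 100MB
stitch "C:\temp\chunk" "C:\temp\bigfile-copy.bin"

Since stitch probes each numbered file with Test-Path, it stops at the first missing chunk number, and the prefix passed to stitch must match the one passed to split.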
DrewDouglas

I answered the question alluded to in this question's comments by bernd_k, but in this case I would use -ReadCount instead of -TotalCount, e.g.

Get-Content bigfile.bin -ReadCount 100MB -Encoding byte

This causes Get-Content to read the file a chunk at a time, where the unit of a chunk is a line for text encodings or a byte for byte encoding. Keep in mind that when it does this, you get an array passed down the pipeline, not individual bytes or lines of text.
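A complete split along these lines might look like the following sketch (untested here; note that -Encoding Byte is Windows PowerShell syntax — PowerShell 6+ replaces it with -AsByteStream on both cmdlets):

$chunkNum = 1
Get-Content bigfile.bin -ReadCount 100MB -Encoding Byte | ForEach-Object {
    # each $_ is an array of up to 100MB bytes; write it to its own numbered file
    Set-Content "chunk$chunkNum" -Value $_ -Encoding Byte
    $chunkNum++
}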

Keith Hill
  • ...right, and then you need to figure out a way to get each chunk into a different file. The Jason Fossen link above recommends against manipulating large sets of data with get-content: "performance of get-content is horrible with large files. Unless you are reading less than 200KB, don’t use get-content..." Is that your experience? – djsadinoff Dec 28 '10 at 08:28
  • Also, can you express this as a complete solution akin to mine above? – djsadinoff Dec 28 '10 at 08:30
  • 1
    Got a chance to try this on a huge file and yeah, unless you've got a 64-bit PowerShell, forget about it. :-) I've had pretty good luck with read counts of 1KB but getting Get-Content to parcel it up into chunks of 100MB just doesn't scale. Too bad PowerShell can't handle this a bit more directly. – Keith Hill Dec 31 '10 at 01:51