4

I would like to split a binary file into smaller chunks. Does anyone know a Windows command for that?

Because of Android's UNCOMPRESS_DATA_MAX constraint, I cannot overwrite the database with a file of 1MB or larger. If there is a better way to do it, I am OK with that too.

Ross Ridge
Xi 张熹
  • There are a few ways to split a file with a batch script without external tools: http://stackoverflow.com/questions/28244063/how-can-i-split-a-binary-file-into-chunks-with-certain-size-with-batch-script-wi – npocmaka Feb 02 '15 at 08:46

3 Answers

3

Method 1:

makecab can split a binary file into smaller chunks, but the chunks are encoded in its own cabinet format, so they cannot be treated as raw bytes the way pieces of a flat binary file can (e.g. they cannot be joined via copy, or edited from CMD for file patching). The chunks can, however, be joined again by extrac32, which is enough if you only want to split a file and join it back into one piece later, without editing.

E.g. to split a binary file with makecab and then join it with extrac32, first make a DDF (text) file:

.Set CabinetNameTemplate=test_*.cab; <-- Enter chunk name format
.Set MaxDiskSize=900000; <-- Enter file split/chunk size
.Set ClusterSize=1000
.Set Cabinet=on;
.Set Compress=off;
.set CompressionType=LZX;
.set CompressionMemory=21
.Set DiskDirectoryTemplate=;
file.in

Then:

rem Optional: set Compress=on in ddf.txt to save disk space
makecab /f ddf.txt

To get the original file back, ensure all chunks are in the same directory:

REM join by calling 1st file in the sequence
extrac32 test_1.cab file.out

MakeCAB introduces the concept of a folder to refer to a contiguous set of compressed bytes.

"MakeCAB takes all of the files in the product or application being compressed, lays the bytes down as one continuous byte stream, compresses the entire stream, chopping it up into folders as appropriate, and then fills up one or more cabinets with the folders."

Method 2: For raw byte chunks, PowerShell can split files:

REM Run at a prompt started with cmd /v:on; !chunks! and !tail! need delayed expansion
set size=1000000
set file=test.mp3

for %j in (%file%) do (
set /a chunks=%~zj/%size% >nul

for /l %i in (0,1,!chunks!) do (
set /a tail=%~zj-%i*%size% >nul
powershell gc %file% -Encoding byte -Tail !tail! ^| sc %file%_%i -Encoding byte
if %i lss !chunks! FSUTIL file seteof %file%_%i %size% >nul
)
)
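For comparison, here is a minimal sketch of the same raw-byte split outside CMD. It is written in Python, which is not part of this answer, and the function name `split_file`, the `.000`/`.001` suffix scheme and the buffer size are my own assumptions. It streams the input in small blocks and never does 32-bit arithmetic on the file size, so it should not hit the large-file limit that applies to the batch version.

```python
from pathlib import Path


def split_file(src, chunk_size=1_000_000, buf_size=64 * 1024):
    """Split src into src.000, src.001, ... of at most chunk_size bytes each.

    The naming scheme is hypothetical; adjust it to whatever you need.
    Returns the list of chunk paths in order.
    """
    src = Path(src)
    parts = []
    with src.open("rb") as fin:
        index = 0
        while True:
            # Read the first block before creating the chunk file, so that
            # no empty trailing chunk is written when the size divides evenly.
            first = fin.read(min(buf_size, chunk_size))
            if not first:
                break
            part = src.with_name(f"{src.name}.{index:03d}")
            with part.open("wb") as fout:
                fout.write(first)
                remaining = chunk_size - len(first)
                while remaining > 0:
                    block = fin.read(min(buf_size, remaining))
                    if not block:
                        break  # end of input inside the last chunk
                    fout.write(block)
                    remaining -= len(block)
            parts.append(part)
            index += 1
    return parts
```

Calling `split_file("test.mp3", 1_000_000)` would write `test.mp3.000`, `test.mp3.001` and so on next to the original, leaving the original untouched.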

Method 3: via certutil & CMD:

REM compressed to generate CRLF pairs
set file="x.7z"
REM certutil has max file limit around 74MB
set max=70000000

REM Findstr line limit 8k
REM Workaround: wrap in some archive to generate CRLF pairs

for %i in (%file%) do (
set /a num=%~zi/%max% >nul      &REM No. of chunks
set /a last=%~zi%%max% >nul     &REM size of last chunk
if !last!==0 set /a num=num-1       &REM remove zero byte chunk
set size=%~zi
)

ren %file% %file%.0

for /l %i in (1 1 %num%) do (
set /a s1=%i*%max% >nul
set /a s2="(%i+1)*%max%" >nul
set /a prev=%i-1 >nul

echo Writing %file%.%i
type %file%.!prev! | (
  (for /l %j in (1 1 %max%) do pause)>nul& findstr "^"> %file%.%i)

FSUTIL file seteof %file%.!prev! %max% >nul
)
if not %last%==0 FSUTIL file seteof %file%.%num% %last% >nul
echo Done.

Notes:

  1. Chunks can be joined by copy /b
  2. Filename extensions can be made neater by padding chunk numbers
  3. Can be looped to split entire directories

See example output below:

Directory of C:\Users\Stax\Desktop\Parking

03/05/2022  01:04    <DIR>          .
03/05/2022  01:04    <DIR>          ..
03/05/2022  01:04               407 Court Notice.pdf.000
03/05/2022  01:04             4,000 Court Notice.pdf.001
03/05/2022  01:04             4,000 Court Notice.pdf.002
03/05/2022  01:04               557 Parking fine.pdf.000
03/05/2022  01:04             4,000 Parking fine.pdf.001
03/05/2022  01:04             4,000 Parking fine.pdf.002
03/05/2022  01:04             4,000 Parking fine.pdf.003
03/05/2022  01:04             4,000 Parking fine.pdf.004
               8 File(s)         24,964 bytes

Chunks from Methods 2 & 3 can then be joined with copy /b.
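If you would rather script the join than type the chunk names into copy, the sketch below shows one way to do it. It is in Python rather than CMD, and it assumes the chunks are named `<base>.<number>` as in Method 3 (the name `join_chunks` is hypothetical). Sorting is by the numeric value of the suffix, so unpadded names such as `.2` and `.10` still come out in the right order.

```python
import glob
import shutil
from pathlib import Path


def join_chunks(base, dest):
    """Concatenate base.0, base.1, ... (or base.000, base.001, ...) into dest.

    Only files whose last extension is purely numeric are treated as chunks.
    Returns the chunk paths in the order they were written.
    """
    base = Path(base)
    # Escape the base name so characters like [ ] are matched literally.
    pattern = glob.escape(base.name) + ".*"
    parts = sorted(
        (p for p in base.parent.glob(pattern) if p.suffix[1:].isdecimal()),
        key=lambda p: int(p.suffix[1:]),
    )
    with open(dest, "wb") as fout:
        for part in parts:
            with part.open("rb") as fin:
                shutil.copyfileobj(fin, fout)  # streams, so large chunks are fine
    return parts
```

For example, `join_chunks("x.7z", "x_joined.7z")` would read `x.7z.0`, `x.7z.1`, ... from the current directory and write them back to back into `x_joined.7z`.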

Tested on Win 10

Zimba
  • Method 2 looks useful, but it doesn't work on large files. If the file size exceeds the maximum value that fits into a System.Int32, it dies a horrible death. My test file is over 4GB. – Erick G. Hagstrom Mar 04 '22 at 20:57
  • Windows CMD scripting was designed for working with text, but it has been hacked to edit binaries and therefore has numerous limitations, e.g. max file size, line limits, name limits, character encodings and OS calls. To overcome some of these, VBS & PowerShell were developed. For programming tasks, you'd be looking at a programming or assembly language. – Zimba May 31 '23 at 05:50
0

There's no built-in DOS command for that. Use the DOS port of the Unix split command:

split BIGFILE -b 1000000

There are 3rd party alternatives, but this is the simplest.

Vik David
0

You can also install GnuWin from http://gnuwin32.sourceforge.net

For my work, I needed to extract some lines from a big Oracle export file, DataBase.bak.

This file is a binary file that is a mix of text lines and binary lines.

To extract all lines between two specific lines, I only need to enter the following commands:

split -l 4114807 database.bak from.
split -l 10357 from.B to.
copy to.A database.RANGE.bak

The first command extracts all lines from 0 to 4114807 into the from.A file, and all lines from 4114808 to 2*4114807 into the from.B file.

I found the FROM line number (= 4114807) by loading the Database.Bak file in Notepad++.
Caution: the line number displayed in Notepad++ is not equal to the -l parameter used in the split command, because Notepad++ line numbers are generated by LF and also by CR characters!

I use the second command to extract the first 10357 lines contained in the from.B file into the to.A file.

To finish, I copy the to.A file into a new Database.RANGE.bak file that contains the needed extract.

When the job is done, I delete all from.* and to.* files from the current directory.
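The same range extraction can be sketched in one pass without temporary files. This is an illustration in Python, not what I actually ran. The function name `extract_lines` and its parameters are made up for the example. Like split -l, it counts a line as ending at an LF byte only, so a lone CR does not start a new line.

```python
def extract_lines(src, dest, start, count):
    """Copy `count` lines from src to dest, beginning with 1-based line `start`.

    The file is read in binary mode, where a line ends at b"\n" only.
    A very long run of bytes without LF is held in memory as one line.
    """
    with open(src, "rb") as fin, open(dest, "wb") as fout:
        for number, line in enumerate(fin, start=1):
            if number >= start + count:
                break  # past the requested range, stop reading
            if number >= start:
                fout.write(line)
```

With the numbers from above, the call would be `extract_lines("database.bak", "database.RANGE.bak", 4114808, 10357)`.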

schlebe