On my Windows 10 PC, there are three files, 10 GB each, that I want to merge via `cat file_name_prefix* >> some_file.zip`. However, the output file grew to 38 GB before I aborted the operation via Ctrl+C. Is this expected behavior? If not, where am I making a mistake?

- have you looked at what `cat` does? it is an alias for `Get-Content`. the output is far more than just the lines of text in the file. plus, it ain't meant for binary files at all. – Lee_Dailey Mar 14 '21 at 23:53
- Huh, that's odd. Would like to see what someone on here has to say about this. So I'm following this post now. Also, `cat` is just an alias for the `Get-Content` cmdlet. – Abraham Zinala Mar 14 '21 at 23:55
3 Answers
`cat` is an alias of `Get-Content`, which assumes text files by default; the output size is probably due to this conversion. You can try adding the `-Raw` switch for binary files; this might work (not sure).
It's definitely possible to "cat" binary files together in a CMD shell using the `copy` command, as below.
copy /b part1.bin+part2.bin+part3.bin some_file.zip
(The 3 part*.bin are the files to be combined into some_file.zip).

- `-Raw` by itself does _not_ help, but there is `-Encoding Byte` (Windows PowerShell) and `-AsByteStream` (PowerShell (Core) 7+) for byte handling - see [this answer](https://stackoverflow.com/a/1783725/45375) for an example. – mklement0 Mar 15 '21 at 15:36
- Also worth noting that in order to run your command _from PowerShell_, `cmd /c` must be prepended (`copy` is a command that is _internal_ to `cmd.exe`). – mklement0 Mar 15 '21 at 15:38
PowerShell's `cat`, a.k.a. `Get-Content`, reads text file content into an array of strings by default. It also checks the file for a BOM to handle encodings properly if you don't specify a charset. That means it won't work with binary files.
To combine binary files in PowerShell 6+, you need to use the `-AsByteStream` parameter:
Get-Content -AsByteStream file_name_prefix* |
    Set-Content -AsByteStream some_file.zip
# or
Get-Content -AsByteStream file1, file2, file3 |
    Set-Content -AsByteStream some_file.zip
Older PowerShell doesn't have that option so the only thing you can use is -Raw
Get-Content -Raw file_name_prefix* | Set-Content -Raw some_file.zip
However, it'll be very slow because the input files are still treated as text files and read line by line. For speed, you'll need to use other solutions, like calling Win32 APIs directly from PowerShell.
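For the faster route mentioned above, one option that stays inside PowerShell is to copy raw bytes with .NET streams instead of sending them through the pipeline. This is a swap-in for the Win32 suggestion, not what the answer literally proposes. A minimal sketch, assuming Windows PowerShell 3+ with .NET 4+; `Join-BinaryFile` is a hypothetical name, not a built-in cmdlet:

```powershell
# Hypothetical helper (not a built-in cmdlet): append files byte-for-byte.
function Join-BinaryFile {
    param(
        [Parameter(Mandatory)] [string[]] $Path,        # input files, in the order to append
        [Parameter(Mandatory)] [string]   $Destination  # output file (overwritten if present)
    )
    # .NET resolves relative paths against the process directory, not the
    # PowerShell location, so convert to full provider paths first.
    $destFull = $ExecutionContext.SessionState.Path.GetUnresolvedProviderPathFromPSPath($Destination)
    $out = [System.IO.File]::Create($destFull)
    try {
        foreach ($p in $Path) {
            $in = [System.IO.File]::OpenRead((Resolve-Path -LiteralPath $p).ProviderPath)
            try {
                # Stream.CopyTo moves raw bytes in buffered chunks; no text decoding is involved.
                $in.CopyTo($out)
            }
            finally { $in.Dispose() }
        }
    }
    finally { $out.Dispose() }
}
```

Usage might look like `Join-BinaryFile -Path (Get-ChildItem file_name_prefix* | Sort-Object Name).FullName -Destination some_file.zip`. The inputs are resolved before the destination is created, so the output file cannot end up in its own input list.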
Update:
As mentioned, `-Raw` exists only in `Get-Content`, not in `Set-Content`, and it's unsuitable for binary content. You need to use `-Encoding Byte`:
Get-Content -Encoding Byte file_name_prefix* | Set-Content -Encoding Byte some_file.zip

It is probably going in a loop: the glob wildcard also matches the result file, so the result is concatenated onto itself along with the other files.
You can include an extension in the glob, temporarily write the result under a different extension, and then rename it to the correct one (as suggested in: https://stackoverflow.com/a/53079166/12657997).
E.g. when you have 3 files:
- a.txt with `a` inside
- b.txt with `b` inside
- c.txt with `c` inside
cat *.txt > res.csv ; mv res.csv res.txt
cat .\res.txt
a
b
c
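A variant of the same idea without the temporary extension is to resolve the input list up front and filter the output name out of it. The sketch below is for text files only, as in the example above; `Join-TextFile` is a hypothetical helper name, and it assumes the output name is a bare file name in the same directory as the inputs:

```powershell
# Hypothetical helper (not a built-in cmdlet): concatenate text files that
# match a pattern while keeping the output file out of its own input list.
function Join-TextFile {
    param(
        [Parameter(Mandatory)] [string] $Pattern,   # e.g. '*.txt'
        [Parameter(Mandatory)] [string] $OutName    # e.g. 'res.txt' (bare file name)
    )
    # Resolve the inputs first and drop the output name, so a result file
    # left over from an earlier run never feeds back into itself.
    $inputs = Get-ChildItem -Path $Pattern -File |
        Where-Object { $_.Name -ne $OutName } |
        Sort-Object Name
    $inputs | Get-Content | Set-Content -Path $OutName
}
```

With the three files above, this could be called as `Join-TextFile -Pattern *.txt -OutName res.txt`.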
Edit
This `cat` command (as shown above), in combination with the output redirect `>`, will increase the size of the result file, as @mklement0 points out.
According to the documentation (https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.management/get-content?view=powershell-7.1):
-Encoding
Specifies the type of encoding for the target file. The default value is utf8NoBOM.
However, the output redirect changes the encoding, as explained in this post: https://stackoverflow.com/a/40098904/12657997
To illustrate this I've converted the a.txt, b.txt and c.txt to zip files (now they are in a binary format).
cat -Encoding Byte *.zip > res.csv ; mv res.csv res2.txt
cat -Raw *.zip > res.csv ; mv res.csv res3.txt
ls .
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a---- 15/03/2021 21:29 109 a.zip
-a---- 15/03/2021 21:29 109 b.zip
-a---- 15/03/2021 21:29 109 c.zip
-a---- 15/03/2021 21:39 2282 res2.txt
-a---- 15/03/2021 21:41 668 res3.txt
We can see that the output for res3.txt is about double the combined input size (for every UTF-8 byte read, UTF-16 writes 2 bytes).
The `-Encoding Byte` output, in combination with the output redirect, makes it even worse.
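The 668-byte length of res3.txt is consistent with the accounting below. It assumes Windows PowerShell, where `-Raw` yields one string per file with one character per input byte (single-byte ANSI code page) and `>` writes UTF-16LE. The BOM and CRLF figures are assumptions that the answer does not state:

```powershell
$bytesPerFile = 109   # size of each .zip in the listing above
$fileCount    = 3
$bomBytes     = 2     # assumption: UTF-16LE byte order mark written once by >
$crlfBytes    = 4     # assumption: CR+LF after each string, 2 bytes per character

# Every input byte becomes one 2-byte UTF-16 code unit, plus the line ending per file.
$estimate = $bomBytes + $fileCount * ($bytesPerFile * 2 + $crlfBytes)
$estimate   # 668, the Length shown for res3.txt
```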

- That's definitely _one_ pitfall (+1), but even with that out of the picture, another problem is the fact that `Get-Content` by default interprets its input as _text_ and that `>` / `>>` (effective aliases of the `Out-File` cmdlet) also applies a _character encoding_ on output. In Windows PowerShell, `>` / `>>` use UTF-16LE (!; "Unicode") by default, which has the potential to double the size of the original input. – mklement0 Mar 15 '21 at 14:49
- @mklement0 you are right. I'll update the answer to reflect that it's for text files (how it is shown in the answer), not for binary files. – Woody Mar 15 '21 at 20:33