I have multiple text files that I need to read and combine into one file. The files vary in size from 1 to 50 MB each. What's the most efficient way to combine these files without running into the dreaded System.OutOfMemoryException?

Dave Harding
- Can you describe 'Combine'? And what is in those files? Just lines of text or CSV or XML or ... – H H Jun 10 '11 at 19:49
- What kind of combining are you needing to do? If you're just, say, merge-sorting a bunch of sorted files, you won't need to read the whole files into memory, but can just process them line-by-line. – C. K. Young Jun 10 '11 at 19:49
- from a command prompt: `copy file1.txt + file2.txt targetfile.text` – Muad'Dib Jun 10 '11 at 19:50
- Yeah... `copy file1.txt + file2.txt + file3.txt allfiles.txt` – agent-j Jun 10 '11 at 20:05
- There's a previous discussion of this topic here: http://stackoverflow.com/questions/444309/what-would-be-the-fastest-way-to-concatenate-three-files-in-c. Looks like that's a nice approach that will not use as much RAM as looping `ReadAllText` then `WriteAllText`. – Steve Townsend Jun 10 '11 at 20:12
- `copy *.txt allfiles.txt` – Lee Englestone Feb 04 '13 at 09:17
4 Answers
Do it in chunks:
const int chunkSize = 2 * 1024; // 2KB
var inputFiles = new[] { "file1.dat", "file2.dat", "file3.dat" };
using (var output = File.Create("output.dat"))
{
    foreach (var file in inputFiles)
    {
        using (var input = File.OpenRead(file))
        {
            // copy in chunkSize pieces, so only one small buffer
            // is ever held in memory regardless of file size
            var buffer = new byte[chunkSize];
            int bytesRead;
            while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                output.Write(buffer, 0, bytesRead);
            }
        }
    }
}

Darin Dimitrov
- I have to run to a meeting and might not be able to test for a bit, but I'll get back to you ASAP! Thanks – Dave Harding Jun 10 '11 at 19:52
- The repeated reallocation of, and data copy to, `actual` is redundant. Just write out the number of bytes you know you read (per `bytesread`) directly from `buffer` to the new file. `buffer` itself also only needs to be allocated once, before entering the outer `for` loop. – Steve Townsend Jun 10 '11 at 20:17
- @Steve Townsend, very good point. I've updated my post to take it into account. – Darin Dimitrov Jun 10 '11 at 21:52
- Darin, thanks. Much appreciated. 10 files and it doesn't even break a sweat. – Dave Harding Jun 13 '11 at 16:56
- @DarinDimitrov Does this handle Unicode files too? What if two files are in a different format? – Baz1nga May 11 '12 at 12:48
- @Baz1nga It copies things as is, so the encoding doesn't matter. If the files have different encodings though, the resulting file will not be properly displayed by a normal editor. – user276648 Mar 30 '17 at 04:01
- @DarinDimitrov Slight improvement: do `new byte[chunkSize]` only once instead of for each file, and use `chunkSize` instead of `buffer.Length`. – user276648 Mar 30 '17 at 04:02
- @DarinDimitrov Would merging it in memory in parallel (using a memory stream) and then writing to the disk make it any faster? – loneshark99 May 03 '17 at 23:55
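
For reference, a minimal sketch of user276648's suggestion, with the buffer allocated once and reused across files (same hypothetical file names as in the answer):

// requires: using System.IO;
const int chunkSize = 2 * 1024; // 2KB
var buffer = new byte[chunkSize]; // allocated once, reused for every file
var inputFiles = new[] { "file1.dat", "file2.dat", "file3.dat" };
using (var output = File.Create("output.dat"))
{
    foreach (var file in inputFiles)
    {
        using (var input = File.OpenRead(file))
        {
            int bytesRead;
            while ((bytesRead = input.Read(buffer, 0, chunkSize)) > 0)
            {
                output.Write(buffer, 0, bytesRead);
            }
        }
    }
}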
Darin is on the right track. My tweak would be:
using (var output = File.Create("output"))
{
    foreach (var file in new[] { "file1", "file2" })
    {
        using (var input = File.OpenRead(file))
        {
            input.CopyTo(output);
        }
    }
}
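
If you want control over the chunk size, .NET 4.0 also offers a `CopyTo` overload that takes an explicit buffer size; a sketch (the 80KB figure is just an illustrative choice, not from the answer):

using (var output = File.Create("output"))
{
    foreach (var file in new[] { "file1", "file2" })
    {
        using (var input = File.OpenRead(file))
        {
            input.CopyTo(output, 81920); // copy in 80KB chunks; tune to suit your I/O
        }
    }
}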

n8wrl
- `CopyTo` is a nice one but it's probably worth mentioning that it's only available in .NET 4.0. – Darin Dimitrov Jun 10 '11 at 19:55
- Yes, in my case I have two files, "file.Docx" and "file_Information.Xml". I want application A, for example, to merge the two into one single file, "file.QAF"... then pass this file to another application B to recover the two files "file.Docx" and "file_Information.Xml" (the way back...). – KADEM Mohammed Dec 07 '13 at 15:29
- @CarterNolan Just make a note of the length of each file (i.e. `input.Length`) and then pass it on to application B. Inside application B, when writing with `FileStream.Write`, set offset to the starting byte of each file and count to the number of bytes to write. – user797717 Jul 20 '15 at 00:11
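
A rough sketch of user797717's idea for that scenario: write a length prefix before each file's bytes so the second application can split them apart again. The `Pack`/`Unpack` names and the 8-byte length-prefix format are assumptions for illustration, not from the comment:

// requires: using System.IO;
static void Pack(string container, params string[] inputs)
{
    using (var writer = new BinaryWriter(File.Create(container)))
    {
        foreach (var file in inputs)
        {
            byte[] bytes = File.ReadAllBytes(file); // fine for modest files; stream in chunks for large ones
            writer.Write((long)bytes.Length);       // 8-byte length prefix (assumed format)
            writer.Write(bytes);
        }
    }
}

static void Unpack(string container, params string[] outputs)
{
    using (var reader = new BinaryReader(File.OpenRead(container)))
    {
        foreach (var file in outputs)
        {
            long length = reader.ReadInt64(); // read the prefix back
            File.WriteAllBytes(file, reader.ReadBytes((int)length));
        }
    }
}

Usage would look like `Pack("file.QAF", "file.Docx", "file_Information.Xml")` in application A and `Unpack("file.QAF", "file.Docx", "file_Information.Xml")` in application B.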
This does the same as the .NET 4.0 code above, but is compatible with .NET 2.0 (for text files):
using (var output = new StreamWriter("D:\\TMP\\output"))
{
    // caution: if the output file lives in the same directory,
    // GetFiles may pick it up as an input as well
    foreach (var file in Directory.GetFiles("D:\\TMP", "*.*"))
    {
        using (var input = new StreamReader(file))
        {
            output.WriteLine(input.ReadToEnd()); // reads the whole file into one string
        }
    }
}
Please note that this reads an entire file into memory at once, so large files will cause a lot of memory to be used (and if not enough memory is available, it may fail altogether).
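
If you are stuck on .NET 2.0 and want to avoid that, one option is to copy line by line instead; a sketch under the same assumed paths (explicit types, since C# 2.0 has no `var`):

using (StreamWriter output = new StreamWriter("D:\\TMP\\output"))
{
    foreach (string file in Directory.GetFiles("D:\\TMP", "*.*"))
    {
        using (StreamReader input = new StreamReader(file))
        {
            string line;
            while ((line = input.ReadLine()) != null)
            {
                output.WriteLine(line); // only one line held in memory at a time
            }
        }
    }
}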

user2606127
- -1: This won't work for big files (because `ReadToEnd()` will create a string in memory). – Oliver Jul 22 '13 at 11:41