How to merge two memory streams containing PDF file data into one

Question

I am trying to read two PDF files into two memory streams and then return a stream that contains the data of both streams, but I don't understand what's wrong with my code.

Sample Code:

string file1Path = "Sample1.pdf";
string file2Path = "Sample2.pdf";
MemoryStream stream1 = new MemoryStream(File.ReadAllBytes(file1Path));
MemoryStream stream2 = new MemoryStream(File.ReadAllBytes(file2Path));
stream1.Position = 0;
stream1.CopyTo(stream2);
return stream2;   /* supposed to contain the data of both stream1 and stream2, but contains data of stream1 only */

possible duplicate of [How to merge two memory streams?](http://stackoverflow.com/questions/15655210/how-to-merge-two-memory-streams) — Paul Zahra, Aug 25 '15 at 12:49
possible duplicate of [Merge memorystreams to one itext document](http://stackoverflow.com/questions/8385690/merge-memorystreams-to-one-itext-document) — Chris Haas, Aug 26 '15 at 13:03

Accepted Answer · score 11

It appears that for PDF files, merging memory streams does not work the same way as it does for .txt files. For PDF, you need a PDF library; I used iTextSharp.dll (available under the AGPL license) and combined the files with its functions as follows:

MemoryStream finalStream = new MemoryStream();
PdfCopyFields copy = new PdfCopyFields(finalStream);
string file1Path = "Sample1.pdf";
string file2Path = "Sample2.pdf";

var ms1 = new MemoryStream(File.ReadAllBytes(file1Path));
ms1.Position = 0;
copy.AddDocument(new PdfReader(ms1));
ms1.Dispose();

var ms2 = new MemoryStream(File.ReadAllBytes(file2Path));
ms2.Position = 0;
copy.AddDocument(new PdfReader(ms2));
ms2.Dispose();
copy.Close();

finalStream then contains the merged PDF of both ms1 and ms2.

answered Aug 26 '15 at 11:55 by ArslanIqbal · edited Aug 26 '15 at 13:43 by Bruno Lowagie

This should really be the default when thinking about merging files - it's a special case when this works, not when it doesn't. There are very few file formats that can be merged just by gluing two files together. In fact, even text files don't fit perfectly - for example if you're using line breaks to split the data (you'd need to put a separator between the files as well), or if they use UTF-8 encoding with a BOM (effectively giving them a "header" of sorts), or even if they are in two different encodings. – Luaan Aug 26 '15 at 13:21
I have updated your answer by changing *"freely avaible on internet"* to *"available under the AGPL license"*. iTextSharp is licensed software, this means that it can only be used for free in projects that are also released under the AGPL (and not under a commercial license). As soon as you use iTextSharp in a commercial context, you have to buy a commercial license for your use of iTextSharp. – Bruno Lowagie Aug 26 '15 at 13:45
@Luaan You must be right. I am new to programming, and this is the solution I arrived at. I thought it would be good to post my answer for people looking into a similar issue. – ArslanIqbal Aug 26 '15 at 14:01
@Bruno Lowagie Thank you. :) – ArslanIqbal Aug 26 '15 at 16:31

Answer 2 · Luaan · score 5

NOTE:

The whole question is based on a false premise, that you can produce a combined PDF file by merging the binaries of two PDF files. This works for plain text files for example (to an extent), but definitely doesn't work for PDFs. The answer only addresses how to merge two binary data streams, not how to merge two PDF files in particular. It answers the OP's question as asked, but doesn't actually solve his problem.

When you use the byte[] constructor for MemoryStream, the memory stream will not expand as you add more data, so it will not be big enough for both stream1 and stream2 (writing past its end throws a NotSupportedException). Also, stream2's position starts at zero, so you're overwriting stream2 with the data in stream1.
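A minimal reproduction of both effects, using short ASCII strings in place of the PDF bytes (the class and method names are hypothetical, for illustration only):

```csharp
using System;
using System.IO;
using System.Text;

public static class ByteArrayStreamDemo
{
    // Mirrors the question's code: stream2 is created with the byte[] constructor,
    // so it starts at position 0 and cannot grow.
    public static string CopyIntoFixedStream(string first, string second)
    {
        using (var stream1 = new MemoryStream(Encoding.ASCII.GetBytes(first)))
        using (var stream2 = new MemoryStream(Encoding.ASCII.GetBytes(second)))
        {
            stream1.Position = 0;
            stream1.CopyTo(stream2); // writes over stream2 starting at offset 0
            return Encoding.ASCII.GetString(stream2.ToArray());
        }
    }

    // Returns true if appending to a byte[]-backed stream fails because
    // the stream is not expandable.
    public static bool AppendFails(string first, string second)
    {
        using (var stream1 = new MemoryStream(Encoding.ASCII.GetBytes(first)))
        using (var stream2 = new MemoryStream(Encoding.ASCII.GetBytes(second)))
        {
            stream2.Seek(0, SeekOrigin.End); // skip past the existing data first
            try
            {
                stream1.CopyTo(stream2);
                return false;
            }
            catch (NotSupportedException)
            {
                return true; // "Memory stream is not expandable"
            }
        }
    }
}
```

For example, CopyIntoFixedStream("AAA", "BBBBB") gives "AAABB": the overwrite is silent as long as stream1 is not longer than stream2, so the question's code can run without any error while still producing the wrong data.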

The fix is rather simple:

var result = new MemoryStream();
using (var file1 = File.OpenRead(file1Path)) file1.CopyTo(result);
using (var file2 = File.OpenRead(file2Path)) file2.CopyTo(result);
result.Position = 0; // rewind before reading result as a stream (not needed for result.ToArray())
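The same fix can be generalized into a helper that accepts any number of readable streams. This is a sketch; StreamConcat and ConcatToMemory are hypothetical names, and like the snippet above it only joins raw bytes, so it does not produce a valid combined PDF:

```csharp
using System.IO;

public static class StreamConcat
{
    // Copies each source, in order, into a new growable MemoryStream.
    public static MemoryStream ConcatToMemory(params Stream[] sources)
    {
        var result = new MemoryStream(); // parameterless constructor: expandable
        foreach (var source in sources)
        {
            source.CopyTo(result); // copies from each source's current position
        }
        result.Position = 0; // rewind so callers can read from the start
        return result;
    }
}
```

With files, you would pass File.OpenRead(file1Path) and File.OpenRead(file2Path) and dispose them afterwards, as the using statements above do.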

Another option would be to create your own stream class that combines two separate streams - useful if you care about composability, but probably overkill for something as simple as this :)

Just for fun, it could look something like this:

using System;
using System.IO;

public class DualStream : Stream
{
    private readonly Stream _first;
    private readonly Stream _second;

    public DualStream(Stream first, Stream second)
    {
        _first = first;
        _second = second;
    }

    public override bool CanRead => _first.CanRead && _second.CanRead;
    public override bool CanSeek => _first.CanSeek && _second.CanSeek;
    public override bool CanWrite => false;
    public override long Length => _first.Length + _second.Length;

    // Works because Read and Seek keep _second at position 0 until _first is exhausted
    public override long Position
    {
        get { return _first.Position + _second.Position; }
        set { Seek(value, SeekOrigin.Begin); }
    }

    // Read-only stream: there is nothing to flush
    public override void Flush() { }

    public override int Read(byte[] buffer, int offset, int count)
    {
        // A short read does not mean end of stream; only 0 does,
        // so switch to the second stream only once the first returns 0
        var bytesRead = _first.Read(buffer, offset, count);
        if (bytesRead > 0) return bytesRead;

        return _second.Read(buffer, offset, count);
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
        // To simplify, let's assume seek always works as if over one big MemoryStream
        long targetPosition;

        switch (origin)
        {
            case SeekOrigin.Begin: targetPosition = offset; break;
            case SeekOrigin.Current: targetPosition = Position + offset; break;
            case SeekOrigin.End: targetPosition = Length + offset; break; // offset is usually <= 0 here
            default: throw new NotSupportedException();
        }

        // Clamp to [0, Length] instead of allowing seeks past either end
        targetPosition = Math.Max(0, Math.Min(Length, targetPosition));

        var firstPosition = Math.Min(_first.Length, targetPosition);
        _first.Position = firstPosition;
        _second.Position = targetPosition - firstPosition;

        return Position;
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            _first.Dispose();
            _second.Dispose();
        }

        base.Dispose(disposing);
    }

    // Not supported on a read-only stream
    public override void SetLength(long value)
      { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count)
      { throw new NotSupportedException(); }
}

The main benefit is that you don't have to allocate unnecessary in-memory buffers just to have a combined stream - it can even be used with the file streams directly, if you dare :D And it's easily composable - you can make dual streams of other dual streams, allowing you to chain as many streams together as you want - pretty much the same as Enumerable.Concat.
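If you only need to read forward, the chaining idea can also be sketched as a single stream over any number of sources instead of nested DualStreams. ConcatStream is a hypothetical class, not a framework type; in this sketch it takes ownership of the sources (disposing each one once it is exhausted) and does not support seeking:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Forward-only, read-only stream that reads its sources one after another,
// similar in spirit to Enumerable.Concat. No data is copied up front.
public class ConcatStream : Stream
{
    private readonly Queue<Stream> _sources;

    public ConcatStream(IEnumerable<Stream> sources)
    {
        _sources = new Queue<Stream>(sources);
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();
    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    public override void Flush() { }

    public override int Read(byte[] buffer, int offset, int count)
    {
        // Advance to the next source only when the current one returns 0 (end of stream)
        while (_sources.Count > 0)
        {
            var bytesRead = _sources.Peek().Read(buffer, offset, count);
            if (bytesRead > 0 || count == 0) return bytesRead;
            _sources.Dequeue().Dispose();
        }
        return 0;
    }

    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            while (_sources.Count > 0) _sources.Dequeue().Dispose();
        }
        base.Dispose(disposing);
    }
}
```

Calling CopyTo on a ConcatStream built from several MemoryStreams writes their bytes back to back, skipping empty sources. As with DualStream, this still only joins bytes and is no substitute for a PDF library when the inputs are PDFs.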

answered Aug 25 '15 at 12:45 by Luaan · edited Aug 26 '15 at 13:17

Then what should I do? – ArslanIqbal Aug 25 '15 at 12:48
Don't use the byte[] constructor :D Just use new MemoryStream() – Paul Zahra Aug 25 '15 at 12:49
And wasn't setting position to 0 for stream1 supposed to tell from where the copying should start? – ArslanIqbal Aug 25 '15 at 12:50
@Arsal Yes, that's fine - the problem is that `stream2` is *also* at position zero, so when you start copying, it writes all over the existing data in `stream2`. You'd have to use `stream2.Seek(0, SeekOrigin.End)` first - and then you'd get the error about not being able to expand a memory stream created with the `byte[]` constructor. – Luaan Aug 25 '15 at 12:51
@Luaan The solution you suggested is not working. It's just copying the file1 data. – ArslanIqbal Aug 25 '15 at 12:57
@Arsal Then you must have another problem in your code - it works perfectly with the exact code I provided. Are you sure you're passing the correct paths, using the correct `file1Path` vs `file2Path` etc.? – Luaan Aug 25 '15 at 13:03
Yes, I am sure. If I comment out the file1 line, it copies file2. But if I uncomment it, it only copies the file1 data. Does the size of the files matter? – ArslanIqbal Aug 25 '15 at 13:15
@Arsal Not as long as it fits into memory. Are you sure that the problem appears even if you make a separate project just with those three lines of code, copied directly from here? – Luaan Aug 25 '15 at 13:47
@Arsal And of course, set `result.Position = 0;` before returning the stream if you use it as a stream later (I just call `result.ToArray()`, where it isn't necessary). – Luaan Aug 25 '15 at 13:51
@Luaan Yes I created another separate project and wrote your code. file1 contents are being overwritten by file2. – ArslanIqbal Aug 25 '15 at 14:17
@Arsal Are you sure it's actually overwritten? Have you checked the stream length? How are you reading the stream afterwards? It's not like you can just glue two PDF files together byte-by-byte and get a combined file as a result. Try using the same code with two text files, that should make the result more obvious. – Luaan Aug 25 '15 at 14:44
@Luaan Yes, you are right. It's working perfectly with text files. So does that mean the MemoryStream is not large enough to contain the PDF file? – ArslanIqbal Aug 25 '15 at 17:26
@Arsal What are you actually expecting to happen when you join the two PDF files in one stream? What are you doing with that stream? – Luaan Aug 25 '15 at 17:55
I am rendering that stream to the browser to display it for printing.
@Arsal Well, as I've noted before - you can't just slap two PDF files together byte-by-byte and expect them to work. If you need to print both PDFs consecutively, you'll need some library that can join them for you. Appending streams directly isn't going to work. – Luaan Aug 25 '15 at 20:43
@Luaan I came to the same conclusion. Thanks for your time. You have been a great help. – ArslanIqbal Aug 26 '15 at 05:26
Although this answer is correct for the question as asked, both @Luaan and the OP have pointed out that the actual question (or the assumption behind it) is incorrect. Because this answer might lead people who don't read comments and rely on vote counts to believe you can just staple two binary files together to get one big file, I'd recommend that this answer be either edited or deleted. – Chris Haas Aug 26 '15 at 13:08
@ChrisHaas I've added a note at the top. If the question is changed to explicitly be about joining the two PDF files, I'll simply delete it. As the OP also had an error in the simple stream merging, I'm inclined to keep it for the time being if someone searches for "how to merge two streams". – Luaan Aug 26 '15 at 13:19