Base64 Encode a PDF in C#?

Question

Can someone shed some light on how to do this? I can do it for regular text or a byte array, but I'm not sure how to approach it for a PDF. Do I stuff the PDF into a byte array first?

Why should a PDF be any different than a byte array? – Can Berk Güder Jan 24 '09 at 03:32

score 63 · Accepted Answer · edited Feb 07 '23 at 01:15

Use File.ReadAllBytes to load the PDF file, and then encode the byte array as normal using Convert.ToBase64String(bytes).

 byte[] fileBytes = File.ReadAllBytes(@"TestData\example.pdf");
 string content = Convert.ToBase64String(fileBytes);
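
To check that nothing is lost, the encoded text can be decoded back with Convert.FromBase64String and written out again. This is only a minimal sketch: the class name PdfBase64RoundTrip and its two method names are made up for illustration, and it assumes the whole file fits comfortably in memory.

```csharp
using System;
using System.IO;

// Hypothetical helper class; the names are illustrative, not from the answer.
public static class PdfBase64RoundTrip
{
    // Reads the whole file into memory and returns its Base64 text.
    public static string EncodeFile(string path)
    {
        byte[] fileBytes = File.ReadAllBytes(path);
        return Convert.ToBase64String(fileBytes);
    }

    // Decodes Base64 text and writes the original bytes to a file.
    public static void DecodeToFile(string base64, string path)
    {
        byte[] fileBytes = Convert.FromBase64String(base64);
        File.WriteAllBytes(path, fileBytes);
    }
}
```

A PDF starts with the bytes "%PDF-", so the Base64 text of a real PDF should begin with "JVBERi0", which makes for a handy sanity check.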

edited Feb 07 '23 at 01:15

Michael Freidgeim


answered Jan 24 '09 at 03:23

Andrew Rollings


Indeed. But these days machines have a lot of memory. And if necessary, reading buffered blocks from a file is a pretty standard technique :) – Andrew Rollings Jan 24 '09 at 04:52
Works great for what I need for the moment. Thanks for the tip! – Tone Jan 26 '09 at 06:22
This is very wasteful of memory. A stream-based approach would be better. The crypto-based approach suggested by JMarsch is likely more efficient. You could also do it by reading a small number of bytes at a time (multiples of 3, I would guess), encoding them independently, and writing them to the stream where you need them. – Sebastian Good Feb 12 '10 at 20:42
See my previous comment. It's not hard to buffer it. – Andrew Rollings Feb 13 '10 at 14:43
Also, the KISS principle applies. Why make a solution more complex than it needs to be? If the above suits his purpose (which he says it does), then why make it more complex? 2 lines of C# versus 30? – Andrew Rollings Feb 16 '10 at 23:07
I would say that which way to go really depends on the situation. If the RAM impact of doing it in memory is acceptable for your needs, then keep it simple. On the other hand, if RAM is an issue (maybe it's a large file, or maybe it's a server process that might be handling thousands of simultaneous requests), then the extra code is worth it. For what it's worth, I can't find any way to make the Encode method add up to 30 lines. If I count it the same way as the simple method, it comes out to 10. So 2 lines vs. 10, I think, is more accurate. – JMarsch Jun 25 '11 at 16:04
I like both approaches. I love the simplicity of this approach and the versatility of JMarsch's approach. My application requires coding the contents of a (relatively) small pdf into a web service request, so I will use Andrew's approach. – Mark Ainsworth Jul 25 '16 at 20:32
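
For reference, the chunked idea raised in the comments above (read a multiple of 3 bytes at a time and encode each chunk independently) might look roughly like the sketch below. The class name ChunkedBase64, the helper FillBuffer, and the 3 KB buffer size are illustrative assumptions, not code from any answer in this thread.

```csharp
using System;
using System.IO;

// Hypothetical helper class; the names and buffer size are illustrative.
public static class ChunkedBase64
{
    // Encodes inFileName to Base64 text in outFileName, one chunk at a time.
    public static void Encode(string inFileName, string outFileName)
    {
        // The buffer length is a multiple of 3, so every full chunk encodes
        // without '=' padding and the pieces concatenate into valid Base64.
        byte[] buffer = new byte[3 * 1024];

        using (FileStream inFile = File.OpenRead(inFileName))
        using (StreamWriter outFile = new StreamWriter(outFileName))
        {
            int filled;
            while ((filled = FillBuffer(inFile, buffer)) > 0)
                outFile.Write(Convert.ToBase64String(buffer, 0, filled));
        }
    }

    // Stream.Read may return fewer bytes than requested, so keep reading
    // until the buffer is full or the stream ends. Only the last chunk
    // can therefore be shorter than the buffer.
    private static int FillBuffer(Stream source, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int read = source.Read(buffer, total, buffer.Length - total);
            if (read == 0) break;
            total += read;
        }
        return total;
    }
}
```

The buffer-filling loop matters: if a short read produced a chunk whose length is not a multiple of 3 in the middle of the file, padding characters would end up in the middle of the output.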

JMarsch · Answer 2 · 2009-03-29T00:42:34.187

There is a way that you can do this in chunks so that you don't have to burn a ton of memory all at once.

.NET includes an encoder that can do the chunking, but it's in kind of a weird place. They put it in the System.Security.Cryptography namespace.

I have tested the example code below, and I get identical output using either my method or Andrew's method above.

Here's how it works: You fire up a class called CryptoStream. This is kind of an adapter that plugs into another stream. You plug an ICryptoTransform implementation into the CryptoStream (which in turn is attached to your file/memory/network stream), and it transforms the data while it's being read from or written to the stream.

Normally, the transformation is encryption/decryption, but .NET includes ToBase64Transform and FromBase64Transform as well, so we won't be encrypting, just encoding.

Here's the code. I included a (maybe poorly named) implementation of Andrew's suggestion so that you can compare the output.


    class Base64Encoder
    {
        public void Encode(string inFileName, string outFileName)
        {
            System.Security.Cryptography.ICryptoTransform transform = new System.Security.Cryptography.ToBase64Transform();
            using(System.IO.FileStream inFile = System.IO.File.OpenRead(inFileName),
                                      outFile = System.IO.File.Create(outFileName))
            using (System.Security.Cryptography.CryptoStream cryptStream = new System.Security.Cryptography.CryptoStream(outFile, transform, System.Security.Cryptography.CryptoStreamMode.Write))
            {
                // I'm going to use a 4k buffer, tune this as needed
                byte[] buffer = new byte[4096];
                int bytesRead;

                while ((bytesRead = inFile.Read(buffer, 0, buffer.Length)) > 0)
                    cryptStream.Write(buffer, 0, bytesRead);

                cryptStream.FlushFinalBlock();
            }
        }

        public void Decode(string inFileName, string outFileName)
        {
            System.Security.Cryptography.ICryptoTransform transform = new System.Security.Cryptography.FromBase64Transform();
            using (System.IO.FileStream inFile = System.IO.File.OpenRead(inFileName),
                                      outFile = System.IO.File.Create(outFileName))
            using (System.Security.Cryptography.CryptoStream cryptStream = new System.Security.Cryptography.CryptoStream(inFile, transform, System.Security.Cryptography.CryptoStreamMode.Read))
            {
                byte[] buffer = new byte[4096];
                int bytesRead;

                while ((bytesRead = cryptStream.Read(buffer, 0, buffer.Length)) > 0)
                    outFile.Write(buffer, 0, bytesRead);

                outFile.Flush();
            }
        }

        // This version of Encode pulls everything into memory at once.
        // You can compare the output of my Encode method above to the output of this one.
        // The output should be identical, but the CryptoStream version
        // will use way less memory on a large file than this version.
        public void MemoryEncode(string inFileName, string outFileName)
        {
            byte[] bytes = System.IO.File.ReadAllBytes(inFileName);
            System.IO.File.WriteAllText(outFileName, System.Convert.ToBase64String(bytes));
        }
    }

I am also playing around with where I attach the CryptoStream. In the Encode method, I attach it to the output (writing) stream, so I construct the CryptoStream with CryptoStreamMode.Write and call its Write() method.

In the Decode method, I attach it to the input (reading) stream, so I construct it with CryptoStreamMode.Read and call its Read() method. It doesn't really matter which stream I attach it to; I just have to pass the matching CryptoStreamMode member (Read or Write) to the CryptoStream's constructor.
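
To illustrate that last point, here is a rough sketch of encoding with the CryptoStream attached to the input stream instead of the output stream. The class name ReadModeBase64 is made up for this example, and it assumes .NET 4 or later for Stream.CopyTo.

```csharp
using System.IO;
using System.Security.Cryptography;

// Hypothetical variant of Encode; the class name is illustrative.
public static class ReadModeBase64
{
    public static void Encode(string inFileName, string outFileName)
    {
        using (FileStream inFile = File.OpenRead(inFileName))
        using (FileStream outFile = File.Create(outFileName))
        using (var transform = new ToBase64Transform())
        // The CryptoStream wraps the INPUT file, so reading from it
        // yields Base64 text (as ASCII bytes) instead of the raw bytes.
        using (var cryptStream = new CryptoStream(inFile, transform, CryptoStreamMode.Read))
        {
            // CopyTo reads in blocks internally, so the whole file
            // is never held in memory at once.
            cryptStream.CopyTo(outFile);
        }
    }
}
```

The output should match what the Write-mode Encode method produces; which side to wrap mostly comes down to which stream is more convenient to hand to the rest of your code.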

I haven't run and verified this, but it looks promising. Cool idea! +1 – codingbear Jun 16 '10 at 11:59
