
So basically I'm writing an application that looks for PNG files inside a binary file. It does this by reading the entire file into a byte array, converting it to a string with the Convert.ToBase64String method, and then using a regex that matches a PNG's header information and end chunk to find the images. The problem is that ToBase64String generates wildly different output depending on the length of the byte array, and the documentation on MSDN doesn't seem to elaborate on it. Here's an example of what I mean.

 byte[] somebytes = new byte[] { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08 };

 Console.WriteLine(Convert.ToBase64String(somebytes));

The output in this case is "AQIDBAUGBwg=". Now if I skip a byte...

 byte[] somebytes = new byte[] { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08 };

 somebytes = somebytes.Skip(1).ToArray();

 Console.WriteLine(Convert.ToBase64String(somebytes));

The output is now "AgMEBQYHCA==", so almost every character has changed from the previous example.

So am I hopelessly following the wrong path by running a regex over a binary file, or is there a way (maybe padding?) to guarantee more consistency across these conversions?

Update: Based on the feedback I've gathered, it seems I should move away from the regex solution and search for the start and end byte sequences myself. Not sure why I'm being downvoted, as I just wanted to understand why my other solution didn't work, and there don't seem to be any other posts on this topic. Anyway, thanks everyone for the quick feedback. I'll post the algorithm I used for finding images when I'm done in case it might benefit someone else.
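
For anyone wondering why the output shifts so much, here is a minimal sketch (the class and method names are made up for illustration). Base64 encodes input in groups of 3 bytes that become 4 characters, so dropping 1 byte moves every later byte into a different position within its group. Dropping 3 bytes keeps the grouping intact. Hex uses 2 characters per byte, so it does not have this alignment problem.

```csharp
using System;
using System.Linq;

public static class EncodingAlignmentDemo
{
    // Base64 of the array after dropping the first `skip` bytes.
    public static string Base64From(byte[] data, int skip) =>
        Convert.ToBase64String(data.Skip(skip).ToArray());

    // Hex of the array after dropping the first `skip` bytes (two characters per byte).
    public static string HexFrom(byte[] data, int skip) =>
        BitConverter.ToString(data.Skip(skip).ToArray()).Replace("-", "");

    public static void Main()
    {
        var data = new byte[] { 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08 };

        Console.WriteLine(Base64From(data, 0)); // AQIDBAUGBwg=
        Console.WriteLine(Base64From(data, 1)); // AgMEBQYHCA==  (grouping shifted, nothing lines up)
        Console.WriteLine(Base64From(data, 3)); // BAUGBwg=      (a suffix of the first output)

        Console.WriteLine(HexFrom(data, 0));    // 0102030405060708
        Console.WriteLine(HexFrom(data, 1));    // 02030405060708 (still a suffix of the line above)
    }
}
```

If a text search over hex is used anyway, a match is only meaningful when it starts at an even character index; otherwise it straddles two bytes.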

Uwe Keim
Thermonuclear
  • 1
    Yes, you are. There's no point in doing base64 and then trying to find the header. Why not just find in the binary? Or if you must then use hexadecimal for consistency – Sami Kuhmonen Oct 27 '16 at 19:04
  • 3
    *"It does this by reading in an entire binary in file into a byte array and then converting it to a string using the Convert.ToBase64String method and then using a regex that matches a PNG's header information and end chunk to find the images."* What the...? You have a byte array, so search the byte array. – Matt Burland Oct 27 '16 at 19:06
  • 1
    As for why it changes, it changes because you are shifting the array. Base64 takes 6 bits at a time and translates them into a character. If you shift by a byte (8 bits) then you are going to get totally different characters. – Matt Burland Oct 27 '16 at 19:11
  • And for searching an array inside another array, look at [this](http://stackoverflow.com/questions/4859023/find-an-array-byte-inside-another-array) for example. – Matt Burland Oct 27 '16 at 19:13
  • Yes, I know that I could just search the byte array and get away with it that way. Regex just seemed like an easier solution when I started down this path especially when I want to add support for other image types. – Thermonuclear Oct 27 '16 at 19:18
  • 1
    @Thermonuclear This is a terrible idea, _especially_ if you want to add support for other file formats. Just search for the appropriate format signature in the byte array. – xxbbcc Oct 27 '16 at 19:33
  • Are you searching for PNGs in an EXE or DLL? If so, why not just search the resources? There are a few other places you can stick a PNG, but 99% of people will stick them in the resource sections and there are APIs to navigate through those. – SledgeHammer Oct 27 '16 at 19:38
  • Your instincts on how to perform this task are a little off. It's great you're trying things, and fine to ask for help, but *how* you ask can be important. This question sounds like "I'm doing something incredibly weird, why isn't it working out right?" People read what you're doing and it's hard to not hit that downvote. A safer way to ask this question is "I need to do this [describe requirement]. I tried to do it this way [omgwtflol], but it isn't working for these reasons [list]. Is there something wrong with my code, or my approach? How can I accomplish my goal?" –  Oct 27 '16 at 19:40
  • 1
    The deleted answer was correct. You need to examine the bytes of the image, not convert it to something else and try to parse it some other way. –  Oct 27 '16 at 19:42
  • @SledgeHammer The project was started because I wanted a tool that would make it easier for me to replace PNG images (or possibly other types) in a WPF desktop application, which I've done successfully by manually editing the binary in a hex editor. If you know of any APIs that could assist me in this I'd be very interested to know more. – Thermonuclear Oct 27 '16 at 19:43
  • @Will Thank you for your feedback, I'll take that into consideration in the future. I should have made it clear that I knew searching for the bytes manually was a possibility. – Thermonuclear Oct 27 '16 at 19:48
  • *"Yes, I know that I could just search the byte array and get away with it that way. Regex just seemed like an easier solution"*, doing it the right way (by searching the byte array) isn't "getting away with it", it's doing it right. Regex isn't going to be "easier", it's just plain wrong (at least with base64 encoding). – Matt Burland Oct 27 '16 at 20:13

2 Answers


You confirmed in the comments that you are trying to pull resources from a .NET assembly (EXE or DLL). You can use the reflection methods on Assembly to pull them out: GetManifestResourceStream, GetManifestResourceNames, and GetManifestResourceInfo are good starting points.
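
A rough sketch of how those calls fit together, assuming the target file is a managed assembly that can be loaded into the current process. The class and method names here are hypothetical; only the Assembly methods come from the framework.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

public static class ResourceLister
{
    // Returns one "name: N bytes" line per manifest resource in the assembly.
    public static List<string> Describe(Assembly assembly)
    {
        var lines = new List<string>();

        foreach (var name in assembly.GetManifestResourceNames())
        {
            // The stream is read-only; it can be null if the resource is not visible to the caller.
            using (var stream = assembly.GetManifestResourceStream(name))
            {
                lines.Add($"{name}: {stream?.Length ?? 0} bytes");
            }
        }

        return lines;
    }

    public static void Main(string[] args)
    {
        // args[0] is assumed to be the path to the EXE or DLL to inspect.
        var assembly = Assembly.LoadFrom(args[0]);

        foreach (var line in Describe(assembly))
        {
            Console.WriteLine(line);
        }
    }
}
```

Note that in a WPF project, images built with the "Resource" build action usually do not show up as individual entries here. As far as I know they are packed into a single `.g.resources` entry, which would need something like System.Resources.ResourceReader to enumerate.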

SledgeHammer
  • Thanks for that, I will certainly try messing around with it. I may still pursue reading the bytes manually, however, as ideally the image replacer would work on more than just .NET binaries, but this information is still very useful indeed! – Thermonuclear Oct 27 '16 at 19:59
  • 1
    @Thermonuclear, native C++ binaries are also structured and have APIs for messing with resources. You should not try to do this yourself as it's more complicated than you think. What if the current PNG is 1000 bytes and you want to replace it with one that's 1010 bytes? You'll overwrite code / other resources. The APIs will take care of that for you. – SledgeHammer Oct 27 '16 at 20:01
  • My application was originally only going to allow replacing an image with one that is equal in size to or smaller than the original, thus avoiding that problem. At least that worked when I did it manually on a WPF application. However, if these APIs give me the power to swap the image for a larger one, that gives me a very compelling reason to reconsider my scope. – Thermonuclear Oct 27 '16 at 20:06
  • So having played with the 'GetManifest' reflection methods I don't see how it's possible to change the resources. I'm only seeing 'Get' methods and when I use GetManifestResourceStream to grab a stream containing an image the stream's 'CanWrite' property is 'false'. Please correct me if I'm wrong but I don't see how I can set or modify the manifest resource streams using reflection. – Thermonuclear Oct 28 '16 at 05:11

As promised, here is the logic I've written to find the images in the binary, in case it helps someone else. I may ultimately use SledgeHammer's method, but it was important to me to be able to handle it this way as well.

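using System;
using System.Collections.Generic;

// Note: despite the name, this is a linear scan for byte sequences, not a binary search.
public class BinarySearch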
{
    public static IEnumerable<byte[]> Match(byte[] source, byte[] beginningSequence, byte[] endSequence)
    {
        int index = 0;

        IList<byte[]> matches = new List<byte[]>();

        while (index < source.Length)
        {
            var startIndex = FindSequence(source, beginningSequence, index);

            if (startIndex >= 0)
            {
                var endIndex = FindSequence(source, endSequence, startIndex + beginningSequence.Length);

                if (endIndex >= 0)
                {
                    var length = (endIndex - startIndex) + endSequence.Length;
                    var buffer = new byte[length];

                    Array.Copy(source, startIndex, buffer, 0, length);

                    matches.Add(buffer);

                    index = endIndex + endSequence.Length;
                }
                else
                {
                    index = source.Length;
                }
            }
            else
            {
                index = source.Length;
            }
        }

        return matches;
    }

    private static int FindSequence(byte[] bytes, byte[] sequence, int startIndex = 0)
    {
        int currentIndex = startIndex;
        int sequenceIndex = 0;
        bool found = false;

        while (!found && currentIndex < bytes.Length)
        {
            if (bytes[currentIndex] == sequence[sequenceIndex])
            {
                if (sequenceIndex == (sequence.Length - 1))
                {
                    found = true;
                }
                else
                {
                    sequenceIndex++;
                }
            }
            else
            {
                currentIndex -= sequenceIndex;
                sequenceIndex = 0;
            }

            currentIndex++;
        }

        return found ? (currentIndex - sequence.Length) : -1;
    }
}

Here's an example of its usage for PNG files.

var imageHeaderStart = new byte[] { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A, 0x00 };

var imageEOF = new byte[] { 0x00, 0x00, 0x49, 0x45, 0x4E, 0x44, 0xAE, 0x42, 0x60, 0x82 };

var matches = BinarySearch.Match(binaryData, imageHeaderStart, imageEOF);
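
As a possible alternative to the hand-written FindSequence above, here is a small sketch that leans on the built-in span search instead. This assumes .NET Core 2.1 or later (or the System.Memory package on .NET Framework); the SpanSearch class name is made up for illustration.

```csharp
using System;

public static class SpanSearch
{
    // Returns the index of the first occurrence of `pattern` in `source`
    // at or after `start`, or -1 if there is none.
    public static int IndexOf(byte[] source, byte[] pattern, int start = 0)
    {
        if (start < 0 || start > source.Length)
        {
            return -1;
        }

        var haystack = new ReadOnlySpan<byte>(source, start, source.Length - start);
        var needle = new ReadOnlySpan<byte>(pattern);

        // MemoryExtensions.IndexOf returns an index relative to the start of the slice.
        int relative = haystack.IndexOf(needle);

        return relative < 0 ? -1 : start + relative;
    }

    public static void Main()
    {
        var source = new byte[] { 1, 2, 3, 4, 3, 4 };
        var pattern = new byte[] { 3, 4 };

        Console.WriteLine(IndexOf(source, pattern));    // 2
        Console.WriteLine(IndexOf(source, pattern, 3)); // 4
    }
}
```

It takes the same arguments and returns the same kind of result as FindSequence, so it could be swapped into Match without changing the loop.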

I'll add a link to the GitHub project upon its completion in case anyone is interested in my 'complete' implementation.

Thermonuclear