2

I have binary files that each contain exactly one PNG image. The binary file is not a DLL or an EXE or anything else common; it is just a file containing various textual information, a PNG image and some other data, and its format is unknown to me. The PNG can be displayed by the program that creates these files, but I do not have that program's source code. My task is to extract the PNG from the binary file so that I can display it or save it as a PNG file. I wrote code that works on about 50% of these files but fails on the others. The program that created the files can still display the image in the files where my code fails, so the image inside every file is certainly valid, yet my code does not work on some of them.

Some images seem to use a different format or perhaps a different encoding (I have already tried all the different encoding types without success). Here is my code; I hope someone can tell me what to change so that the image is always readable.

What my code does: it finds the known start string of the PNG image, "‰PNG", and the known end string, "IEND®B`‚". These strings are the same in all of my binary files that contain PNGs. My code then takes everything from the start sequence through the end sequence and saves it to a file using Encoding.Default. Many PNG files extracted this way can be displayed in an image viewer, but around 50% are invalid. An invalid image looks fine when I open it in an editor and compare its characters with those of a working image. So far I have no clue which symbol causes the invalid image format.

If needed I will provide more information. Here is my code:

private void button2_Click(object sender, EventArgs e)
    {
        string ReadFile1 = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory), "file.dat");
        string WriteFile1 = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory), "test.png");
        string TMP = File.ReadAllText(Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory), ReadFile1), Encoding.Default); //System.Text.Encoding.GetEncoding(1251)
        int start1 = TMP.IndexOf("PNG", 0 ,StringComparison.Ordinal);
        if (start1 == 0) { return; }
        int end1 = TMP.IndexOf("IEND", StringComparison.Ordinal);
        string PNG = TMP.Substring(start1 - 1, (end1 + 9) - start1);
        File.WriteAllText(Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory), "test.png"), PNG, Encoding.Default);
    }

At first I also tried extracting the PNG in binary mode with the code below, but I got exactly the same results as with reading the file as a string. I located the position in the byte array by using the string for comparison. I had no luck with the binary code either.

 byte[] by;
        // 1.
        // Open file with a BinaryReader.
        using (BinaryReader b = new BinaryReader(File.Open(ReadFile1, FileMode.Open), Encoding.Default))
        {
            // 2.
            // Variables for our position.
            int pos = start1 - 1;           //I determine the right positions before doing this
            int required = (end1 + 9) - start1; 

            // 3.
            // Seek to our required position.
            b.BaseStream.Seek(pos, SeekOrigin.Begin);

            // 4.
            // Read the required number of bytes.
            by = b.ReadBytes(required);
            b.Close();
        }

        FileStream writeStream;
        writeStream = new FileStream(WriteFile1, FileMode.Create);
        BinaryWriter writeBinay = new BinaryWriter(writeStream, Encoding.Default);
        writeBinay.Write(by);
        writeBinay.Close();
feedwall

3 Answers

5

You should not be reading the file as a text file; transformations may occur on the contents. You should instead try using File.ReadAllBytes, and then search for the byte sequences of the start and end of the PNG file, and then write out that region of bytes.
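
To make the "transformations may occur" point concrete, here is a minimal sketch (not part of the original answer) that decodes a few bytes as text and encodes them again, which is roughly what File.ReadAllText followed by File.WriteAllText does. The class and method names are made up for illustration, and which bytes get damaged depends on the encoding; Encoding.ASCII and Encoding.UTF8 are used here only because their behavior on the byte 0x89 is easy to predict.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration only.
static class EncodingRoundTripDemo
{
    // Decodes the bytes as text and encodes the text again,
    // mimicking a read-as-text / write-as-text round trip.
    public static byte[] RoundTrip(byte[] data, Encoding encoding)
    {
        string asText = encoding.GetString(data);
        return encoding.GetBytes(asText);
    }
}
```

Calling `RoundTrip(new byte[] { 0x89, 0x50, 0x4E, 0x47 }, Encoding.ASCII)` turns the leading 0x89 into 0x3F ('?'), and with Encoding.UTF8 the four input bytes come back as six, because 0x89 is replaced by U+FFFD. Either way the start of the PNG signature no longer matches.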

To find a sequence of bytes in a byte array, you can use code like the following:

private static int IndexOf(byte[] array, byte[] sequence, int startIndex)
{
    if (sequence.Length == 0)
        return -1;

    int found = 0;
    for (int i = startIndex; i < array.Length; i++)
    {
        if (array[i] == sequence[found])
        {
            if (++found == sequence.Length)
            {
                return i - found + 1;
            }
        }
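        else
        {
            // Mismatch: resume one byte after where this attempt began,
            // so a match overlapping the failed attempt is not skipped.
            i -= found;
            found = 0;
        }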
    }

    return -1;
}

private void button2_Click(object sender, EventArgs e) 
{ 
    string ReadFile1 = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory), "file.dat"); 
    string WriteFile1 = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory), "test.png"); 

    byte[] TMP = File.ReadAllBytes(ReadFile1);

    byte[] pngStartSequence = new byte[] { 0x89, 0x50, 0x4E, 0x47 };
    byte[] pngEndSequence = new byte[] { 0x49, 0x45, 0x4E, 0x44 }; // "IEND"

    int start1 = IndexOf(TMP, pngStartSequence, 0);
    if (start1 == -1)
    {
       // no PNG present
       MessageBox.Show("Could not find PNG header");
       return;
    }

    int end1 = IndexOf(TMP, pngEndSequence, start1 + pngStartSequence.Length);
    if (end1 == -1)
    {
       // no IEND present
       MessageBox.Show("Could not find PNG footer");
       return;
    }

    // end1 points at "IEND"; include those 4 bytes plus the 4-byte CRC that follows.
    int pngLength = end1 - start1 + 8;
    byte[] PNG = new byte[pngLength];

    Array.Copy(TMP, start1, PNG, 0, pngLength);

    File.WriteAllBytes(WriteFile1, PNG); 
} 
Monroe Thomas
  • @feedwall you need to add the function `IndexOf` from the first block of code above to your file. I've made some small changes, try again. – Monroe Thomas Jul 14 '12 at 22:52
  • @feedwall Thanks for catching that; I've updated the answer. It could be that the program is doing some kind of encoding on the PNG file before it gets stored, and reverses that encoding when it reads it in. This would be hard to reverse engineer, though. – Monroe Thomas Jul 14 '12 at 23:28
  • @feedwall I think that if the string PNG appears multiple times in the file, then maybe you are getting the wrong thing. Since we are now comparing binary instead of text, we should compare all of the header and footer bytes exactly. I have updated my answer to include the byte representations of the start and end sequences. – Monroe Thomas Jul 15 '12 at 01:42
  • @feedwall Since we are including the special character before PNG in our search, we no longer use `start1 - 1` in `Array.Copy`. – Monroe Thomas Jul 15 '12 at 01:50
  • @feedwall ... and the length calculation is slightly different. – Monroe Thomas Jul 15 '12 at 04:26
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/13909/discussion-between-monroe-thomas-and-feedwall) – Monroe Thomas Jul 15 '12 at 16:25
  • @feedwall I changed the footer to be less specific; and the length calculation has changed again as a result. – Monroe Thomas Jul 15 '12 at 16:30
5

PNG files are binary. If you read them using some encoding, you'll lose information and the output of your program will no longer be a valid PNG file. Refer to Using Chunks in a PNG for more explanation and code samples.

Also read PNG Specification: File structure for detailed information.
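
Since the PNG file structure is a signature followed by chunks, each with a 4-byte big-endian length, a 4-byte type, the data and a 4-byte CRC, one alternative to searching for the text IEND is to walk the chunks from the signature until the IEND chunk is reached. The following is only a sketch under those assumptions: `PngChunkWalker` and `FindPngLength` are hypothetical names, CRCs are not verified, and the start offset of the signature is assumed to have been found already.

```csharp
using System;

// Hypothetical helper for illustration only.
static class PngChunkWalker
{
    // The 8-byte PNG file signature.
    static readonly byte[] Signature = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };

    // Returns the total byte length of the PNG that begins at 'start',
    // or -1 if no complete PNG (signature through IEND) is found there.
    public static int FindPngLength(byte[] data, int start)
    {
        if (start < 0 || data.Length - start < Signature.Length)
            return -1;
        for (int i = 0; i < Signature.Length; i++)
            if (data[start + i] != Signature[i])
                return -1;

        long pos = start + Signature.Length;
        // Every chunk needs at least 12 bytes: length + type + CRC.
        while (pos + 12 <= data.Length)
        {
            // Chunk data length, stored big-endian.
            long length = ((long)data[pos] << 24) | ((long)data[pos + 1] << 16)
                        | ((long)data[pos + 2] << 8) | data[pos + 3];
            long next = pos + 12 + length;
            if (next > data.Length)
                return -1; // chunk claims more bytes than are available

            bool isIend = data[pos + 4] == 0x49 && data[pos + 5] == 0x45
                       && data[pos + 6] == 0x4E && data[pos + 7] == 0x44;
            if (isIend)
                return (int)(next - start);

            pos = next;
        }
        return -1;
    }
}
```

If `start` is the offset of the signature in the file's bytes, `FindPngLength(data, start)` gives the number of bytes to copy with `Array.Copy`, and a return value of -1 means the data at that offset does not look like a complete PNG. Unlike a plain search, this cannot be fooled by the bytes of "IEND" occurring by chance inside compressed image data.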

Arne
  • @feedwall the issue is that you read binary data as text using an encoding scheme. If that is not enough explanation for you, you will have to search for and read about "character encoding". – Arne Jul 15 '12 at 17:23
  • That's the point: encoding doesn't matter here, but you use it: `File.ReadAllText(Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.DesktopDirectory), ReadFile1), Encoding.Default)` The last parameter specifies that you want to read the file's content into text using the default encoding, whatever that is in your application at that specific moment. PNG-24 files are not text files; they are binary and should be handled as such. Start with Monroe Thomas's answer and read the PNG Specification carefully. If your files have been processed as text files before, change that too. – Arne Jul 15 '12 at 18:05
1

Use File.ReadAllBytes and File.WriteAllBytes. Reading and writing as text may be affected by the encoding.

You can use Jb Evain's algorithm for finding a pattern in a byte array, like this:

static void Main()
{
    // PNG file signature
    var startPattern = new byte[] { 137, 80, 78, 71, 13, 10, 26, 10 };
    var data = File.ReadAllBytes("png file");

    var start = data.Locate(startPattern);
    // and end like this
}    

// Extension method: it must be declared inside a static class.
public static int[] Locate(this byte[] self, byte[] candidate)
{
    if (IsEmptyLocate(self, candidate))
        return Empty;

    var list = new List<int>();

    for (int i = 0; i < self.Length; i++)
    {
        if (!IsMatch(self, i, candidate))
            continue;

        list.Add(i);
    }

    return list.Count == 0 ? Empty : list.ToArray();
}

static bool IsMatch(byte[] array, int position, byte[] candidate)
{
    if (candidate.Length > (array.Length - position))
        return false;

    for (int i = 0; i < candidate.Length; i++)
        if (array[position + i] != candidate[i])
            return false;

    return true;
}

static readonly int[] Empty = new int[0];

static bool IsEmptyLocate(byte[] array, byte[] candidate)
{
    return array == null
            || candidate == null
            || array.Length == 0
            || candidate.Length == 0
            || candidate.Length > array.Length;
}
Ria