My application has an auto-update feature. To verify that the file downloaded successfully, I compare two hashes: the one from the update XML and the one generated after downloading. The two hashes look identical, but the comparison reports that they are not equal. When I check the lengths, the XML hash string has 66 characters and the other has 36. I tried the Trim method, but still no luck.

    string file = ((string[])e.Argument)[0];
    string updateMD5 = "--" + ((string[])e.Argument)[1].ToUpper() + "--";
    string xx = "--" + Hasher.HashFile(file, HashType.MD5).ToUpper() + "--";
    // Hash the file and compare to the hash in the update xml
    int xxx = (updateMD5.Trim()).Length;
    int xxxxx = xx.Trim().Length;
    if (String.Equals(updateMD5.Trim(), xx.Trim(), StringComparison.InvariantCultureIgnoreCase))
        e.Result = DialogResult.OK;
    else
        e.Result = DialogResult.No;

Hasher code:

    internal static string HashFile(string filePath, HashType algo)
    {
        switch (algo)
        {
            case HashType.MD5:
                return MakeHashString(MD5.Create().ComputeHash(new FileStream(filePath, FileMode.Open)));
            case HashType.SHA1:
                return MakeHashString(SHA1.Create().ComputeHash(new FileStream(filePath, FileMode.Open)));
            case HashType.SHA512:
                return MakeHashString(SHA512.Create().ComputeHash(new FileStream(filePath, FileMode.Open)));
            default:
                return "";
        }
    }

    private static string MakeHashString(byte[] hash)
    {
        StringBuilder s = new StringBuilder();

        foreach (byte b in hash)
            s.Append(b.ToString("x2").ToLower());

        return s.ToString();
    }

NOTE: I use the '--' markers to check whether there are trailing spaces. [image]

    StringBuilder s = new StringBuilder();
    foreach (char c in updateMD5.Trim())
        s.AppendLine(string.Format("{0}=={1}", c, (int)c));

[image: output of the character dump]

Vic
  • Please include a [mcve] as text – Sayse Nov 07 '16 at 08:53
  • Clearly there is a mismatch between the things you're showing here, the string on the left is clearly not 66 characters long. Please ensure you're looking at the right things. – Lasse V. Karlsen Nov 07 '16 at 08:59
  • Custom class "Hasher"? Please provide your function too. – Prisoner Nov 07 '16 at 09:00
  • @Alex I already included my hasher – Vic Nov 07 '16 at 09:03
  • @LasseV.Karlsen I'm sure that I am looking at the right code, the code that causes the error. As you can see, the strings look equal but their lengths are not the same. – Vic Nov 07 '16 at 09:08
  • You must be missing something. Obviously, if `updateMD5` is what you show in your image, its length is not 66. – Pikoh Nov 07 '16 at 09:10
  • Can you try this code `int xxx = new StringInfo((updateMD5.Trim())).LengthInTextElements;` – Prisoner Nov 07 '16 at 09:10
  • The 36-string is likely the right one, 32 hex characters = 16 bytes (MD5 size) + 4 (2 sets of `--`, one at each end). The other is wrong but I doubt that the debugger or you is correct in that the string shown gives the length shown. – Lasse V. Karlsen Nov 07 '16 at 09:10
  • Part of the answer is here: http://stackoverflow.com/questions/26975736/why-is-the-length-of-this-string-longer-than-the-number-of-characters-in-it basically you need to convert both your strings to use the same codepage – Thomas K Nov 07 '16 at 09:11
  • try dumping each char in both string, something like this:, `foreach (char c in updateMD5.Trim())Console.WriteLine("{0}-{1}", c, (int) c);` I suspect you have some not printable character in your strings – Gian Paolo Nov 07 '16 at 09:12
  • Given the code shown, the only explanation is that `updateMD5` contains invisible characters. Try dumping it out as individual character codes to check. Also try `updateMD5 = updateMD5.Replace("\0", "");` I'm guessing that it will have a bunch of nul characters because it's been stored as Unicode and read in as something else. – Matthew Watson Nov 07 '16 at 09:13
  • @Alex It is still 66. – Vic Nov 07 '16 at 09:14
  • @MatthewWatson It still reports that `updateMD5` has 66 characters – Vic Nov 07 '16 at 09:16
  • What happens if you use `.Count()` (including Linq) instead of `.Length` – Matteo Umili Nov 07 '16 at 09:23
  • @GianPaolo Please see the updated question, I have attached the output. I notice that some lines are empty. – Vic Nov 07 '16 at 09:32
  • @VictorBaccal - Please provide enough code that we can copy, paste & run to see the results that you are seeing. – Enigmativity Nov 07 '16 at 09:38
  • Then it seems @MatthewWatson is correct, you should check your input value and remove those invisible characters – Prisoner Nov 07 '16 at 09:39
  • Where do you get your `e.Argument` from? Apparently, it has a bunch of invisible zero-width characters. And do a bit of code cleanup -- variables should not be named `xxx`, `xxxx` – Anton Gogolev Nov 07 '16 at 09:40
  • @AntonGogolev `e.Argument` is the value that was passed in. The first code block above runs in a BackgroundWorker. – Vic Nov 07 '16 at 09:42
  • Try to clean up your string with something like this: `updateMD5= new string(updateMD5.Where(c => char.IsLetterOrDigit(c)).ToArray());` – Pikoh Nov 07 '16 at 09:43
  • Apparently you have a [`ZERO WIDTH NON-JOINER`](http://www.unicodemap.org/details/0x200C/index.html) and [`ZERO WIDTH SPACE`](http://www.unicodemap.org/details/0x200B/index.html) character interspersed between each number group (code 8204 and 8203). This explains the difference. Please take a look at the code that you use to build the strings with and check if those are present anywhere. For instance, that `"x2"` format string, does it contain extra characters? Note that those two characters are zero-width, aka not visible. – Lasse V. Karlsen Nov 07 '16 at 10:00
  • Character 8204 is [Zero Width Non Joiner](http://www.fileformat.info/info/unicode/char/200c/index.htm) and 8203 is [Zero Width Space](http://www.fileformat.info/info/unicode/char/200B/index.htm). Sanitise your inputs. – Matthew Watson Nov 07 '16 at 10:01

1 Answer

Once you showed the character-by-character output of the longer string, the explanation became clear.

As to why this happens, that's nearly impossible to tell from our end due to the nature of the problem.

Anyway, the problem is these two lines of your output:

==8204
==8203

Those two code points are 0x200C and 0x200B, aka ZERO WIDTH NON-JOINER and ZERO WIDTH SPACE.

These are invisible characters meant to give hints to word-breaking algorithms and similar gory stuff.
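To make the numbers concrete, here is a minimal sketch. It assumes (this is a guess, but one that fits the lengths reported in the question) that one ZERO WIDTH NON-JOINER and one ZERO WIDTH SPACE sit between each pair of hex digits. `ZeroWidthDemo` and `Contaminate` are hypothetical names, and the hash value is just a sample. The 32 hex digits plus the four dashes give 36; the 15 gaps between the 16 byte pairs add 30 invisible characters, which gives 66.

```csharp
using System;
using System.Text;

static class ZeroWidthDemo
{
    // Hypothetical helper: inserts U+200C and U+200B between each pair of
    // hex digits, imitating the pattern suggested by the character dump.
    public static string Contaminate(string hex)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < hex.Length; i += 2)
        {
            if (i > 0)
                sb.Append("\u200C\u200B");
            sb.Append(hex, i, 2);
        }
        return sb.ToString();
    }

    static void Main()
    {
        const string sample = "D41D8CD98F00B204E9800998ECF8427E"; // sample MD5 text
        string clean = "--" + sample + "--";
        string dirty = "--" + Contaminate(sample) + "--";

        Console.WriteLine(clean.Length);        // 36
        Console.WriteLine(dirty.Length);        // 66
        // Trim only looks at the ends of the string, and those are dashes here.
        Console.WriteLine(dirty.Trim().Length); // 66
        // Ordinal comparison looks at every UTF-16 code unit, visible or not.
        Console.WriteLine(string.Equals(clean, dirty, StringComparison.OrdinalIgnoreCase)); // False
    }
}
```

Both strings print identically in a console or debugger, which is why the mismatch is so hard to see.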

Simply put, somewhere in your code where you concatenate strings you have those two characters as part of your source code. Since they're not visible in your source code either (zero width, remember) they can be hard to spot.

I would take a look at all strings involved in this. In particular, I would start with the "x2" format string used to build up the hash code, or possibly the code that returns the MD5 code for the update to apply.
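If the source of the characters cannot be tracked down right away, one defensive option is to keep only hex digits before comparing, and to compare ordinally. The following is a sketch under that assumption; `HashText`, `KeepHexDigits`, and `SameHash` are hypothetical names that are not part of the code in the question.

```csharp
using System;
using System.Linq;

static class HashText
{
    // True for 0-9, a-f and A-F only.
    static bool IsHexDigit(char c)
    {
        return (c >= '0' && c <= '9')
            || (c >= 'a' && c <= 'f')
            || (c >= 'A' && c <= 'F');
    }

    // Drops everything that is not a hex digit, including zero-width
    // characters, whitespace and NUL characters.
    public static string KeepHexDigits(string s)
    {
        if (s == null)
            return string.Empty;
        return new string(s.Where(c => IsHexDigit(c)).ToArray());
    }

    // Compares two hash strings after sanitising both sides.
    // An empty result counts as a mismatch so a missing hash never passes.
    public static bool SameHash(string expected, string actual)
    {
        string a = KeepHexDigits(expected);
        string b = KeepHexDigits(actual);
        return a.Length > 0
            && string.Equals(a, b, StringComparison.OrdinalIgnoreCase);
    }
}
```

With this in place, `HashText.SameHash(((string[])e.Argument)[1], Hasher.HashFile(file, HashType.MD5))` could stand in for the `String.Equals` call in the question. Whitelisting hex digits avoids having to list every invisible character that might show up, but it only hides the symptom; finding where the characters come from is still worthwhile.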

Lasse V. Karlsen
  • What is the correct code to use in place of `updateMD5 = updateMD5.Replace("\0", "");`, based on Matthew Watson's comment? – Vic Nov 08 '16 at 00:37
  • I already got rid of the ZERO WIDTH NON-JOINER and ZERO WIDTH SPACE with this code: `string updateMD5 =(((string[])e.Argument)[1].ToUpper().Replace("\u200B", "")).Replace("\u200C","");` – Vic Nov 08 '16 at 01:12