I am implementing code to produce a checksum from a string, and I would like to know the following:

Why is the checksum produced directly from a string different from the checksum produced from a file that contains the same string, where the string was copied into the file manually using Ctrl+C?

Edit: I'm not asking for an implementation. I'm asking anyone who may have encountered this why the two checksums differ.

Another example: why is the checksum produced from a file created by code different from the checksum produced from a file created manually, where the string was copy-pasted?

Yet when I compare the two strings using a tool like WinMerge, it reports them as identical.

Any enlightening answers are appreciated

yologaming
  • 6
    please show us some code, otherwise this will be really hard to solve. – DigitalJedi Jun 06 '19 at 09:49
  • 1
    You can produce an MD5 or SHA signature. See for instance https://stackoverflow.com/questions/304268/getting-a-files-md5-checksum-in-java – Mihai8 Jun 06 '19 at 09:53
  • 3
    OK, so here's the thing: depending ON YOUR IMPLEMENTATION, the checksum can vary greatly. Just think about having a "\n" as whitespace in your file, or an EOF (end of file) character, that you include in your checksum calculation by mistake. The string may not have these, and therefore the checksum will be different – DigitalJedi Jun 06 '19 at 09:54
  • 2
    Or the encoding of the characters differs between the string and the file (I think this is very likely) – kutschkem Jun 06 '19 at 09:57
  • Hi, I edited my post. See if it makes any difference to your current answers – yologaming Jun 06 '19 at 10:01
  • To check the line endings, check the file sizes of WinMerged "identical" files. – Joop Eggen Jun 06 '19 at 10:17
  • @DigitalJedi could you even detect an EOF, and if there is one, could you remove it? – yologaming Jun 06 '19 at 12:30
  • It is likely that the difference you see is entirely due to the encoding of the data. In Java, Strings are UTF-16 in memory. If the file in question is not in UTF-16, the CRC will differ because the bytes will be different. – DwB Jun 06 '19 at 13:02
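
The points raised in the comments (trailing line endings, encoding, a byte-order mark) can be checked with a small experiment. The following is only a sketch under assumptions: the question does not say which checksum is used, so CRC32 from `java.util.zip` stands in for it, and the class name `ChecksumDemo`, the helpers `crc32` and `hex`, and the sample text `"hello"` are all hypothetical. It shows that a checksum is computed over bytes, not characters, so text that looks identical in a diff tool can still hash differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical demo class; names are illustrative, not from the question.
class ChecksumDemo {

    // CRC32 over raw bytes. A checksum only ever sees bytes, never "characters".
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    // Space-separated hex dump, useful for spotting a BOM or a stray "0d 0a".
    static String hex(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(String.format("%02x ", b & 0xff));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String text = "hello"; // assumed sample input

        // Same characters, three different byte sequences.
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        byte[] utf16 = text.getBytes(StandardCharsets.UTF_16); // adds a byte-order mark
        byte[] withCrlf = (text + "\r\n").getBytes(StandardCharsets.UTF_8); // editor-added line ending

        System.out.println("UTF-8       : " + hex(utf8) + "  crc=" + Long.toHexString(crc32(utf8)));
        System.out.println("UTF-16      : " + hex(utf16) + "  crc=" + Long.toHexString(crc32(utf16)));
        System.out.println("UTF-8 + CRLF: " + hex(withCrlf) + "  crc=" + Long.toHexString(crc32(withCrlf)));
    }
}
```

To apply this to the real case, one could read the file with `java.nio.file.Files.readAllBytes` and dump it with the same `hex` helper next to the bytes of the in-memory string. Any extra bytes at the start (a BOM) or at the end (a newline added by the editor) would then be visible, as would a size difference between the two files.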

0 Answers