I'm trying to find a secure way to create checksums for large files (larger than 10 GB).

SHA256 is secure enough for me, but it is too computationally expensive to be suitable. I know that both SHA1 and MD5 checksums are insecure because of collisions.

So I think the fastest and safest way is to combine MD5 with SHA1 (SHA1+MD5), and I don't think there is a way to produce a file that collides on both MD5 and SHA1 at the same time.

So is combining SHA1+MD5 secure enough for a file checksum? Or is there a collision attack against it?

I use C# on Mono in two ways (with and without BufferedStream):

    using System;
    using System.IO;
    using System.Security.Cryptography;

    // Hashes the file directly from a FileStream.
    public static string GetChecksum(string file)
    {
        using (FileStream stream = File.OpenRead(file))
        using (var sha = new SHA256Managed())
        {
            byte[] checksum = sha.ComputeHash(stream);
            return BitConverter.ToString(checksum).Replace("-", String.Empty);
        }
    }

    // Same hash, read through a 32 KB BufferedStream.
    // Disposing the BufferedStream also closes the stream passed in.
    public static string GetChecksumBuffered(Stream stream)
    {
        using (var bufferedStream = new BufferedStream(stream, 1024 * 32))
        using (var sha = new SHA256Managed())
        {
            byte[] checksum = sha.ComputeHash(bufferedStream);
            return BitConverter.ToString(checksum).Replace("-", String.Empty);
        }
    }

Update 1: I mean SHA1 hash + MD5 hash: first calculate the SHA1 of the file, then calculate the MD5 of the file, then concatenate the two strings.
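The concatenation described in Update 1 could be sketched roughly as follows. This is only an illustration, not the code that was actually timed: the class name `CombinedChecksum`, the method `GetSha1Md5`, and the 1 MB buffer size are assumptions made for the example. It reads the stream once and feeds each chunk to both hash objects, rather than reading the file twice.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

public static class CombinedChecksum
{
    // Hypothetical helper: returns hex(SHA1) followed by hex(MD5),
    // reading the stream a single time.
    public static string GetSha1Md5(Stream stream, int bufferSize = 1024 * 1024)
    {
        using (var sha1 = SHA1.Create())
        using (var md5 = MD5.Create())
        {
            var buffer = new byte[bufferSize];
            int read;
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Feed the same chunk to both algorithms; no output copy is needed.
                sha1.TransformBlock(buffer, 0, read, null, 0);
                md5.TransformBlock(buffer, 0, read, null, 0);
            }

            // Finalize both hashes with an empty last block.
            sha1.TransformFinalBlock(buffer, 0, 0);
            md5.TransformFinalBlock(buffer, 0, 0);

            return ToHex(sha1.Hash) + ToHex(md5.Hash);
        }
    }

    private static string ToHex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", string.Empty);
    }
}
```

The result is 72 hex characters: 40 for the 160-bit SHA1 digest followed by 32 for the 128-bit MD5 digest. Passing a `FileStream` from `File.OpenRead(file)` would hash a file.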

Update 2:

As @zaph suggested, I reimplemented my code (C#, Mono) according to what I read here, but it is not as fast as he said. The time for a 4.6 GB file went from approximately 12 minutes to about 8 minutes, whereas SHA1+MD5 takes less than 100 seconds for the same file. So I still don't think it is right to use SHA256 instead.

Mohammad Sina Karvandi
  • @zaph Well, what do you use? I tested what you said with the default Mono (C#) implementation on Linux Mint: a 4 GB file takes about 12 minutes and a 108 MB file takes 19 seconds. So what's wrong? I use an Intel Core i7 CPU. – Mohammad Sina Karvandi Jul 19 '16 at 16:31
  • You must have a slow implementation; my CPU is a 2010 2.8 GHz Quad-Core Intel Xeon. My 2011 laptop is faster. Most Intel processors have instructions that can be used to make crypto operations faster. – zaph Jul 19 '16 at 16:42
  • Are you doing this in real time? – Erik Philips Jul 19 '16 at 17:09
  • @ErikPhilips I timed it earlier with a mobile phone stopwatch. I read something about my problem here: http://stackoverflow.com/questions/1177607/what-is-the-fastest-way-to-create-a-checksum-for-large-files-in-c-sharp ... But after 8 minutes nothing in particular has happened, and I'm still waiting! – Mohammad Sina Karvandi Jul 19 '16 at 17:13
  • @ᔕIᑎᗩKᗩᖇᐯᗩᑎᗪI If you don't need it in real time, then why the concern for speed? – Erik Philips Jul 19 '16 at 17:50
  • @ErikPhilips Well, 4 GB takes approximately 8-12 minutes, but the combination of SHA1+MD5 takes less than 100 seconds. I can't let the computer spend that long when it can be done in under 100 seconds. Another benefit for me is that the SHA1+MD5 combination is smaller in size, which makes my database smaller, doesn't it? I want to perform this operation more than 1,000 times a day! – Mohammad Sina Karvandi Jul 19 '16 at 17:55
  • @ᔕIᑎᗩKᗩᖇᐯᗩᑎᗪI Yes, 1,000 times a day is a good reason to change algorithms, but I can't read minds, so I didn't know the context of its usage. As for a smaller database, hard drive space is cheap and not a good reason to switch. But 1,000 times a day IS a good reason to switch. – Erik Philips Jul 19 '16 at 18:06
  • You might try a different SHA-256 library; even the SHA1+MD5 times should be faster. Is there a native crypto-hash library for Linux Mint? – zaph Jul 19 '16 at 18:40
  • @zaph I think you're right, because I tested my file with md5sum and it takes just 18 seconds, while my MD5 implementation in Mono takes about 53 seconds. – Mohammad Sina Karvandi Jul 19 '16 at 18:43
  • Have you tried [`sha256sum`](http://linux.die.net/man/1/sha256sum)? BTW, cryptographic hashes are very different from checksums (yeah I realize some documentation confuses the difference). – zaph Jul 19 '16 at 18:48
  • @zaph I'm testing it now. By the way, I thought md5sum and what I am doing with the above code were the same. Aren't they? They both give me the same result, so what is different? Could you please point me to some documentation, if possible? – Mohammad Sina Karvandi Jul 19 '16 at 18:55
  • I'm somewhat lost on the comparison, *[md5sum](https://en.wikipedia.org/wiki/Md5sum) is a computer program that calculates 128-bit MD5 hashes*. SHA-256 calculates 256-bit hashes. – zaph Jul 19 '16 at 19:03

2 Answers

2

There should be only a small difference between SHA-256 and a combination of MD5+SHA1.

The only way to know is to benchmark:

On my desktop:
SHA-256: 200 MB/s
MD5: 470 MB/s
SHA1: 500 MB/s (updated, previously incorrect)
MD5+SHA1: 240 MB/s

These figures are for the hashing only; disk read time is not included. The tests were done with a 1 MB buffer and averaged over 10 runs. The language was C and the library used was Apple's Common Crypto. The CPU was a 2.8 GHz Quad-Core Intel Xeon (2010 Mac Pro; my laptop is faster).
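For anyone wanting to repeat a measurement like this on Mono or .NET, a rough sketch might look like the following. It is not the C / Common Crypto code used for the numbers above; the class name `HashBenchmark` and the method `MeasureMBPerSecond` are hypothetical, and the data is hashed from memory so that, as above, disk read time is excluded.

```csharp
using System;
using System.Diagnostics;
using System.Security.Cryptography;

public static class HashBenchmark
{
    // Hypothetical helper: average hashing throughput in MB/s
    // over `runs` passes of an in-memory buffer.
    public static double MeasureMBPerSecond(Func<HashAlgorithm> create, byte[] data, int runs = 10)
    {
        using (HashAlgorithm algorithm = create())
        {
            // One untimed pass so JIT compilation is not measured.
            algorithm.ComputeHash(data);

            Stopwatch timer = Stopwatch.StartNew();
            for (int i = 0; i < runs; i++)
            {
                algorithm.ComputeHash(data);
            }
            timer.Stop();

            double megabytes = (double)data.Length * runs / (1024 * 1024);
            return megabytes / timer.Elapsed.TotalSeconds;
        }
    }
}
```

Calling it as `HashBenchmark.MeasureMBPerSecond(() => SHA256.Create(), data)`, and likewise with `MD5.Create()` and `SHA1.Create()`, gives per-algorithm figures for a given runtime. The absolute numbers will depend on the implementation and on whether it uses the CPU's crypto instructions.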

In the end, the combined MD5+SHA1 is only about 20% faster than SHA-256 (240 MB/s versus 200 MB/s).

Note: Most Intel processors have instructions that can be used to make crypto operations faster. Not all implementations utilize these instructions.

You might try a native implementation such as sha256sum.

zaph
  • I tested it again and again with BufferedStream in C#, but it makes little difference for my 4 GB file. It now finishes after 8 minutes. I think you're wrong, because I think (I'm not sure) SHA256 must be slower than SHA1, but in your example SHA256 is faster! Would you please share which language and implementation you use that gives 190 MB/s? – Mohammad Sina Karvandi Jul 19 '16 at 17:48
1

If by SHA1+MD5 you mean hashing with SHA-1 first and then using that digest as input to MD5, then you are not eliminating collisions completely, just potentially reducing the chance of one occurring.

Both SHA-1 and MD5 are fixed-length cryptographic hash functions, and according to the pigeonhole principle, collisions are bound to exist once the message length is greater than the digest size. There are two instances of this in your use case:

  • When you hash your arbitrary-length message with SHA-1
  • When the 160-bit SHA-1 digest is used as input to MD5

My point is that collisions will always exist. However, the probability of finding one is exceedingly small. If the sole purpose is for file integrity, SHA-1 will do the job just fine on its own.

Related:

What checksum algorithm should I use?

Is MD5 still good enough to uniquely identify files?

Mingky
  • OMG! No, I just mean SHA1 hash + MD5 hash: first calculate the SHA1 of the file, then calculate the MD5 of the file, then concatenate the two strings. – Mohammad Sina Karvandi Jul 19 '16 at 15:11
  • If it's simple concatenation of the two digests then the total collision-resistance will be that of the cryptographically stronger hash function, in this case SHA-1. – Mingky Jul 19 '16 at 15:20
  • @Mingky (as far as I can tell) this is not true because you have to find a source that results in a collision for both hashes. Even a CRC added to MD5 would make it more difficult to find a collision. – wischi Jul 20 '16 at 07:54