
I would like to know if there is a way of telling whether two files are the same.

I am using a solution, but it turned out not to be very effective: I download the first part of each file, convert the received data into base64, and finally compare the two results.

But I run into a problem when, for example, the first halves of both files (a.html and b.html) are the same: the generated signature is identical even if the last part is different. This is the code I use to download a preview of the file:

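```javascript
var https = require('https');

// `url` and `responce` (the Express response object) come from the surrounding route handler.
https.get(url, function(res) {
    var sent = false;
    if (res.statusCode !== 200) {
        res.resume(); // discard the body
        return responce.jsonp(404, null);
    }
    res.on('data', function(chunk) {
        // 'data' fires once per chunk, but an HTTP response can only be sent once.
        if (sent) return;
        sent = true;
        // Base64-encode the first chunk and keep its first 100 characters as the "signature".
        var jsfile = chunk.toString('base64').substring(0, 100);
        responce.header('Access-Control-Allow-Origin', '*');
        responce.header('Access-Control-Allow-Headers', 'X-Requested-With');
        responce.header('content-type', 'application/pdf');
        responce.send(200, jsfile);
        res.destroy(); // stop downloading the rest of the file
    });
});
```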
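The collision follows from the approach itself rather than from base64: a signature built from the first 100 base64 characters depends only on the first 75 bytes of the file. A minimal sketch, assuming the signature is computed as in the code above; `prefixSignature` and the two sample buffers are hypothetical:

```javascript
// Hypothetical helper mirroring the question's approach:
// base64-encode the bytes and keep only the first 100 characters.
function prefixSignature(buf) {
    return buf.toString('base64').substring(0, 100);
}

// Two hypothetical "files" whose first 200 bytes are identical but whose endings differ.
var sharedHalf = Buffer.alloc(200, 'a');
var fileA = Buffer.concat([sharedHalf, Buffer.from('<p>end of a.html</p>')]);
var fileB = Buffer.concat([sharedHalf, Buffer.from('<p>end of b.html</p>')]);

// 100 base64 characters encode exactly 75 bytes, all from the shared part,
// so both signatures are equal.
console.log(prefixSignature(fileA) === prefixSignature(fileB)); // true

// Encoding the whole buffers does tell them apart.
console.log(fileA.toString('base64') === fileB.toString('base64')); // false
```

Any prefix-based signature has the same weakness: it can only prove that two files differ, never that they are the same.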
user2422940
  • Can you post the code? It sounds like you have a bug. If you had base64-encoded ALL of the file data and then compared the entire base64 strings, they would certainly be different. – jeremy Apr 25 '14 at 13:39
  • This is just a snippet of the code, but you aren't waiting until you've received the entire response before you send data back. res.on('data', ...) will be called multiple times, and you need to wait until res.on('end', ...) to know that all the data is in. – jeremy Apr 25 '14 at 13:52
  • I used this method to reduce the waiting time for big files (50 MB, for example); if I wait until the download completes, it takes at least 5 minutes. – user2422940 Apr 25 '14 at 14:12
  • How could you tell whether they are different if you don't have the whole file to compare? You can also look at the headers to check the ETag, but it's not 100% reliable. Also, if the files are in S3 or a similar service, an HTTP HEAD request may give you an MD5 of the file. – jeremy Apr 25 '14 at 15:33
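The header-based check suggested in the last comment can be sketched as a `HEAD` request per URL followed by a comparison of the metadata, so no file body is downloaded. This is a sketch under assumptions: it relies on the `ETag` and `Content-Length` headers, which servers may omit; an ETag is an opaque, server-specific validator, so equal ETags from different servers prove nothing, and on S3 the ETag is an MD5 of the content only for some uploads (not, for example, multipart uploads). `headHeaders` and `compareHeaders` are hypothetical names.

```javascript
var https = require('https');

// Hypothetical helper: send a HEAD request and return only the status and headers.
function headHeaders(url, callback) {
    var req = https.request(url, { method: 'HEAD' }, function(res) {
        res.resume(); // a HEAD response has no body; this lets the request finish cleanly
        callback(null, res.statusCode, res.headers);
    });
    req.on('error', callback);
    req.end();
}

// Hypothetical helper: what the headers alone can tell us.
// Node lower-cases header names, hence 'content-length' and 'etag'.
function compareHeaders(a, b) {
    var lenA = a['content-length'];
    var lenB = b['content-length'];
    if (lenA !== undefined && lenB !== undefined && lenA !== lenB) {
        return 'different'; // different sizes (assuming uncompressed responses)
    }
    if (a.etag !== undefined && a.etag === b.etag) {
        return 'probably-same'; // only a hint: ETag formats are server-specific
    }
    return 'unknown'; // the bodies have to be downloaded and hashed
}
```

Call `headHeaders` for both URLs and pass the two `headers` objects to `compareHeaders`. Only `'different'` is a definite answer; in the other cases the files still have to be downloaded to be sure.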

1 Answer


I think you should compare the files using an MD5 hash. Check out the question "node.js hash string?".
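For data already in memory, the hashing idea looks like this with Node's built-in `crypto` module. The sketch uses SHA-256 instead of MD5: MD5 still detects accidental changes, but it is no longer collision-resistant. `digestOf` and the sample buffers are hypothetical.

```javascript
var crypto = require('crypto');

// Hypothetical helper: hex-encoded SHA-256 digest of a Buffer or string.
function digestOf(data) {
    return crypto.createHash('sha256').update(data).digest('hex');
}

var a = Buffer.from('<html><body>same start, ending A</body></html>');
var b = Buffer.from('<html><body>same start, ending B</body></html>');

// Unlike a prefix, the digest depends on every byte, so the differing endings change it.
console.log(digestOf(a) === digestOf(b)); // false
console.log(digestOf(a) === digestOf(Buffer.from(a))); // true
```

Comparing two 64-character digests replaces comparing the full contents, but computing each digest still requires reading every byte of the file.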

Le Trong
  • I don't think you should be recommending MD5 in 2014. – Frédéric Hamidi Apr 25 '14 at 13:38
  • The problem is not the hash but the fact that I have to download the entire file in order to tell whether the files are the same. For example, if I have a 50 MB PDF file, I have to wait a long time until the download finishes. – user2422940 Apr 25 '14 at 13:39
  • That is an unavoidable problem. I would HIGHLY suggest the hashing method if you're dealing with 50 MB files, since otherwise you would need to keep both copies in memory to compare the base64 strings. – jeremy Apr 25 '14 at 13:40
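Waiting for `'end'` and hashing can be combined so that the file is never held in memory: each chunk is fed into the hash as it arrives, and the digest is read only once the stream ends. A minimal sketch, assuming both files are reachable over HTTPS; `hashStream` and `hashUrl` are hypothetical names. Every byte of both files is still downloaded; hashing only removes the need to buffer them.

```javascript
var https = require('https');
var crypto = require('crypto');

// Hypothetical helper: hash a readable stream chunk by chunk; the callback
// receives the hex digest only after 'end', i.e. once every byte has been seen.
function hashStream(stream, callback) {
    var hash = crypto.createHash('sha256');
    stream.on('data', function(chunk) {
        hash.update(chunk); // the chunk is not kept after this
    });
    stream.on('error', callback);
    stream.on('end', function() {
        callback(null, hash.digest('hex'));
    });
}

// Hypothetical helper: download `url` and hash its body without buffering it.
function hashUrl(url, callback) {
    https.get(url, function(res) {
        if (res.statusCode !== 200) {
            res.resume(); // drain the body so the socket is released
            return callback(new Error('HTTP ' + res.statusCode + ' for ' + url));
        }
        hashStream(res, callback);
    }).on('error', callback);
}
```

To compare two files, call `hashUrl` for each URL and compare the two digests; equal SHA-256 digests mean the contents are identical with overwhelming probability. Memory use stays at roughly one chunk per download regardless of file size.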