121

Is there a way to calculate the MD5 hash of a file before uploading it to the server using JavaScript?

LuRsT
  • Strongly Related: [How to generate checksum & convert to 64 bit in Javascript for very large files without overflowing RAM?](https://stackoverflow.com/q/51987434/514235) – iammilind Aug 23 '18 at 14:10

13 Answers

107

While there are JS implementations of the MD5 algorithm, older browsers are generally unable to read files from the local filesystem.

I wrote that in 2009. So what about new browsers?

With a browser that supports the FileAPI, you can read the contents of a file, provided the user has selected it, either with an <input> element or by drag-and-drop. As of Jan 2013, here's how the major browsers stack up:

How?

See the answer below by Benny Neugebauer, which uses the MD5 function of CryptoJS.
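For reference, getting the bytes of a selected file needs no library in a browser with the File API; only the MD5 step does. The sketch below assumes `Blob.prototype.arrayBuffer()` (newer than the 2013 browsers discussed here), and the `file-input` id is hypothetical.

```javascript
// Minimal sketch: read the raw bytes of a user-selected File (any Blob works).
// Assumes Blob.prototype.arrayBuffer(); older browsers need
// FileReader.readAsArrayBuffer() instead.
async function readFileBytes(file) {
  const buffer = await file.arrayBuffer();
  return new Uint8Array(buffer);
}

// Hypothetical wiring: a file input element with id "file-input".
// Guarded so the function above can also be loaded outside a browser.
if (typeof document !== "undefined") {
  const input = document.getElementById("file-input");
  if (input) {
    input.addEventListener("change", async () => {
      const file = input.files[0];
      if (!file) return;
      const bytes = await readFileBytes(file);
      // Hand `bytes` to the MD5 implementation of your choice here
      console.log(`${file.name}: ${bytes.length} bytes ready to hash`);
    });
  }
}
```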

Paul Dixon
  • Apart from the impossibility to get file system access in JS, I would not put any trust at all in a client-generated checksum. So generating the checksum on the server is mandatory in any case. – Tomalak Apr 20 '09 at 14:05
  • @Tomalak It's also mandatory to do it on the client if you only want to upload it if it's different from what you already have. – John Feb 26 '16 at 17:41
  • @John Well, my statement does not rule this out. Client-side checks strictly are for user convenience (and thus more or less optional, depending on how convenient you want to make it). Server-side checks, on the other hand, are mandatory. – Tomalak Feb 26 '16 at 17:45
  • Does the md5 function in http://pajhome.org.uk/crypt/md5/ not support binary input? I think it's necessary to hash the binary stream of an uploaded image in the browser. Thank you. – jiajianrong Dec 20 '18 at 09:04
  • If you can, add some example code to your answer. It would help a lot. – cbdeveloper Jun 06 '19 at 17:16
  • I flagged this as `not an answer` because it says that it is possible and gives a link to a website, but it does not actually provide any JS related to the MD5 algorithm. I recommend the CryptoJS algorithm as suggested by Benny Neugebauer. – AksLolCoding Jul 23 '21 at 13:35
34

It is pretty easy to calculate the MD5 hash using the MD5 function of CryptoJS and the HTML5 FileReader API. The following code snippet shows how you can read the binary data and calculate the MD5 hash of an image that has been dragged into your browser:

var holder = document.getElementById('holder');

holder.ondragover = function() {
  return false;
};

holder.ondragend = function() {
  return false;
};

holder.ondrop = function(event) {
  event.preventDefault();

  var file = event.dataTransfer.files[0];
  var reader = new FileReader();

  reader.onload = function(event) {
    var binary = event.target.result;
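    // CryptoJS UTF-8-encodes plain string input, so parse the binary string
    // as Latin-1 (one byte per character) before hashing it
    var md5 = CryptoJS.MD5(CryptoJS.enc.Latin1.parse(binary)).toString();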
    console.log(md5);
  };

  reader.readAsBinaryString(file);
};

I recommend adding some CSS to make the drag-and-drop area visible:

#holder {
  border: 10px dashed #ccc;
  width: 300px;
  height: 300px;
}

#holder.hover {
  border: 10px dashed #333;
}

More about the Drag & Drop functionality can be found here: File API & FileReader

I tested the sample in Google Chrome Version 32.

Benny Code
  • The problem is that `readAsBinaryString()` has not been standardized and is not supported by Internet Explorer. I didn't test it in Edge, but even IE11 does not support it. – StanE Dec 03 '16 at 02:18
  • @user25163 Internet Explorer (and Opera Mini) seem to be the only modern browsers not supporting `readAsBinaryString()`: http://caniuse.com/#feat=filereader. Microsoft Edge supports it. – Benny Code Dec 04 '16 at 08:46
  • Thanks for the info regarding MS Edge! I work for a company, and you know that customers often use old software and how hard it is to convince them to update it. I just wanted to point out that one has to be careful using `readAsBinaryString()`, as it is not supported by older browsers. An alternative I found is SparkMD5. It uses the FileReader API too, but with the method `readAsArrayBuffer`, which is supported by IE. And it can handle huge files by reading them in chunks. – StanE Dec 04 '16 at 13:04
  • CryptoJS now supports converting from an ArrayBuffer to Binary/WordArray via: `CryptoJS.lib.WordArray.create(arrayBuffer);` – Warren Parad Jan 08 '18 at 20:43
  • @WarrenParad And how would the above code then be modified to work with ArrayBuffer? Ahh, found it here: https://stackoverflow.com/questions/28437181/md5-hash-of-a-file-using-javascript#28458081 – TheStoryCoder Mar 27 '18 at 13:48
  • This is the wrong answer. CryptoJS.MD5 treats any input string as a UTF-8 encoded string and then converts it to a WordArray internally. The answer erroneously uses the binary string read by FileReader as input to the CryptoJS.MD5 function. You will certainly get a wrong hash, unless the file only contains ASCII text. – 張俊芝 Feb 24 '21 at 14:08
  • CryptoJS is not defined – Phil Dec 06 '22 at 17:54
  • @Phil did you embed the CryptoJS JavaScript file in your HTML? https://github.com/sytelus/CryptoJS/blob/v3.1.2/rollups/md5.js – Benny Code Dec 14 '22 at 15:58
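The comment by 張俊芝 above is easy to check without CryptoJS: `readAsBinaryString()` yields one character per byte, and any function that UTF-8-encodes its string input turns every byte from 0x80 to 0xFF into two bytes, so the hash is computed over different data. In this sketch `TextEncoder` stands in for that UTF-8 step (an assumption for illustration; it is not CryptoJS code). With CryptoJS, the usual remedy is to pass a WordArray instead of a string, e.g. via `CryptoJS.enc.Latin1.parse(binary)` or, as Warren Parad notes, `CryptoJS.lib.WordArray.create(arrayBuffer)`.

```javascript
// Demonstrates the byte mismatch behind the UTF-8 pitfall.
// A FileReader binary string stores one byte (0-255) per character.
function binaryStringToBytes(binary) {
  // Correct interpretation: each char code is exactly one byte (Latin-1)
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i) & 0xff;
  }
  return bytes;
}

// The first four bytes of every PNG file, as readAsBinaryString() would return them
const binary = String.fromCharCode(0x89, 0x50, 0x4e, 0x47);

const asBytes = binaryStringToBytes(binary);     // 89 50 4e 47
const asUtf8 = new TextEncoder().encode(binary); // c2 89 50 4e 47 (0x89 became two bytes)

console.log(asBytes.length, asUtf8.length); // 4 5
```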
34

I've made a library that implements incremental MD5 in order to hash large files efficiently. Basically, you read a file in chunks (to keep memory usage low) and hash it incrementally. Basic usage and examples are in the README.

Be aware that you need the HTML5 File API, so be sure to check for it. There is a full example in the test folder.

https://github.com/satazor/SparkMD5
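The incremental approach boils down to slicing the file and handing each slice to the hasher in order. Here is a library-agnostic sketch, assuming `Blob.prototype.arrayBuffer()` and async generators; `update` is a placeholder for whatever append method your hasher exposes (a `SparkMD5.ArrayBuffer` instance has `append()`, as used in the ES6 promise answer further down).

```javascript
// Yields a file's contents as Uint8Array chunks of at most `chunkSize` bytes.
async function* readInChunks(blob, chunkSize) {
  for (let offset = 0; offset < blob.size; offset += chunkSize) {
    // slice() does not read any data by itself; arrayBuffer() reads only this range
    const slice = blob.slice(offset, Math.min(offset + chunkSize, blob.size));
    yield new Uint8Array(await slice.arrayBuffer());
  }
}

// Feeds every chunk, in order, to `update` (placeholder for hasher.append/update).
// Only one chunk is held in memory at a time. Returns the number of bytes fed.
async function feedInChunks(blob, chunkSize, update) {
  let total = 0;
  for await (const chunk of readInChunks(blob, chunkSize)) {
    update(chunk);
    total += chunk.length;
  }
  return total;
}
```

With SparkMD5, `update` could be `(chunk) => spark.append(chunk.buffer)`, followed by `spark.end()` once `feedInChunks` resolves.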

satazor
  • @Biswa here is my implementation. https://gist.github.com/marlocorridor/3e6484ae5a646bd7c625 – marlo Sep 03 '15 at 04:41
  • Hey, this works great! I tried CryptoJS and never could get an accurate MD5 out of it for some reason; this works like a charm! Any plans for sha256? @satazor – cameck Jul 12 '17 at 19:42
  • @cameck, the library is good. However, I tried it today and it seems that there is an issue with the `.end()` method. If you call this method again, it gives a wrong result on subsequent calls, because `.end()` calls `.reset()` internally. This is a coding disaster and not good for library writing. – iammilind Aug 24 '18 at 08:45
  • Thanks for the library! I put together a minimal example: https://dev.to/micmo/compute-md5-checksum-for-a-file-in-typescript-59a4 – Qortex Jan 18 '20 at 10:20
22

The following snippet shows an example that can achieve a throughput of 400 MB/s while reading and hashing the file.

It is using a library called hash-wasm, which is based on WebAssembly and calculates the hash faster than js-only libraries. As of 2020, all modern browsers support WebAssembly.

const chunkSize = 64 * 1024 * 1024; // 64 MB per read
const fileReader = new FileReader();
let hasher = null;

// Reads one slice of the file and feeds its bytes into the running hash
function hashChunk(chunk) {
  return new Promise((resolve, reject) => {
    fileReader.onload = (e) => {
      const view = new Uint8Array(e.target.result);
      hasher.update(view);
      resolve();
    };
    fileReader.onerror = () => reject(fileReader.error);

    fileReader.readAsArrayBuffer(chunk);
  });
}

const readFile = async (file) => {
  // Create the WebAssembly hasher once; init() resets it for the next file
  if (hasher) {
    hasher.init();
  } else {
    hasher = await hashwasm.createMD5();
  }

  const chunkCount = Math.ceil(file.size / chunkSize);

  for (let i = 0; i < chunkCount; i++) {
    const chunk = file.slice(
      chunkSize * i,
      Math.min(chunkSize * (i + 1), file.size)
    );
    await hashChunk(chunk);
  }

  // digest() returns a hex string by default
  return hasher.digest();
};

const fileSelector = document.getElementById("file-input");
const resultElement = document.getElementById("result");

fileSelector.addEventListener("change", async(event) => {
  const file = event.target.files[0];

  resultElement.innerHTML = "Loading...";
  const start = Date.now();
  const hash = await readFile(file);
  const end = Date.now();
  const duration = end - start;
  const fileSizeMB = file.size / 1024 / 1024;
  const throughput = fileSizeMB / (duration / 1000);
  resultElement.innerHTML = `
    Hash: ${hash}<br>
    Duration: ${duration} ms<br>
    Throughput: ${throughput.toFixed(2)} MB/s
  `;
});
<script src="https://cdn.jsdelivr.net/npm/hash-wasm"></script>
<!-- defines the global `hashwasm` variable -->

<input type="file" id="file-input">
<div id="result"></div>
Biró Dani
15

HTML5 + spark-md5 and Q

Assuming you're using a modern browser (one that supports the HTML5 File API), here's how to calculate the MD5 hash of a large file (it hashes the file in chunks of a configurable size):

function calculateMD5Hash(file, bufferSize) {
  var def = Q.defer();

  var fileReader = new FileReader();
  var fileSlicer = File.prototype.slice || File.prototype.mozSlice || File.prototype.webkitSlice;
  var hashAlgorithm = new SparkMD5();
  var totalParts = Math.ceil(file.size / bufferSize);
  var currentPart = 0;
  var startTime = new Date().getTime();

  fileReader.onload = function(e) {
    currentPart += 1;

    def.notify({
      currentPart: currentPart,
      totalParts: totalParts
    });

    var buffer = e.target.result;
    hashAlgorithm.appendBinary(buffer);

    if (currentPart < totalParts) {
      processNextPart();
      return;
    }

    def.resolve({
      hashResult: hashAlgorithm.end(),
      duration: new Date().getTime() - startTime
    });
  };

  fileReader.onerror = function(e) {
    def.reject(e);
  };

  function processNextPart() {
    var start = currentPart * bufferSize;
    var end = Math.min(start + bufferSize, file.size);
    fileReader.readAsBinaryString(fileSlicer.call(file, start, end));
  }

  processNextPart();
  return def.promise;
}

function calculate() {

  var input = document.getElementById('file');
  if (!input.files.length) {
    return;
  }

  var file = input.files[0];
  var bufferSize = Math.pow(1024, 2) * 10; // 10MB

  calculateMD5Hash(file, bufferSize).then(
    function(result) {
      // Success
      console.log(result);
    },
    function(err) {
      // There was an error; handle it here
    },
    function(progress) {
      // We get notified of the progress as it is executed
      console.log(progress.currentPart, 'of', progress.totalParts, 'Total bytes:', progress.currentPart * bufferSize, 'of', progress.totalParts * bufferSize);
    });
}
<script src="https://cdnjs.cloudflare.com/ajax/libs/q.js/1.4.1/q.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/spark-md5/2.0.2/spark-md5.min.js"></script>

<div>
  <input type="file" id="file"/>
  <input type="button" onclick="calculate();" value="Calculate" class="btn primary" />
</div>
Jossef Harush Kadouri
  • Check out the repo (https://github.com/satazor/js-spark-md5), which contains sample code to calculate a file's MD5. – kiatng Jan 16 '22 at 07:57
8

You need to use the FileAPI. It is available in the latest FF and Chrome, but not in IE9. Grab any of the MD5 JS implementations suggested above. I tried this and abandoned it because JS was too slow (minutes on large image files). I might revisit it if someone rewrites MD5 using typed arrays.

Code would look something like this:

HTML:
<input type="file" id="file-dialog" multiple="true" accept="image/*">

JS (with jQuery):

$("#file-dialog").change(function() {
  handleFiles(this.files);
});

function handleFiles(files) {
  for (let i = 0; i < files.length; i++) {
    // `const` gives every iteration its own reader; with `var`, all onload
    // callbacks would share the last reader created by the loop
    const reader = new FileReader();
    reader.onload = function() {
      // binl_md5 comes from whichever MD5 script you included
      var md5 = binl_md5(reader.result, reader.result.length);
      console.log("MD5 is " + md5);
    };
    reader.onerror = function() {
      console.error("Could not read the file");
    };
    reader.readAsBinaryString(files.item(i));
  }
}
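A promise-based variant avoids shared reader variables entirely, which sidesteps the pitfall Dave describes in the comments below. This sketch assumes `Blob.prototype.arrayBuffer()`; it only reads the files, so the MD5 step is left to whichever implementation you use.

```javascript
// Read every selected file with its own promise, so no callback can
// observe another file's data. Result order matches the input order.
function readAllFiles(files) {
  // Array.from accepts a FileList as well as a plain array of Files/Blobs
  return Promise.all(
    Array.from(files, async (file) => ({
      name: file.name,
      bytes: new Uint8Array(await file.arrayBuffer()),
    }))
  );
}
```

In the jQuery handler this would be `readAllFiles(this.files).then(results => ...)`, hashing each `bytes` array in turn.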
Aleksandar Totic
  • The Webtoolkit MD5 suggested by bendewey performed much better, 16 s for a multi-MB file: http://www.webtoolkit.info/javascript-md5.html – Aleksandar Totic Apr 20 '11 at 03:41
  • I've managed to get this working, and the same md5 hash is generated (php: md5_file(...)) for text files, but images are giving me different results. Is this something to do with the binary data or the way it's uploaded? – Castles Apr 26 '11 at 15:16
  • I'm pretty sure this code doesn't work with multiple files: because onload is a callback, the `reader` variable will refer to the last file by the time the onload functions are run. – Dave Nov 08 '13 at 20:31
  • CryptoJS now supports converting from an ArrayBuffer to Binary/WordArray via: `CryptoJS.lib.WordArray.create(arrayBuffer);` – Warren Parad Jan 08 '18 at 20:43
7

Apart from the impossibility to get file system access in JS, I would not put any trust at all in a client-generated checksum. So generating the checksum on the server is mandatory in any case. – Tomalak Apr 20 '09 at 14:05

A server-only checksum is useless in most cases. You want the MD5 computed on the client side so that you can compare it with the hash recomputed on the server side and conclude that the upload went wrong if they differ. I have needed to do that in applications working with large files of scientific data, where receiving uncorrupted files was key. My case was simple, because users already had the MD5 computed by their data analysis tools, so I just needed to ask them for it in a text field.

Marco
5

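If SHA-256 is acceptable instead of MD5, the browser's built-in Web Crypto API can compute it directly (`crypto.subtle.digest()` does not support MD5):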

async function sha256(file: File): Promise<string> {
  // get byte array of file (this reads the whole file into memory)
  const buffer = await file.arrayBuffer();

  // hash the message
  const hashBuffer = await crypto.subtle.digest('SHA-256', buffer);

  // convert ArrayBuffer to Array
  const hashArray = Array.from(new Uint8Array(hashBuffer));

  // convert bytes to hex string
  const hashHex = hashArray.map(b => b.toString(16).padStart(2, '0')).join('');
  return hashHex;
}
wutzebaer
3

There are a lot of options for hashing files. The usual problem is that hashing big files is really slow.

I created a little library that computes a hash from the first 64 KB and the last 64 KB of the file. This is fast, but it is not a checksum of the file's entire content.

Live example: http://marcu87.github.com/hashme/ and library: https://github.com/marcu87/hashme
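The head-and-tail idea can be illustrated with a short sketch. It is an assumption-laden illustration, not hashme's actual code: it only shows how `Blob.slice()` gathers the first and last 64 KiB cheaply. Because the middle of the file is never read, two files that differ only there produce the same input, so treat the result as a quick change-detection hint rather than an integrity check.

```javascript
// Illustrative only: NOT the hashme library's actual algorithm.
// Builds a small Blob from the first and last 64 KiB plus the file size,
// which any hash function (MD5, SHA-256, ...) can then digest quickly.
const EDGE_BYTES = 64 * 1024;

function fingerprintSource(blob) {
  const head = blob.slice(0, EDGE_BYTES);                       // slice() clamps to blob.size
  const tail = blob.slice(Math.max(blob.size - EDGE_BYTES, 0)); // overlaps head for small files
  // Mixing in the size separates some files that would otherwise share head and tail
  const size = new TextEncoder().encode(String(blob.size));
  return new Blob([head, tail, size]);
}
```

The returned Blob can be passed to any hash function that accepts a Blob or its bytes.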

Marco Antonio
3

I hope you have found a good solution by now. If not, the solution below is an ES6 promise implementation based on js-spark-md5:

import SparkMD5 from 'spark-md5';

// Read in chunks of 2 MB
const CHUNK_SIZE = 2097152;

/**
 * Incrementally calculate checksum of a given file based on MD5 algorithm
 */
export const checksum = (file) =>
  new Promise((resolve, reject) => {
    let currentChunk = 0;
    const chunks = Math.ceil(file.size / CHUNK_SIZE);
    const blobSlice =
      File.prototype.slice ||
      File.prototype.mozSlice ||
      File.prototype.webkitSlice;
    const spark = new SparkMD5.ArrayBuffer();
    const fileReader = new FileReader();

    const loadNext = () => {
      const start = currentChunk * CHUNK_SIZE;
      const end = Math.min(start + CHUNK_SIZE, file.size);

      // Selectively read the file and only store part of it in memory.
      // This allows client-side applications to process huge files without the need for huge memory
      fileReader.readAsArrayBuffer(blobSlice.call(file, start, end));
    };

    fileReader.onload = e => {
      spark.append(e.target.result);
      currentChunk++;

      if (currentChunk < chunks) loadNext();
      else resolve(spark.end());
    };

    fileReader.onerror = () => {
      reject(new Error('Calculating file checksum failed'));
    };

    loadNext();
  });
Zico Deng
2

There are a couple of scripts on the internet for creating an MD5 hash.

The one from webtoolkit is good: http://www.webtoolkit.info/javascript-md5.html

However, I don't believe it will have access to the local filesystem, as that access is restricted.

bendewey
0

This is another hash-wasm example, but it uses the Streams API instead of FileReader:

import { createSHA1 } from 'hash-wasm'

async function calculateSHA1(file: File) {
  const hasher = await createSHA1()

  const hasherStream = new WritableStream<Uint8Array>({
    start: () => {
      hasher.init()
      // you can set UI state here also
    },
    write: chunk => {
      hasher.update(chunk)
      // you can set UI state here also
    },
    close: () => {
      // you can set UI state here also
    },
  })

  await file.stream().pipeTo(hasherStream)

  return hasher.digest('hex')
}
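The comments in the stream example above mark where UI state could be updated. One way to do that is sketched below, assuming `Blob.prototype.stream()` and a global `WritableStream`; `onChunk` is a placeholder for the hasher's `update()`, and `onProgress` receives the fraction processed so far.

```javascript
// Sketch: stream a File/Blob through a WritableStream and report progress.
// onChunk is a placeholder for an incremental hasher's update() call;
// onProgress receives the fraction of bytes processed so far (0 to 1).
async function streamWithProgress(blob, onChunk, onProgress) {
  let processed = 0;
  await blob.stream().pipeTo(
    new WritableStream({
      write(chunk) {
        onChunk(chunk); // chunk is a Uint8Array
        processed += chunk.byteLength;
        onProgress(processed / blob.size);
      },
    })
  );
  return processed;
}
```

For an empty Blob, `write()` is never called, so `onProgress` is not called either and there is no division by zero.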
Danielle Madeley
-4

I don't believe there is a way in JavaScript to access the contents of a file upload, so you cannot look at the file contents to generate an MD5 sum.

You can, however, send the file to the server, which can then send an MD5 sum back or send the file contents back, but that's a lot of work and probably not worthwhile for your purposes.

kbosak