
I'm writing a web app that generates a potentially large text file for the user to download, with all the processing done in the browser. So far I'm able to read a file over 1 GB in small chunks, process each chunk, generate a large output file incrementally, and store the growing output in IndexedDB. My more naïve attempt, which kept all the results in memory and serialized them to a file at the very end, was crashing every browser.

My question is two-fold:

  1. Can I append to an entry in IndexedDB (either a string or an array) without reading the whole thing into memory first? Right now, this:

    task.dbInputWriteQueue.push(output);
    var transaction = db.transaction("files", "readwrite");
    var objectStore = transaction.objectStore("files");
    var request = objectStore.get(file.id);
    request.onsuccess = function()
    {
        // note: the property is "result", not "results", and put() needs
        // the key again because the value is a bare string (out-of-line key)
        var appended = request.result + nextPartOfOutput;
        objectStore.put(appended, file.id);
    };
    

    is causing crashes once the output starts to get big. I could just write a bunch of small entries into the database (see the sketch just after this list), but then I'd have to read them all into memory later anyway to concatenate them. See part 2 of my question...

  2. Can I make an object URL that references a value in IndexedDB without loading that value into memory? For small strings I can do:

    var url = window.URL.createObjectURL(new Blob([myString], {type: 'text/plain'}));
    

    But for large strings this doesn't work well at all. In fact, it crashes before the string even finishes loading. It seems that big reads using get() from IndexedDB cause Chrome, at least, to crash (even the developer tools crash).
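
For reference, here is roughly what I mean by writing small entries, as a sketch. The "chunks" store, `writeChunk`, and the array keys are just names and choices I'm making up for illustration; the store would be created with out-of-line keys (`db.createObjectStore("chunks")`) during `onupgradeneeded`:

    var seq = 0;

    function writeChunk(db, fileId, chunk) {
        var tx = db.transaction("chunks", "readwrite");
        // Array keys sort element-wise, so [fileId, 0], [fileId, 1], ...
        // come back in write order when iterated with a cursor later.
        tx.objectStore("chunks").put(chunk, [fileId, seq++]);
        tx.onerror = function(e) { console.log(e); };
    }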

Would it be faster if I were using Blobs instead of strings? Is that conversion cheap?

Basically I need a way, with JavaScript, to write a really big file to disk without loading the whole thing into memory at any one point. I know that you can give createObjectURL a File, but that doesn't work in my case since I'm generating a new file from one the user provides.
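
For completeness, the assembly step I'm hoping is cheap would look something like the sketch below, reusing the hypothetical "chunks" store from above and assuming each chunk was stored as a Blob. As far as I can tell, whether `new Blob(parts)` copies the chunk bytes into memory or just references them is implementation-dependent:

    function buildDownloadUrl(db, fileId, callback) {
        var parts = [];
        var range = IDBKeyRange.bound([fileId, 0], [fileId, Infinity]);
        var store = db.transaction("chunks").objectStore("chunks");
        store.openCursor(range).onsuccess = function(e) {
            var cursor = e.target.result;
            if (cursor) {
                parts.push(cursor.value); // the Blob handle, not its contents
                cursor.continue();
            } else {
                // All chunks collected: combine and hand back an object URL
                var blob = new Blob(parts, {type: "text/plain"});
                callback(window.URL.createObjectURL(blob));
            }
        };
    }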

Matt
  • Is using the file system API (which was deprecated before it was released) out of the question? – Josh Nov 22 '14 at 21:23
  • Yeah, browser support for writing files using the file system API is abysmal, and our customers expect a file to download. – Matt Nov 22 '14 at 23:16
  • What kind of file is this? – n00dl3 Dec 17 '14 at 09:42
  • @JuniusRendel In my case it's a delimited text file which the customer can import into a spreadsheet or a database... – Matt Dec 17 '14 at 15:19
  • Why not give a number to each chunk (maybe the line number if it's CSV-like) and use that ID as the key for the DB? – Denys Séguret Dec 17 '14 at 21:23
  • @dystroy Storing the output, even in pieces, in the database is not a problem. That would certainly work. However, when the Blob is created to produce the file, all the chunks must be concatenated in memory -- as far as I know -- and my question is how to do that without loading the whole thing into memory, if possible. If not, maybe we need to wait for browser technologies to mature a little more. – Matt Dec 17 '14 at 22:12
  • 4
  • Hi Matt, I'm afraid this is only possible using the file system API in Chrome; I've implemented such a solution and was able to create files up to 4 GB. The other option would be to send the data to the server and create the file there. Until they add a write function to the File API, this won't be possible in the browser. – Deni Spasovski Dec 18 '14 at 00:45
  • 2
  • As @DeniSpasovski suggests, try sending the data to the server and creating the file there. Also, use `POST` rather than `GET`, since `POST` allows sending more data. – Agi Hammerthief Dec 18 '14 at 07:59
  • 1
  • One other direction that might be an option if you're only targeting desktop browsers: using SWF to generate files - http://stackoverflow.com/questions/8150516/javascript-or-flash-export-to-csv-excel – Deni Spasovski Dec 18 '14 at 15:22
  • 3
  • @AgiHammerthief Yeah... I'm gonna wait for that 1-4 GB upload ; – user2864740 Dec 21 '14 at 23:01
  • 1
  • Similar question: http://stackoverflow.com/questions/20623615/huge-javascript-html5-blob-from-large-arraybuffers-to-build-a-giant-file-in-cl How about compressing the file in the browser so it won't exceed the user's memory? – Ginden Dec 22 '14 at 11:49
  • 1
  • I know it's a very bad way to implement this, but you could give the user the ability to download the file in parts (2 to 4 parts), and then have them download a script that runs on their machine to concatenate the parts. – Abhijeet K Dec 22 '14 at 14:55
  • Maybe http://stackoverflow.com/questions/20623615/huge-javascript-html5-blob-from-large-arraybuffers-to-build-a-giant-file-in-cl?lq=1 will help with your first question – Adam Cherti Dec 22 '14 at 17:06

2 Answers


Storing a Blob will use a lot less space and fewer resources, since there is no longer any need to convert to base64. You can even store "text/plain" objects as Blobs:

var blob = new Blob(['blob object'], {type: 'text/plain'});
var store = db.transaction(['entries'], 'readwrite').objectStore('entries');

// Store the Blob under the key 'blob'
var req = store.put(blob, 'blob');
req.onerror = function(e) {
    console.log(e);
};
req.onsuccess = function(event) {
    console.log('Successfully stored a blob as Blob.');
};

You can see more info here: https://hacks.mozilla.org/2012/02/storing-images-and-files-in-indexeddb/

Chrome has only supported this since the summer of 2014 (http://updates.html5rocks.com/2014/07/Blob-support-for-IndexedDB-landed-on-Chrome-Dev), so you cannot rely on it in older versions of Chrome.
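
To get the stored Blob back out and hand it to the user, read it and point an object URL at it. A minimal sketch continuing the example above (same db and 'entries' store):

var getReq = db.transaction(['entries']).objectStore('entries').get('blob');
getReq.onerror = function(e) {
    console.log(e);
};
getReq.onsuccess = function() {
    // getReq.result is the stored Blob itself, not a base64 copy
    var url = window.URL.createObjectURL(getReq.result);
    // e.g. set this as the href of an <a download="output.txt"> link
};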

Don Rhummy

I just reopened the Chrome bug I submitted 2 years ago and created another bug report for the Firefox team, both related to the browser crashing when creating a large Blob. Generating large files shouldn't be an issue for browsers.

Deni Spasovski