
I'm pushing the limits of HTML5 here.

I have this problem: I have a JavaScript array of a billion doubles (or ints), anyway A LOT of numbers. I want to store this in HTML5 localStorage.

You may say, hey, just use JSON.stringify, BUT JSON.stringify produces a huge 200MB string, because each number (0.03910319 for example) is stored as text, so it takes a byte per character instead of the fixed few bytes a binary number needs.

I was thinking about base64-encoding the numbers in the array first, and then applying JSON.stringify?
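For illustration, here is a minimal sketch of that idea (the function names are placeholders, not from the question): pack the values into a Float64Array and base64-encode the raw bytes, so every double costs a fixed ~10.7 characters instead of a variable-length decimal string.

```javascript
// Sketch only: binary-pack the doubles, then base64-encode the bytes.
function encodeDoubles(values) {
  const bytes = new Uint8Array(Float64Array.from(values).buffer);
  let binary = '';
  const CHUNK = 0x8000; // build the string in chunks to avoid argument limits
  for (let i = 0; i < bytes.length; i += CHUNK) {
    binary += String.fromCharCode.apply(null, bytes.subarray(i, i + CHUNK));
  }
  return btoa(binary); // safe to JSON.stringify or put in localStorage
}

function decodeDoubles(base64) {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return new Float64Array(bytes.buffer);
}
```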

Or is it, for example, better to JSON.stringify and then gzip, or use some other compression function?
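As a sketch of that "stringify, then compress" route (untested at this scale, and assuming the lz-string library suggested in the comments below): lz-string's compressToUTF16 output can be stored in localStorage directly.

```javascript
// Sketch only: compress the JSON text before storing it.
// "numbers" stands in for the real array.
const packed = LZString.compressToUTF16(JSON.stringify(numbers));
localStorage.setItem('numbers', packed);

// Reading it back:
const restored = JSON.parse(
  LZString.decompressFromUTF16(localStorage.getItem('numbers'))
);
```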

Come up with your creative ideas to encode/decode a JavaScript array of A BILLION ints/doubles efficiently to a localStorage variable.

TensorFlowJS

I looked at TensorFlow.js; my array is basically a 1-D tensor. TensorFlow.js has some storage capabilities for models... Maybe that is a feasible solution.

Vinzentz
  • Use compression, and then if the result is still too big (5MB size limit IIRC), use IndexedDB instead of LocalStorage, though it may be too big for IndexedDB as well. (Or, this doesn't sound like the sort of thing client-side storage should be used for at all) – CertainPerformance Dec 09 '18 at 00:29
  • @CertainPerformance What sort of compression would you advise? Would GZIP be good for this? – Vinzentz Dec 09 '18 at 00:34
  • Take a look at [lz-string](http://pieroxy.net/blog/pages/lz-string/index.html). – Jeto Dec 09 '18 at 00:34
  • Not my specialty, try various options and see how they work for you – CertainPerformance Dec 09 '18 at 00:34
  • @Jeto lz-string looks promising! – Vinzentz Dec 09 '18 at 00:35
  • Is there another approach to this? Can you break up the problem-space into smaller blocks? – Kingsley Dec 09 '18 at 00:50
  • lz-string is good but, I think, still insufficient. From a string with 2,200,000 comma-separated doubles (450,000,000 characters), it produces a string with 75,000,000 characters, so it compressed pretty well (but took a looong time; lz-string is meant for <1,000,000 characters). It should be possible to do better, by first compressing the doubles into a string utilizing the whole UTF-16 space maybe? – Vinzentz Dec 09 '18 at 01:00
  • You can use some hack, like a mock file hack: have your dataset compressed and stored in some file, and let the client download it once so it can be served from cache. To avoid further server requests you can use a service worker. Btw, take a look at this - https://stackoverflow.com/questions/29166465/i-wish-to-store-large-amounts-of-data-client-side-in-chrome-what-are-my-options and this https://stackoverflow.com/questions/8630609/compressing-floating-point-data – lucifer63 Dec 09 '18 at 01:23
  • @lucifer63 Love that creative thinking! Definitely a trick to look into, but for my current project this mock file hack is not an option because the nature of the data doesn't allow me to send it off the end-user's computer. Those other Stack Overflow questions look interesting; I also came across this: https://medium.com/samsung-internet-dev/being-fast-and-light-using-binary-data-to-optimise-libraries-on-the-client-and-the-server-5709f06ef105 Using ArrayBuffers and the Float32Array typed array in JavaScript? – Vinzentz Dec 10 '18 at 02:18

1 Answer


For anyone who is also dealing with this problem:

I used a Float32Array (a JavaScript typed array) for my data.

A Float32Array can easily be stored in IndexedDB using https://github.com/localForage/localForage.
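A minimal sketch of how that looks (the key name and size are made up); localForage accepts typed arrays directly and persists them to IndexedDB:

```javascript
// Sketch, assuming localForage is loaded on the page.
const data = new Float32Array(1000000); // fill with the real values

localforage.setItem('samples', data)
  .then(() => localforage.getItem('samples'))
  .then((restored) => {
    // The value comes back as a Float32Array, not a plain array or string.
    console.log(restored instanceof Float32Array, restored.length);
  })
  .catch((err) => console.error('storage failed', err));
```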

Vinzentz