
In writing a database to disk as a text file of JSON strings, I've been experimenting with how to most efficiently build the string of text that is ultimately converted to a blob for download to disk.

There are a number of questions that advise against concatenating a string with the + operator in a loop, recommending instead to write the component strings to an array and then use the join method to build one large string.

The best explanation I came across of why can be found here, by Joel Mueller:

"In JavaScript (and C# for that matter) strings are immutable. They can never be changed, only replaced with other strings. You're probably aware that combined + "hello " doesn't directly modify the combined variable - the operation creates a new string that is the result of concatenating the two strings together, but you must then assign that new string to the combined variable if you want it to be changed.

So what this loop is doing is creating a million different string objects, and throwing away 999,999 of them. Creating that many strings that are continually growing in size is not fast, and now the garbage collector has a lot of work to do to clean up after this."

The thread here was also helpful.
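A minimal sketch of the two approaches those answers contrast (the variable names here are mine, not from the linked answers); both produce the same string, but the array/join version avoids creating a new, ever-growing intermediate string on every pass:

```javascript
// Approach 1: repeated concatenation - each += allocates a fresh string.
let combined = '';
for (let i = 0; i < 5; i++) {
  combined += 'hello ';
}

// Approach 2: collect the parts in an array, then join once at the end.
const parts = [];
for (let i = 0; i < 5; i++) {
  parts.push('hello ');
}
const joined = parts.join('');

console.log(combined === joined); // true - same result either way
```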

However, using the join method didn't allow me to build the string I was aiming for without getting the error:

allocation size overflow

I was trying to write 50,000 JSON strings from a database into one text file, which simply may have been too large no matter what. I think it was reaching over 350MB. I was just testing the limit of my application and picked something far larger than a user of the application will likely ever create. So, this test case was likely unreasonable.

Nonetheless, this leaves me with three questions about working with large strings.


  1. For the same amount of data overall, does altering the number of array elements joined in a single join operation affect the efficiency in terms of not hitting an allocation size overflow?

    For example, I tried writing the JSON strings to a pseudo 3-D array of 100 (and then 50) elements per dimension; and then looped through the outer two dimensions joining them together. 100^3 = 1,000,000 or 50^3 = 125,000 both provide more than enough entries to hold the 50,000 JSON strings. I know I'm not including the 0 index, here.

    So, the 50,000 strings were held in an array from a[1][1][1] to a[5][100][100] in the first attempt and from a[1][1][1] to a[20][50][50] in the second attempt. If the dimensions are i, j, k from outer to inner, I joined all the k elements in each a[i][j]; then, for each i, joined those j results; and lastly joined the i results into the final text string.

    All attempts still hit the allocation size overflow before completing.

    So, is there any difference between joining 50,000 smaller strings in one join versus 50 larger strings, if the total data is the same?
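For illustration, the two join strategies can be compared directly on a small nested array (sizes shrunk here to 2 × 3 × 4 instead of 100³; the a[i][j][k] layout mirrors the one described above):

```javascript
// Build a small pseudo 3-D array of JSON strings.
const I = 2, J = 3, K = 4;
const a = [];
for (let i = 0; i < I; i++) {
  a[i] = [];
  for (let j = 0; j < J; j++) {
    a[i][j] = [];
    for (let k = 0; k < K; k++) {
      a[i][j][k] = `{"i":${i},"j":${j},"k":${k}}\n`;
    }
  }
}

// Hierarchical join: many small joins, innermost k elements first,
// then the j results per i, then the i results.
const hierarchical = a
  .map(plane => plane.map(row => row.join('')).join(''))
  .join('');

// Flat join: one big join over all 24 strings.
const flat = a.flat(2).join('');

console.log(hierarchical === flat); // true - identical output
```

Note that the final step allocates the same total number of characters either way, which is consistent with the overflow occurring regardless of how the joins were grouped.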


  2. Is there a better, more efficient way to build large strings than the join method?

  3. Does the same principle described by Joel Mueller regarding string concatenation apply to reducing a string through substring, such as string = string.substring(position)?

    The context of this third question is that when I read a text file in as a string and break it down into its component JSON strings before writing to the database, I use an array that is a map of the file layout; so I know the length of each JSON string in advance and repeat three statements inside a loop:

    l = map[i].l;
    str = text.substring(0, l);
    text = text.substring(l);
    

    It would appear that, since strings are immutable, this sort of reverse-concatenation step is as inefficient as using the + operator to concatenate.

    Would it be more efficient not to delete the str from text each iteration, and instead just keep track of the increasing start and end positions for the substrings as I step through the loop reading the entire text string?
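A sketch of the index-tracking alternative (the sample text, map layout, and variable names here are assumptions modeled on the loop above); both versions extract the same pieces, but the second never reassigns `text`:

```javascript
// Example file text and layout map; `l` is each piece's length.
const text = '{"id":1}{"id":22}{"id":333}';
const map = [{ l: 8 }, { l: 9 }, { l: 10 }];

// Original approach: shrink a copy of `text` on every pass
// (two new strings per iteration).
let rest = text;
const piecesA = [];
for (let i = 0; i < map.length; i++) {
  piecesA.push(rest.substring(0, map[i].l));
  rest = rest.substring(map[i].l);
}

// Alternative: leave `text` alone and walk a start position through it.
let pos = 0;
const piecesB = [];
for (let i = 0; i < map.length; i++) {
  piecesB.push(text.substring(pos, pos + map[i].l));
  pos += map[i].l;
}

console.log(piecesA.join('|') === piecesB.join('|')); // true
```

Worth noting: some engines optimize substring results as slices over the parent string, so the measured difference may be smaller than the immutability argument suggests, but the index-tracking version sidesteps the question entirely.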

Response to message about duplicate question

I got a message, I guess from the stackoverflow system itself, asking me to edit my question explaining why it is different from the proposed duplicate.

Reasons are:

  1. The proposed duplicate asks specifically and exclusively about the maximum size of a single string. None of the three bolded questions, here, asks about the maximum size of a single string, although that is useful to know.

  2. This question asks about the most efficient way of building large strings and that isn't addressed in the answers found in the proposed duplicate, apart from an efficient way of building a large test string. They don't address how to build a realistic string, comprised of actual application data.

  3. This question provides a couple links to some information concerning the efficiency of building large strings that may be helpful to those interested in more than the maximum size alone.

  4. This question also has a specific context of why the large string was being built, which led to some suggestions about how to handle that situation in a more efficient manner. Although, in the strictest sense, they don't specifically address the question by title, they do address the broader context of the question as presented, which is how to deal with the large strings, even if that means ways to work around them. Someone searching on this same topic might find real help in these suggestions that is not provided in the proposed duplicate.

So, although the proposed duplicate is somewhat helpful, it doesn't appear to be anywhere near a genuine duplicate of this question in its full context.

Additional Information

This doesn't answer the question concerning the most efficient way to build a large string, but it refers to the comments about how to get around the string size limit.

Converting each component string to a blob, holding the blobs in an array, and then converting the array of blobs into a single blob accomplished this. I don't know what the size limit of a single blob is, but did see 800MB mentioned in another question.
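A minimal sketch of that approach (record contents shrunk for illustration); the Blob constructor accepts an array of existing Blobs and/or strings as parts, so no giant intermediate string is ever built:

```javascript
// Convert each component string to a Blob as it is produced,
// keeping only the Blobs.
const records = ['{"id":1}\n', '{"id":2}\n', '{"id":3}\n'];
const blobParts = records.map(
  s => new Blob([s], { type: 'application/json' })
);

// Combine the array of Blobs into a single Blob at the end.
const combined = new Blob(blobParts, { type: 'application/json' });

console.log(combined.size); // total bytes across all records
```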

A process (or starting point) for creating the blob to write the database to disk and then to read it back in again can be found here.

Regarding the idea of writing the blobs or strings to disk as they are generated on the client, as opposed to generating one giant string or blob for download: although this is the most logical and efficient method, it may not be possible in the scenario presented here of an offline application.

According to this question, web extensions no longer have access to the privileged JavaScript code necessary to accomplish this through the File API.

I asked this question related to the Streams API write stream method and something called StreamSaver.

Gary
    Have you considered maybe looking for a solution that doesn't require the entire contents of the file be in a single string before writing? – Kevin B Jun 21 '18 at 22:07
  • I thought about making several blobs from portions of the string and then appending the blobs. But didn't try it because I thought it would take more memory to put the blobs together than the string. I don't want the user to have to download more than one file, but I guess it could be an option. If the string or blobs could be written to a single file on disk in segments without needing the user to install some feature, then that would be superior. Thanks. – Gary Jun 21 '18 at 22:18
    Oh, you're creating a data-url client-side for the user to download? that's a bit different. I was assuming server-side. I would still suggest trying to find some alternative to one giant string. – Kevin B Jun 21 '18 at 22:23
  • Yes. Perhaps, I should have been more specific in the question. It all takes place on the client. If an extension API provides a way, that would work for my case also. – Gary Jun 21 '18 at 22:32
  • You could look at the [`File` interface](https://developer.mozilla.org/en-US/docs/Web/API/File) (or more likely its base [`Blob` interface](https://developer.mozilla.org/en-US/docs/Web/API/Blob) which might have a higher tolerance for space. You can create new `Blob`s from existing `Blob`s, so depending on how you're constructing your data, you could create a batch of `Blob`s and then pull them altogether. – Heretic Monkey Jun 21 '18 at 22:39
  • Possible duplicate of [Javascript string size limit: 256 MB for me - is it the same for all browsers?](https://stackoverflow.com/questions/34957890/javascript-string-size-limit-256-mb-for-me-is-it-the-same-for-all-browsers) – ivan_pozdeev Jun 22 '18 at 00:59
  • @Mike McCaughan I noticed that when a large string is converted to a blob (however that should be phrased) the memory usage jumps up to around three times the string size. So, perhaps, building smaller blobs and appending them will use less memory, not just in building each blob but also in appending them. All I can do is give it a try. Thanks. – Gary Jun 22 '18 at 02:39

1 Answer


In writing a database to disk as a text file of JSON strings.

I see no reason to store the data in a string or array of strings in this case. Instead you can write the data directly to the file.

  • In the simplest case you can write each string to the file separately.
  • To get better performance, you could first write some data to a smaller buffer, and then write that buffer to disk when it's full.
  • For best performance you could create a file of a certain size and create a memory mapping over that file. Then write/copy the data directly to the mapped memory (which is your file). The trick would be to know or guess the size up front, or you could resize the file when needed and then remap the file.
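The buffering idea in the second bullet can be sketched generically; here `sink` is a stand-in for whatever write target is actually available (a file writer, a stream, or a collector of Blob parts):

```javascript
// Accumulate strings in a small buffer and flush to the sink whenever it
// grows past a threshold, so no single string approaches the allocation
// limit - one modest join per flush instead of one giant join at the end.
function makeBufferedWriter(sink, limit = 64 * 1024) {
  let buf = [];
  let size = 0;
  return {
    write(str) {
      buf.push(str);
      size += str.length;
      if (size >= limit) this.flush();
    },
    flush() {
      if (size > 0) {
        sink(buf.join(''));
        buf = [];
        size = 0;
      }
    },
  };
}

// Usage: collect flushed chunks (an array standing in for a real file).
const chunks = [];
const writer = makeBufferedWriter(chunk => chunks.push(chunk), 16);
['{"a":1}', '{"b":2}', '{"c":3}'].forEach(s => writer.write(s + '\n'));
writer.flush(); // flush whatever remains at the end
```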

Joining or growing strings will trigger a lot of memory (re)allocations, which is unnecessary overhead in this case.


I don't want the user to have to download more than one file

If the goal is to let a user download that generated file, you could even do better by streaming those strings directly to the user without even creating a file. This also has the advantage that the user starts receiving data immediately instead of first having to wait till the whole file is generated.

Because the file size is not known up front, you could use chunked transfer encoding.

Danny_ds
  • This sounds great to me, but can it be done in standard javascript? If not, what is required? Also, if I was ambiguous, I mean the client's hard disk not the server. Thanks. – Gary Jun 21 '18 at 22:23
  • @Gary Oh, I must have misread that - I was thinking you meant Java or C#.. - Not sure if memory mapping is available in Javascript, but if you're writing to a file already (the generated string) it would still be better to write to the file directly. Are you working from within a browser? Is that database also on the client? If the database is on the server and you want to download a generated file, see the second part of my answer. – Danny_ds Jun 21 '18 at 22:30
  • I should have been more clear that all of this takes place on the client. The database is in the browser and built by the user in the application, and the download is to the client disk to back up their work. Thanks. – Gary Jun 21 '18 at 22:34
  • @Gary What type of database is this? Maybe those 'strings' are already in memory then? – Danny_ds Jun 21 '18 at 22:38
  • @Gary Anyway, concatenating all those JSON strings first is never a good idea. It will always be better to use one of my first two points: write the strings directly to the file, or buffer them first (for example in a string up to 64kB) and then write that buffer to file in a loop. – Danny_ds Jun 21 '18 at 22:42
  • Just a standard indexedDB database and however it serializes the objects before storing them. I've been extracting and stringifying them before adding to string. – Gary Jun 21 '18 at 23:26
    @Gary Ok - I would open the file (in overwrite mode, in case the file already exists), then in a loop: for every record create the json string and write that string to the file (reuse the same string object in the loop). Then close the file. This is probably the easiest way and won't allocate a lot of memory compared to concatenating strings. – Danny_ds Jun 21 '18 at 23:38
  • I'd to try this but I don't see where it is available in javascript on the client, even through the web extension APIs. There's a downloads API but it offers nothing like this. – Gary Jun 22 '18 at 02:36
  • @Gary The same way you are writing the full string to the file now, only in smaller parts? – Danny_ds Jun 22 '18 at 04:46
  • I am not writing directly to a file. An object URL is created and downloaded to the client disk by prompting the client to select where to save it. The web page doesn't have access to the client disk directly. Maybe I am misunderstanding you, but it appears that it can't be done because the web page can't access the file system. – Gary Jun 22 '18 at 05:20