I have a huge dataset, something like 200 MB in txt format. I was able to send the data to Angular via an HTTP call.

Now I need to send the data back after some small editing done locally, which can happen on every single "line" (see Reading a FASTA file; that is the file format).

To my surprise, I cannot send it back; it is too big (I am saving to MongoDB). I have noticed that if I send, say, 1%, it goes through. So the solution I have now is sending the data in smaller chunks. Nonetheless, I have tried some approaches suggested here, but nothing worked.

I have found a similar question here: Angular 7 Handling large post http request

My question is: for receiving the data we can use an observable, so the data arrives in small chunks. Is there a clean way to do the same for sending, say a method or something into which I can feed the whole dataset and it does the rest?

Possible hint: since I can send 1%, I could divide the data into 1% chunks and send them in sequence; but I need to make sure that Angular does not fire the next HTTP call until the previous one has finished, otherwise the Express app crashes due to JavaScript memory issues.
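Something along these lines is what I have in mind; this is only a rough sketch, and the chunk size, the /api/fasta/chunk endpoint and the payload shape are placeholders, not my actual code:

```typescript
// Rough sketch: split the edited text into ~2 MB chunks and POST them one at a time.
// The '/api/fasta/chunk' endpoint, the payload shape and the chunk size are placeholders.
import { HttpClient } from '@angular/common/http';
import { from, Observable } from 'rxjs';
import { concatMap, toArray } from 'rxjs/operators';

const CHUNK_SIZE = 2 * 1024 * 1024; // roughly 1% of a 200 MB file

function splitIntoChunks(data: string, size: number): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < data.length; i += size) {
    chunks.push(data.slice(i, i + size));
  }
  return chunks;
}

function uploadSequentially(http: HttpClient, data: string): Observable<unknown[]> {
  const chunks = splitIntoChunks(data, CHUNK_SIZE);
  return from(chunks).pipe(
    // concatMap waits for each POST to complete before starting the next one,
    // so only one request is in flight at any time.
    concatMap((chunk, index) =>
      http.post('/api/fasta/chunk', { index, total: chunks.length, chunk })
    ),
    toArray() // emit once, after every chunk has been acknowledged
  );
}
```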

See the first version of this question, where everything started, here: How can I hold on http calls for a while?

I am not sure if this is too open-ended a question, since I am not providing code. I am still learning to use this platform; sometimes we take too many golden eggs from the chicken! Thanks in advance! :)

Discussions

@zzfs wisely gave me some insights, and I will consider them; nonetheless, I am open to code examples and samples; it is easier to learn from code, at least for me.

@zzfs mentioned considering the "difference" and saving just the changes, which is my current approach. The reason I am insisting, and why I have opened this bounty, is that I want, if possible, a robust system; if a person adds a lot of comments, the difference will be large, and the system could crash. The probability of that happening is, I believe, small, since it is most unlikely that someone will add, say, 1,000-10,000 comments at a single time.

The other problem is that I do not know whether my current way of uploading the dataset is the best; I am uploading from a locally hosted Mongo and Express app.

1 Answer

Sending such large files to and from the client only makes sense when, for example, downloading blobs like ZIPs or, say, uploading blobs to the server (think YouTube or Dropbox uploads). All other use cases are usually handled without such large transfers...

You mentioned you'd only perform "small editing". This means you should only send the difference, not the whole thing. Your question is too broad for me to give you a clearer answer, but when your large file is, say, a JSON or text file and the user changes something in it, you can quite easily mark the position of that change and send it along with the actual, changed payload. Your server (Node running Express?) should then be able to diff-check and apply the change to the 200 MB file.
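As a rough illustration only (the route, the payload shape and the in-memory storage below are placeholders, not your actual setup), the server side of such a diff patch could look something like this:

```typescript
// Sketch of the "send only the difference" idea.
// The route and the { lineNumber, newLine } payload shape are illustrative.
import express from 'express';

interface LinePatch {
  lineNumber: number; // 0-based index of the edited FASTA line
  newLine: string;    // the replacement text for that line
}

const app = express();
app.use(express.json({ limit: '1mb' })); // patches are tiny, so a small limit is enough

// In a real app the lines would live in MongoDB; an array keeps the sketch short.
const fastaLines: string[] = [];

app.patch('/api/fasta/line', (req, res) => {
  const patch = req.body as LinePatch;
  if (patch.lineNumber < 0 || patch.lineNumber >= fastaLines.length) {
    return res.status(400).json({ error: 'lineNumber out of range' });
  }
  fastaLines[patch.lineNumber] = patch.newLine; // apply the diff to the stored copy
  return res.json({ ok: true });
});

app.listen(3000);
```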


Chunking multiple HTTP requests is of course supported and happens frequently. When you really want to upload your 200 MB file, you can split it and use either concatMap for sequential uploading or forkJoin for parallelized requests.
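For the parallel variant, a minimal sketch could look like the following, assuming the chunks are already split into strings and reusing the placeholder /api/fasta/chunk endpoint from the sketch in the question:

```typescript
// Parallel variant: fire all chunk uploads at once and wait until every one completes.
// Only do this when the server can safely accept concurrent chunks.
import { HttpClient } from '@angular/common/http';
import { forkJoin, Observable } from 'rxjs';

function uploadInParallel(http: HttpClient, chunks: string[]): Observable<unknown[]> {
  const requests = chunks.map((chunk, index) =>
    http.post('/api/fasta/chunk', { index, total: chunks.length, chunk })
  );
  return forkJoin(requests); // emits a single array once every POST has completed
}
```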

But then you'd still do some kind of post-processing on the server (combining the chunks in the most trivial case), so your attention should rather be directed at implementing the diff-checking method described above.

Joe - GMapsBook.com
  • Dear @jzzfs, thanks for the answer. Indeed, I am implementing the "difference" approach; nonetheless, if someone edits, say, a huge amount of data, the amount to send will be huge. I am using a file that holds genetic sequences, and people can then comment on them. When there is no comment, it is okay, I can apply this difference checking, which is my current approach, but I would like to consider a real situation in which a huge amount of data was edited (I am afraid the system may crash in those cases). – Jorge Guerra Pires Mar 29 '20 at 17:06
  • and in that case, I still have the above-mentioned problem. Furthermore, I am not sure of the origin of the data; I may need to take it from the front end. – Jorge Guerra Pires Mar 29 '20 at 17:06
  • I shall think about what you said – Jorge Guerra Pires Mar 29 '20 at 17:07
  • I see. Well, if people only comment on it, that's pretty easy -- just save the comment, not the whole text -- think Google Docs comments (see the sketch after these comments). If your text file is a genetic sequence, you can still divide it somehow into blocks. Think of a Jupyter notebook where the document is large but you have individual blocks that you edit... With regards to the code -- we're not going to write your code but can point you in the right direction when you've shared what you already have :) – Joe - GMapsBook.com Mar 29 '20 at 18:13
  • Dear @jzzfs, I really liked your idea of just sending the comments; indeed, it could solve the problem. I could send just the comments added and their respective ids. Nonetheless, since we have time, let's leave the bounty open for now; I am interested in hearing others' perspectives. Best, – Jorge Guerra Pires Mar 29 '20 at 21:50
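A minimal sketch of the "save only the comment" idea from the comments above, assuming a hypothetical /api/comments endpoint and a sequenceId that identifies the FASTA record the comment refers to:

```typescript
// Instead of re-uploading the 200 MB file, send only the comment plus the id
// of the sequence it refers to. The endpoint and payload shape are assumptions.
import { HttpClient } from '@angular/common/http';
import { Observable } from 'rxjs';

interface SequenceComment {
  sequenceId: string; // e.g. the FASTA header that identifies the record
  text: string;       // the user's comment
}

function saveComment(http: HttpClient, comment: SequenceComment): Observable<unknown> {
  return http.post('/api/comments', comment); // a few hundred bytes instead of 200 MB
}
```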