
I need to run a process that parses each line of a file, finds a specific string (by position), calls an HTTP API (which returns the transformation needed for that string), replaces the string with the response, and then saves the result to an output file, keeping the order of the original file.

I found several ways to do this, but my input files will have 10 million+ rows. I could do it line by line, but I would like to take advantage of the HTTP API, which supports over 1,500 TPS, by parallelizing the HTTP calls.

I was thinking of reading the file in chunks, extracting the strings that need replacement, calling the HTTP API with Promise.all or something similar, and then moving on to the next batch. However, I wasn't able to find a way to do this.
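To make the idea concrete, here is a rough sketch of the batch approach I have in mind, assuming Node.js 12+ (for async iteration over `readline`). The field position (`FIELD_START`, `FIELD_END`), the batch size, and the `transform` function standing in for the HTTP call are all placeholders I made up for illustration, not the real API.

```javascript
const fs = require('fs');
const readline = require('readline');

// Hypothetical position of the string to replace within each line.
const FIELD_START = 10;
const FIELD_END = 20;

// Transform every line of a batch concurrently. Promise.all resolves to an
// array in the same order as its input, so the line order is preserved.
async function processBatch(lines, transform) {
  return Promise.all(lines.map(async (line) => {
    const replacement = await transform(line.slice(FIELD_START, FIELD_END));
    return line.slice(0, FIELD_START) + replacement + line.slice(FIELD_END);
  }));
}

// Read inputPath line by line, process `batchSize` lines at a time, and
// append each finished batch to outputPath before reading further.
async function processFile(inputPath, outputPath, transform, batchSize = 1000) {
  const rl = readline.createInterface({
    input: fs.createReadStream(inputPath),
    crlfDelay: Infinity,
  });
  const out = fs.createWriteStream(outputPath);
  let batch = [];

  const flush = async () => {
    if (batch.length === 0) return;
    const results = await processBatch(batch, transform);
    batch = [];
    // Respect backpressure: wait for 'drain' if the write buffer is full.
    if (!out.write(results.join('\n') + '\n')) {
      await new Promise((resolve) => out.once('drain', resolve));
    }
  };

  for await (const line of rl) {
    batch.push(line);
    if (batch.length >= batchSize) await flush();
  }
  await flush(); // last, possibly partial, batch

  // Close the output stream and wait until everything is written.
  await new Promise((resolve, reject) => {
    out.on('error', reject);
    out.end(resolve);
  });
}
```

Here `transform` would be an async function that sends the extracted string to the HTTP API and resolves with the replacement. One drawback of this shape is that each batch waits for its slowest request before the next batch starts, so the throughput would likely stay below what the API can handle.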

I went over all the solutions suggested here, but none of them covers parallel line processing.

Any ideas on how this can be done while parallelizing the HTTP calls?

TGrif
Fede E.
  • Post a little bit of code. Does every line of the 10 million rows need to be transformed? Also post an example file (5-10 lines) and show what needs to be transformed. – Marcos Casagrande Jun 09 '18 at 13:50
  • Write the simplest possible solution, which is to read one line at a time and parse it. Once parsed, issue a pipelined HTTP request. Go to the next line. Surely the file reading will be faster than the network. – John Zwinck Jun 09 '18 at 14:35
  • This type of scheme will allow you to process N requests in parallel while iterating through a large array: [Make several requests at a time while processing large array](https://stackoverflow.com/questions/33378923/make-several-requests-to-an-api-that-can-only-handle-20-request-a-minute/33379149#33379149) or this one [Async request with a list of URLs](https://stackoverflow.com/questions/47299174/nodejs-async-request-with-a-list-of-url/47299802#47299802). – jfriend00 Jun 09 '18 at 15:16
  • And, here's a scheme that runs N requests/second [Choose proper async method for batch processing for max requests per sec](https://stackoverflow.com/questions/36730745/choose-proper-async-method-for-batch-processing-for-max-requests-sec/36736593#36736593). – jfriend00 Jun 09 '18 at 15:18
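
The "N requests in parallel" scheme mentioned in the comments could look roughly like the sketch below. It is not the code from the linked answers, only a minimal illustration of the idea: a fixed number of workers pull items from a shared index, and each result is stored at its original position so the order is kept. The names `mapWithConcurrency`, `limit`, and `fn` are made up for this example, and `fn` stands in for the HTTP call.

```javascript
// Run `fn` over `items` with at most `limit` calls in flight at once.
// Results are stored by index, so the output order matches the input order.
async function mapWithConcurrency(items, limit, fn) {
  const results = new Array(items.length);
  let next = 0; // index of the next item nobody has claimed yet

  // Each worker claims the next unprocessed index until none are left.
  // JavaScript runs this single-threaded, so `next++` cannot race.
  async function worker() {
    while (next < items.length) {
      const i = next++;
      results[i] = await fn(items[i], i);
    }
  }

  const workers = Array.from({ length: Math.min(limit, items.length) }, worker);
  await Promise.all(workers);
  return results;
}
```

Unlike a plain Promise.all over a whole batch, a new request starts as soon as any earlier one finishes, so a single slow response does not stall the other workers. For a file with 10 million rows this would presumably still be applied to one chunk of lines at a time rather than to the whole file in memory.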

0 Answers