
I'm generating test/development dummy data for my MongoDB database with a Node script (using Mongoose). The data includes geolocation coordinates (sets of lat/lon). The schema follows:

    location: {
      type: {
        type: String,
        enum: ["Point"], // 'location.type' must be 'Point'
        default: "Point",
      },
      coordinates: {
        type: [Number],
        required: true,
      },
      geocoded: {
        type: String, // this has to be done with an external API
      },
    },

For each document/set of coordinates I need to call an external (paid) reverse-geocoding API. That API is rate limited, though, so I'm hitting 429 Too Many Requests. I'm looking for a clean and simple way to run my requests sequentially, waiting a specified number of milliseconds after each HTTP request.

messageSchema.pre("insertMany", async function save(next, docs) {
      docs.map(async (doc) => { // now I understand I should replace map with for ... of or for ... in
        let [err, response] = await to(
            reverseGeocode(
              doc.location.coordinates[0],
              doc.location.coordinates[1]
            )
        );
        if (err) {
          next(err);
        }
        doc.location.geocoded = response;
      });
    });

The reverseGeocode signature:

    reverseGeocode: (lon, lat) =>
      axios({
        baseURL: "https://eu1.locationiq.com/",
        url: "v1/reverse.php",
        params: {
          lat,
          lon,
          key: geocodeKey,
        },
      }).then((response) => response.data),
  • Your `save` function never waits for the array of promises you are constructing. – Bergi Jun 28 '20 at 12:27
  • 1
    If you want to make the requests sequentially, just [don't use `map` but a normal loop in which you `await`](https://stackoverflow.com/a/37576787/1048572). No need to use artificial delays either. – Bergi Jun 28 '20 at 12:28
  • "*I would also ideally want to fork child processes in order to not block the script/process.*" - no, your code already is non-blocking. – Bergi Jun 28 '20 at 12:29
  • @Bergi could you combine your comments into a solid answer? I am a bit confused. btw : https://www.npmjs.com/package/await-to-js this is what to(..) does, it's quite convenient to avoid having to include a try catch block. I need to throw /catch the error because otherwise Node throws the uncaught Promise error thingy – George Katsanos Jun 28 '20 at 20:32
  • I can't suggest an answer since I can't tell why you are doing the geocoding at all if you are ignoring its result. Regarding the `to`, `try`/`catch` is not really less convenient than what you are doing, and the problem is that you are not actually handling errors - you are rethrowing them, causing the code to work the same as if you had never caught them. – Bergi Jun 28 '20 at 21:31
  • First things first :) I am reading your interesting answer there https://stackoverflow.com/a/37576787/1048572 , trying to understand if .map is an asynchronous "loop" and for/in for/of synchronous? – George Katsanos Jun 28 '20 at 21:34
  • @Bergi I modified / cleaned up my code and the question to make the main goal clearer ! I hope that helps :) – George Katsanos Jun 28 '20 at 21:38
  • Neither `map` nor `for` are synchronous when you are using `await`. `map` is "all at once" whereas `for` is "one after the other", though. – Bergi Jun 29 '20 at 12:49
  • @Bergi just pinging to say I did a massive overhaul of the question in case it makes it more useful for future wonderers. any input appreciated. – George Katsanos Jun 29 '20 at 17:00
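
The distinction in the comments above can be sketched with stubbed timers standing in for HTTP requests: `map` starts every async callback immediately, while a `for...of` loop with `await` runs them one after the other.

```javascript
// `order` records the sequence in which the fake requests *finish*.
const order = [];

// Stub standing in for an HTTP request: resolves after `ms` milliseconds.
const fakeRequest = (id, ms) =>
  new Promise((resolve) =>
    setTimeout(() => {
      order.push(id);
      resolve(id);
    }, ms)
  );

async function allAtOnce(items) {
  // All timers start now; they complete by duration, not by position.
  await Promise.all(items.map(({ id, ms }) => fakeRequest(id, ms)));
}

async function oneAfterAnother(items) {
  // Each request only starts after the previous one has resolved.
  for (const { id, ms } of items) {
    await fakeRequest(id, ms);
  }
}
```

With items `a` (60 ms), `b` (30 ms), `c` (5 ms), the sequential version finishes them in list order, while the concurrent version finishes them in duration order.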

2 Answers


I use this library for throttling requests. You simply tell it what the rate limit of the API is, and you can call it as fast as you want and it will automatically space the requests out over time for you.

If you don't want another dependency, then to make your solution work you need to use a `for` loop: `map` fires off all of its callbacks immediately, as fast as it possibly can.

const wait = (time) => {
  return new Promise((resolve) => {
    setTimeout(resolve, time);
  });
};

messageSchema.pre("insertMany", async function (next, docs) {
  for (const doc of docs) {
    await wait(3000); // milliseconds to space requests out by
    doc.location.geocoded = await reverseGeocode(
      doc.location.coordinates[0],
      doc.location.coordinates[1]
    );
  }
  next();
});
  • to: https://www.npmjs.com/package/await-to-js. reverseGeocode is an AXIOS Get request, it returns a promise. Why was this downvoted? – George Katsanos Jun 28 '20 at 20:31
  • @GeorgeKatsanos Ah okay, I'm just not familiar with `to`. You don't really need it though, I've adjusted my solution. `await` already throws the error if there is one, so using `throw new Error(err)` is actually redundant and you can remove `to` altogether. As for why this was downvoted, it's because it doesn't retry if there's a rate limit error, but in my opinion you should just try not to hit the limit in the first place because they may just block you for longer if you keep retrying. The other solution will do that if that's what you're looking for. – chrispytoes Jun 28 '20 at 20:55
  • @GeorgeKatsanos If you're calling this API from many places in your code though, I'd probably recommend just using the library I suggested. You can make one throttle instance and pass it to wherever it's needed, and then it will throttle the requests globally across your application. – chrispytoes Jun 28 '20 at 21:12
  • I'm trying to figure out the library, but something's not right. https://repl.it/@gkatsans/delay-asyncawait-to – George Katsanos Jun 28 '20 at 21:14
  • @GeorgeKatsanos [https://repl.it/repls/FirstPuzzledStack](https://repl.it/repls/FirstPuzzledStack) Fixed it. You need to return using async style. What you were doing was beginning all the requests at once, then returning the still empty array before they arrive. – chrispytoes Jun 28 '20 at 21:18
  • makes total sense, did this in a haste.. I am wondering if we could use `Promise.all()` to simplify this even more – George Katsanos Jun 28 '20 at 21:22
  • @GeorgeKatsanos No need really. Promise.all is for when you need to execute many things in parallel, but catch everything with a single callback. You're trying to run things sequentially to not get rate limited. I believe what you have here is the simplest solution I can think of. – chrispytoes Jun 28 '20 at 21:25
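
A runnable sketch of the sequential wait-between-requests pattern from this answer, with `reverseGeocode` stubbed out since the real one calls a paid API (the stub's return format is made up for illustration):

```javascript
// Resolve after `ms` milliseconds; used to space the requests apart.
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical stub standing in for the axios-based reverseGeocode.
const reverseGeocode = async (lon, lat) => `address near ${lat},${lon}`;

async function geocodeAll(docs, delayMs) {
  // Sequential: each geocode call finishes before the next one starts.
  for (const doc of docs) {
    const [lon, lat] = doc.location.coordinates;
    doc.location.geocoded = await reverseGeocode(lon, lat);
    await wait(delayMs); // throttle before the next request
  }
  return docs;
}
```

Calling `geocodeAll(docs, 3000)` processes one document every ~3 seconds, which keeps a per-second rate limiter happy without any retry logic.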

There's no need to "fork" the process, as Node.js has native async promise support.

The best solution is to add delays to your code so that you stay within the limit. Assuming that is too cumbersome, and given that this is only for development purposes, here is a quick brute-force example:

Note that waiting on a promise does not block the process as synchronous code would.

async function getGeoCode(doc) {
    // `to` (await-to-js) resolves to an [err, response] tuple.
    return to(
        reverseGeocode(
            doc.location.coordinates[0],
            doc.location.coordinates[1]
        )
    );
}

const sleepTimes = [1000, 2000, 3000, 4000, 5000, 6000];
const randomSleep = () =>
    sleepTimes[Math.floor(Math.random() * sleepTimes.length)];

messageSchema.pre("insertMany", async function save(next, docs) {
    let sleep = false;
    await Promise.all(docs.map(async (doc) => {
        // If another request already hit the rate limit, hold off until it clears.
        while (sleep) {
            await new Promise(function (resolve) {
                setTimeout(resolve, 2000 + randomSleep());
            });
        }

        let [err, response] = await getGeoCode(doc);
        if (err) {
            if (err.statusCode !== 429) {
                throw err;
            }

            sleep = true;
            // Keep retrying with a random backoff until the 429 stops.
            while (err && err.statusCode === 429) {
                await new Promise(function (resolve) {
                    setTimeout(resolve, 2000 + randomSleep());
                });

                [err, response] = await getGeoCode(doc);
            }
            sleep = false;
        }

        doc.location.geocoded = response;
    }));
    next();
});
  • 1
    `const randomSleep = randomSleep[…]` can never work. And really, do you need random backoff here? – Bergi Jun 28 '20 at 13:20
  • 1
    I feel like this solution severely over-complicates this entire thing. You've just doubled the size of his code just to add a simple random delay? Check my solution. – chrispytoes Jun 28 '20 at 13:55
  • The random is just so not all the bulk executes at the same exact time, it's not needed but betters performances, and I didn't know "doubling" code is a factor when it comes to solutions. your solution iterates 1 by 1 which is the naive approach and doesn't perform well. – George keri Jun 28 '20 at 14:16
  • @Georgekeri I'm sorry if I came off aggressively. I didn't know randomness would improve performance. Can you explain that? Also, isn't the point of this to iterate 1 by 1? He's trying to space the requests out to not get rate limited so why would you NOT want to do them 1 by 1? Not upset just actually trying to understand your solution better. – chrispytoes Jun 28 '20 at 14:31
  • I get your solution now actually. You're executing everything in parallel but delaying each request by a random amount every time, and retrying if it fails with a 429 error. I never thought of it that way and I think I actually like this better now lol. Depends on exactly what kind of rate limit the API has though really. If it's the kind that locks you out temporarily for too many requests, it could end up getting stuck in a loop until the ban is lifted. I would probably rather just never hit the limit in the first place and throw an error if it does. – chrispytoes Jun 28 '20 at 14:46
  • 1
    @Georgekeri I appreciate the response but it indeed looks quite complicated/ many LOCs – George Katsanos Jun 28 '20 at 20:29
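
The retry-on-429 idea discussed in this answer can be sketched self-contained, with a stubbed geocoder that fails twice before succeeding (the stub and the `geocodeWithRetry` helper are illustrative names, not part of any library):

```javascript
// Resolve after `ms` milliseconds.
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Stub geocoder: throws a 429-style error on the first two calls.
let calls = 0;
async function flakyGeocode() {
  calls += 1;
  if (calls < 3) {
    const err = new Error("Too Many Requests");
    err.statusCode = 429;
    throw err;
  }
  return "some address";
}

async function geocodeWithRetry(fn, { retries = 5, delayMs = 10 } = {}) {
  for (let attempt = 0; ; attempt += 1) {
    try {
      return await fn();
    } catch (err) {
      // Only retry rate-limit errors, and only up to `retries` times.
      if (err.statusCode !== 429 || attempt >= retries) throw err;
      await wait(delayMs * (attempt + 1)); // back off a bit more each time
    }
  }
}
```

As chrispytoes notes in the comments, whether retrying is appropriate depends on the API: if the provider temporarily blocks clients that keep hammering a 429, never hitting the limit in the first place is the safer strategy.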