
I have a very weird piece of behavior happening. I have a CSV file which I need to geocode. It has 250 lines. The Google geocoding API has a per-second rate limit, so I can't just write a for loop over each row. I've elected to use an interval: every fraction of a second I send off an API request, and when everything is finished the results should be written to a new CSV file. However, no matter what I do, I only ever get 247 results.

In an effort to see where the break was happening, I set a `called` variable to count how many times my geocoding function was being called. It is indeed being called 250 times. Then I started moving the `called` increment deeper into the function's callback to see whether that was hit 250 times. The weird part is that it both is and isn't: if I put the increment right after the error check, it's 250, but if I put it below the push statement, it's 247.

let called = 0;
/*Handles geocoding via google api*/
function geoCode(obj){
    console.log(`${rows.length} items remaining`);
    processing += 1;
    try {
        googleMapsClient.geocode({
            address: `${obj.ST_ADDRESS} ${obj.CITY} ${obj.STATE} ${obj.ZIP_CODE} ${obj.COUNTRY}`
        }, function(err, response) {
            if (!err) {
                called += 1; // <------------ HERE IT'S 250
                processing -= 1;
                const result = obj;
                result.LATITUDE = response.json.results[0].geometry.location.lat;
                result.LONGITUDE = response.json.results[0].geometry.location.lng;
                results.push(result);
                called += 1; // <------------ HERE IT'S 247
                if(rows.length < 1 && processing === 0){
                    // begin writing to file
                    writeFile(results);
                    console.log('called', called);
                }
            }else{
                console.log(err);
            }
        });
    }catch (e) {
        console.log(e);
    }
}

/* Loop through results at sustained pace to prevent rate limiting */
function locationLoop(){
    interval = setInterval(()=>{
        if(rows.length > 0){
            geoCode(rows.shift());
        }else{
            clearInterval(interval);
        }
    }, 50);
}
  • Is it possible that this line `if(rows.length < 1 && processing === 0)` isn't in sync with how fast the rows are being shifted (`50ms` in your case)? Maybe instead of using an `interval` just call `locationLoop` from within the callback of the `geocode` function... – Sagiv b.g Aug 08 '19 at 16:31
  • oh I really like that. So much more elegant. ty :) –  Aug 08 '19 at 16:35
  • It might be a lot simpler to just see when you have all the results rather than basing your completion on the input array being empty. Then, you aren't susceptible to there still being some requests in process. – jfriend00 Aug 08 '19 at 16:35
  • Also, your error handling needs to update all your state variables. – jfriend00 Aug 08 '19 at 16:37
  • Also, I'm not believing that within the same block with on intervening async functions, it's `called` is 250 and suddenly goes to 247 just because you called `results.push(result).` unless `.push()` is some custom method that modifies `called`. So, there's something you aren't telling us about where/when/how you see the 247 come after the 250. – jfriend00 Aug 08 '19 at 16:40
  • @jfriend00 it doesn't jump from 250 to 247, I just didn't want to write the code twice with only one line moved. also, that's literally all the code that handles the results and called variables. It wouldn't be very helpful for me to hoodwink you guys. –  Aug 08 '19 at 16:45
  • Then, the only possible explanation is that `if(rows.length < 1 && processing === 0)` is not properly detecting the last request so you're outputting `results` before the last 3 requests are done. I'd personally rewrite the code in a lot cleaner fashion that doesn't have all these higher scoped, shared processing variables and detects the number of results I have rather than remaining input and update all the state variables correctly in all the error conditions. – jfriend00 Aug 08 '19 at 16:56
  • @jfriend00 turns out it was just bad error handling. Rewrote to promises and fixed it. Also not a fan on the high scoped vars but I always save the cleanup for once I know something works first. –  Aug 08 '19 at 17:03
  • FYI, you may be interested in these other answers: [Loop through an api get request with variable URL](https://stackoverflow.com/questions/48842555/loop-through-an-api-get-request-with-variable-url/48844820#48844820), [Make several requests to an API that can only handle 20 request a minute](https://stackoverflow.com/questions/33378923/make-several-requests-to-an-api-that-can-only-handle-20-request-a-minute/33379149#33379149) or [Choose proper async method for batch processing for max requests/sec](https://stackoverflow.com/questions/36730745/x/36736593#36736593). – jfriend00 Aug 08 '19 at 17:12
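jfriend00's suggestion in the comments above, detecting completion by counting settled requests rather than checking the input array, can be sketched roughly as follows. This is a hypothetical illustration, not the OP's code: `geocodeOnce` stands in for whatever geocoding call you use, and errors count toward completion instead of vanishing.

```javascript
// Sketch: track how many requests have settled (success OR failure) and
// finish only when every one of them has, regardless of input-array state.
function processAll(rows, geocodeOnce, onDone) {
    const total = rows.length;
    let settled = 0;
    const results = [];
    rows.forEach((row, i) => {
        // Stagger the start times to respect the rate limit.
        setTimeout(() => {
            geocodeOnce(row, (err, result) => {
                if (!err) results.push(result);
                settled += 1; // errors still count toward completion
                if (settled === total) onDone(results);
            });
        }, i * 50);
    });
}
```

With this shape there is no `processing` counter to keep in sync and no risk of writing the file while in-flight requests remain.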

1 Answer


I'm not sure if this answer is correct.

It's possible that the code between where you detect 250 and where you detect 247 throws an error. For instance, if the API can't resolve an address, `response.json.results` contains no items, so accessing `response.json.results[0].geometry` throws something like `TypeError: Cannot read property 'geometry' of undefined`.

This error is lost because the callback runs asynchronously: the try-catch only wraps the synchronous call to `googleMapsClient.geocode`, and it has long since exited by the time the callback throws.

So for 3 of the 250 rows, Google can't resolve a coordinate, the callback throws before the second increment and the push, and those rows silently never make it into `results`.
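A minimal sketch of the fix, using a hypothetical stub in place of the real Google client: check the results array before indexing into it, and route the empty case through the error path instead of letting the callback throw.

```javascript
// Hypothetical stand-in for googleMapsClient.geocode: simulates an address
// Google can't resolve (no error, but an empty results array).
function geocodeStub(address, callback) {
    setTimeout(() => callback(null, { json: { results: [] } }), 0);
}

function geoCodeSafe(obj, done) {
    geocodeStub(obj.ST_ADDRESS, (err, response) => {
        if (err) return done(err);
        const hits = response.json.results;
        if (hits.length === 0) {
            // Guard: treat "no results" as a handled failure instead of
            // letting hits[0].geometry throw inside the async callback.
            return done(new Error(`no geocode result for ${obj.ST_ADDRESS}`));
        }
        obj.LATITUDE = hits[0].geometry.location.lat;
        obj.LONGITUDE = hits[0].geometry.location.lng;
        done(null, obj);
    });
}
```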

Promises are a great way of dealing with asynchronous code, and they make failures like this much harder to lose.
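The promise-based rewrite the OP mentions in the comments might look something like this sketch. `geocodeOnce` is a hypothetical async wrapper around whatever client you use; the point is the control flow: await each request, pace with a delay, and collect failures instead of dropping them.

```javascript
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Geocode rows one at a time at a sustained pace; every row ends up in
// either `results` or `failures`, so nothing is silently lost.
async function geocodeAll(rows, geocodeOnce, pauseMs = 50) {
    const results = [];
    const failures = [];
    for (const row of rows) {
        try {
            results.push(await geocodeOnce(row));
        } catch (e) {
            failures.push({ row, error: e }); // errors update state too
        }
        await delay(pauseMs); // stay under the per-second rate limit
    }
    return { results, failures };
}
```

Because the try-catch sits inside the async function, it actually catches rejections from the awaited call, unlike the callback version.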

Halcyon
  • thank you, you were correct. 3 of the addresses were returning duds on the response object and it wasn't rising up to my catch block. –  Aug 08 '19 at 16:51