
I cannot get multiple files to load data and assign to globals. I've read up on similar questions and related examples, but I still am having trouble.

var origins = [],
    geoJSON = {
      "type": "FeatureCollection",
      "features": []
    };

queue(1)
    .defer(d3.csv, "path_to.csv", function(d) {
        origins.push(d.o_geoid)
      })
    .defer(d3.json, "path_to.json", function(d) {
      // Limit GeoJSON features to those in CSV
      for (var i = d.features.length - 1; i >= 0; i--) {
        if($.inArray(d.features[i].properties['GEOID10'], origins) != -1) {
          geoJSON.features.push(d.features[i]);
        }
      }
    })
    .await(ready);

function ready() {
  console.log(geoJSON);
}

I'm happy to filter the geoJSON features within ready() if that works better, but I need it to happen before I start creating the map with

d3.g.selectAll("path")
    .data(geoJSON.features)
  .enter().append("path")
...

I'm assuming this has to do with callbacks and empty results, but I can't quite get it working. I have figured out that using .await(console.log(geoJSON)); outputs the correct object to the console. The ready() function won't execute though. Thanks for any help understanding and fixing this problem.

Lars Kotthoff
josiekre
  • You cannot assign values from asynchronous callbacks to globals. All your processing has to happen in the `.ready()` function. – Lars Kotthoff Oct 12 '15 at 18:51
  • First, if you are using `queue(1)`, you are running the `.defers` in series. You have to do this because you are using the `origins` variable in your second `.defer`. At this point, using `.queue` becomes pointless. Second, your ready function should take arguments of `ready(error, result1, result2)` where resultN is the return of your `.defer`s. Forget the global variables... – Mark Oct 12 '15 at 18:59
  • @Mark Thanks. I was thinking that the use of `queue` was needed to make sure all data are loaded before I try to use d3 on the data. How do I make sure that the geojson data are filtered before I create the polygons? I only need ~40 of them out of ~770, but those 40 change sometimes based on the input csv. – josiekre Oct 12 '15 at 19:48
  • @LarsKotthoff Is there a good explanation somewhere on why one can't assign values from asynchronous callbacks to globals? – josiekre Oct 12 '15 at 19:50
  • @josiekre See e.g. http://code.tutsplus.com/tutorials/event-based-programming-what-async-has-over-sync--net-30027 – Lars Kotthoff Oct 12 '15 at 19:57
  • @josiekre You can, of course, assign values to globals from asynchronous code. But this will in most cases give unpredictable, unwanted and inconsistent results because of the asynchronous nature of these calls. There are some questions around here on SO having excellent answers describing this in detail: [*Why is my variable unaltered after I modify it inside of a function? - Asynchronous code reference*](/questions/23667086) and [*How to return the response from an asynchronous call?*](/questions/14220321). – altocumulus Oct 12 '15 at 21:03
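The timing problem discussed in these comments can be seen in a small sketch. It assumes a hypothetical `fakeCsv` stand-in for `d3.csv` and a hypothetical `flush` helper that plays the role of "the response arrives"; neither is part of d3, and the two rows are made-up sample values. The point is only that code running right after the request is started still sees an empty global.

```javascript
var pending = []; // callbacks waiting to run "later"

// Hypothetical stand-in for d3.csv: the callback is stored, not run immediately.
function fakeCsv(url, callback) {
  pending.push(function () {
    callback(null, [{ o_geoid: "001" }, { o_geoid: "002" }]); // made-up rows
  });
}

// Hypothetical helper: simulate the responses arriving.
function flush() {
  while (pending.length) pending.shift()();
}

var origins = [];
fakeCsv("path_to.csv", function (error, rows) {
  rows.forEach(function (d) { origins.push(d.o_geoid); });
});

var seenTooEarly = origins.length; // read before the callback has run
flush();
var seenAfter = origins.length;    // read after the callback has run
```

Anything that depends on `origins` therefore has to run from inside the callback, or from a function that the callback invokes.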

1 Answer


Your question was already answered by Jason Davies' reply to the thread you linked but anyway, here it is re-stated in terms of your exact example...

var origins = [],
    geoJSON = {
        "type": "FeatureCollection",
        "features": []
    };

queue(1)
    .defer(function(url, callback) {
        d3.csv(url, function(error, csvData) {
            if(!error) csvData.forEach(function(d) {origins.push(d.o_geoid)});
            callback(error, csvData);
        })
    }, "path_to.csv")
    .defer(function(url, callback) {
        d3.json(url, function(error, jsonData) {
            // Limit GeoJSON features to those in CSV
            // (on error jsonData is undefined, so start at -1 and skip the loop)
            for(var i = error ? -1 : jsonData.features.length - 1; i >= 0; i--) {
                if($.inArray(jsonData.features[i].properties['GEOID10'], origins) != -1) {
                    geoJSON.features.push(jsonData.features[i]);
                }
            }
            callback(error, jsonData);
        })
    }, "path_to.json")
    .await(ready);

function ready(error) {
    console.log(error ? "error: " + error.responseText : geoJSON);
}

I've never used queue but, if you think about it, it's pretty obvious from Jason's answer how it works.
The basic pattern is

queue()
    .defer(asynchRequest1, url1)
    .defer(asynchRequest2, url2)
    .await(callback)

function callback(error){
    console.log(error ? 
        "error: " + error.responseText : 
        "completed, " + (arguments.length - 1) + " objects retrieved"
    );
}  

The call signature for the first argument of .defer is function(url, callback), and the signature of the callback is function(error, result). The former follows d3 conventions (for which queue is obviously designed) and the latter follows conventional asynchronous JavaScript (i.e. Node) practice.
To make this work, queue has to supply the callback argument itself: a function with the standard function(error, result) signature that passes the outcome of the asynchronous request on to the function registered with .await.

If you use the direct pattern, where the first argument of defer is d3.csv for example, then, after it completes, d3.csv will invoke the callback provided by queue, therefore connecting with the await object, passing its error/result state.

In the indirect pattern described by Jason Davies, d3.csv is wrapped in another function - with the same signature - that defers invocation of the internally provided queue callback, until after d3.csv has completed and your post-processing is done.
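To make the mechanics concrete, here is a minimal sketch of a serial queue. It is not the real queue source: `miniQueue` and `fakeRequest` are hypothetical names, the stand-in request calls back synchronously, and tasks only start once `.await` is called, which is a simplification. It shows both the direct pattern and the indirect (wrapped) pattern feeding the same internally provided callback.

```javascript
// Hypothetical miniature of a serial queue; NOT the real queue implementation.
function miniQueue() {
  var tasks = [],   // deferred { fn, args } entries, in order
      results = []; // one result per completed task

  var q = {
    defer: function (fn) {
      tasks.push({ fn: fn, args: Array.prototype.slice.call(arguments, 1) });
      return q; // allow chaining
    },
    await: function (done) {
      var index = 0;
      function next() {
        if (index >= tasks.length) {
          // All tasks finished: call done(null, result1, result2, ...).
          return done.apply(null, [null].concat(results));
        }
        var task = tasks[index++];
        // Append the queue-provided callback as the task's last argument.
        task.fn.apply(null, task.args.concat(function (error, result) {
          if (error) return done(error);
          results.push(result);
          next();
        }));
      }
      next();
      return q;
    }
  };
  return q;
}

// Hypothetical stand-in for d3.csv / d3.json: calls back with (error, result).
function fakeRequest(url, callback) {
  callback(null, "data from " + url);
}

var log = [];
miniQueue()
  .defer(fakeRequest, "a.csv") // direct pattern
  .defer(function (url, callback) {
    // Indirect pattern: post-process, then hand off to the queue's callback.
    fakeRequest(url, function (error, data) {
      callback(error, error ? undefined : data.toUpperCase());
    });
  }, "b.json")
  .await(function (error, first, second) {
    log.push(error, first, second);
  });
```

The wrapper in the second `.defer` has the same (url, callback) shape as `fakeRequest`, which is why the queue cannot tell the two patterns apart.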

Now that we understand what's going on, we can think about refactoring to make it cleaner. Perhaps like this...

queue(1)
    .defer(d3.csv, "path_to.csv")
    .defer(d3.json, "path_to.json")
    .await(ready);

function ready(error, csvData, jsonData) {
    if(error) return console.log("error: " + error.responseText);
    csvData.forEach(function(d) {origins.push(d.o_geoid)});
    // Limit GeoJSON features to those in CSV
    for(var i = jsonData.features.length - 1; i >= 0; i--) {
        if($.inArray(jsonData.features[i].properties['GEOID10'], origins) != -1) {
            geoJSON.features.push(jsonData.features[i]);
        }
    }
}

...which has exactly the same effect.
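If you would rather avoid the globals and the jQuery dependency altogether, the filtering step can be pulled into a small pure function that ready() calls. This is only a sketch: `filterFeatures` is a hypothetical helper name and the GEOID values below are made-up sample data. Unlike the backwards loop above, it keeps the features in their original order.

```javascript
// Hypothetical helper: keep only features whose GEOID10 appears in the CSV rows.
function filterFeatures(csvRows, geojson) {
  var wanted = {};
  csvRows.forEach(function (row) { wanted[row.o_geoid] = true; });
  return {
    type: "FeatureCollection",
    features: geojson.features.filter(function (feature) {
      return Object.prototype.hasOwnProperty.call(wanted, feature.properties.GEOID10);
    })
  };
}

// Made-up sample data standing in for the parsed CSV and GeoJSON.
var sampleRows = [{ o_geoid: "36001" }, { o_geoid: "36005" }];
var sampleJson = {
  type: "FeatureCollection",
  features: [
    { type: "Feature", properties: { GEOID10: "36001" }, geometry: null },
    { type: "Feature", properties: { GEOID10: "36003" }, geometry: null },
    { type: "Feature", properties: { GEOID10: "36005" }, geometry: null }
  ]
};

var filtered = filterFeatures(sampleRows, sampleJson);
```

Inside ready(error, csvData, jsonData) you would then call filterFeatures(csvData, jsonData) and bind the returned features to the paths.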

Cool Blue
  • A few tidbits. In this way, `origins.push(csvData.o_geoid);` won't work. It needs to be `csvData.forEach(function(d) {origins.push(d.o_geoid)});`. I also think the `queue(1)` can be just `queue()` since the one no longer depends on the other. Other than that, this works and simplifies things. Thanks for the thorough explanation. – josiekre Oct 13 '15 at 15:24
  • @CoolBlue, great answer. – Mark Oct 13 '15 at 16:32
  • @CoolBlue, I was waiting for you to fix the `origins.push` problem. It doesn't work as is. I'll accept now assuming you'll change that. – josiekre Oct 13 '15 at 19:45
  • @josiekre, Ok... I didn't give a second thought to the points you raised as they derived from elements that were not relevant to the core question and that I _copied verbatim from the exact same context in your original question code_. It is confusing though so I fixed it along the lines you suggested. Regarding the naming of the queue as `queue(1)`, I have not investigated that aspect and have no view about it so I am leaving it as is from your original code. – Cool Blue Oct 14 '15 at 04:13
  • @CoolBlue No worries. Based on how you changed the solution, .forEach was needed to get it to run without error. That's all. – josiekre Oct 15 '15 at 13:22
  • @josiekre, yes, as I said, it also would have been confusing for anyone trying to understand my answer later so I'm glad you pointed it out. Thanks for the question by the way, I didn't really understand queue before, so it was good to have the motivation to do so. Once you break through the initial confusion it really is very cool! – Cool Blue Oct 15 '15 at 13:28