
This is a follow-up to a previous question.

I would like to load several datasets using d3.csv and d3.json and then combine those datasets using d3.zip. In the example below I use only two. The first dataset will be stored in xyData and the second one in colData. My goal is to call something like

var combinedData = d3.zip(colData, xyData);

However, since these datasets are only accessible inside the d3.csv and d3.json callbacks, respectively, that does not work. Is there any workaround for that? And how would one deal with this when there are even more datasets to load?

The first dataset looks like this:

//xyData.csv
x,y
0,0.00e+000
0.6981317,6.43e-001
1.3962634,9.85e-001

My JSON dataset looks as follows:

//colData.json
{
    "el1": [
        {"color": "green"},
        {"color": "purple"},
        {"color": "brown"}
    ],

    "el2": [
        {"color": "black"},
        {"color": "red"},
        {"color": "yellow"}
    ],

    "el3":[
        {"color": "brown"},
        {"color": "yellow"},
        {"color": "blue"}
    ]
}

I read these datasets in as follows:

    //using foreach 
    var xyData = [];    
    d3.csv("xyData.csv", function(myData) {
        myData.forEach(function(d) {
            d.x = +d.x; //convert data to numbers
            d.y = +d.y;
          });
          console.log(myData[1]);
          xyData = myData;
          console.log(xyData[1])
    });
    console.log(xyData) //this will be an empty array

    //loading the json data
    var colData = [];        
    d3.json("colData.json", function(error, jsonData) {
      if (error) return console.warn(error);
      colData = jsonData;
      console.log(colData)
      console.log(colData.el1[0])
    });
    console.log(colData) //this will be an empty array

    //my goal would be:
    //var combinedData = d3.zip(colData, xyData);

My console output looks like this:

    Array [  ]
    Array [  ]
    Object { x: 0.6981317, y: 0.643 }
    Object { x: 0.6981317, y: 0.643 }
    Object { el1: Array[3], el2: Array[3], el3: Array[3] }
    Object { color: "green" }

This shows that loading the data works as expected. Storing the results in global variables, however, does not work because the loaders are asynchronous: the console.log calls outside the callbacks run before the data has arrived, so the two arrays are still empty at that point.
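To see the ordering problem in isolation, here is a minimal sketch that needs neither D3 nor a server. `fakeCsvLoad` is a hypothetical stand-in for `d3.csv` that hands its rows to the callback on a later tick via `setTimeout`, which is roughly how a real network request behaves.

```javascript
// Hypothetical stand-in for d3.csv: delivers rows asynchronously,
// on a later turn of the event loop, as a network request would.
function fakeCsvLoad(rows, callback) {
  setTimeout(function () {
    callback(rows);
  }, 0);
}

var events = [];
var xyData = [];

fakeCsvLoad([{ x: "0", y: "0" }, { x: "0.6981317", y: "0.643" }], function (rows) {
  xyData = rows;
  events.push("callback ran, xyData.length = " + xyData.length);
});

// This line runs immediately, before the callback above has fired.
events.push("after call, xyData.length = " + xyData.length);
var lengthRightAfterCall = xyData.length;
```

Once the callback has fired, `events` lists the "after call" entry first, with a length of 0, which matches the two empty arrays in the console output above.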

My question is: what is the best way to combine the two datasets into one?

Cleb
  • See https://stackoverflow.com/questions/21842384/importing-data-from-multiple-csv-files-in-d3 or use [queue.js](https://github.com/mbostock/queue). – Lars Kotthoff Sep 03 '15 at 16:21
  • You have the classic "how can I return data from an asynchronous call" problem, and the answer is: you can't. You *must* do all the work in the callback; you can't set variables in the callback and then do the work outside of it. Do you use other libraries besides D3? Which ones? jQuery maybe? – Tomalak Sep 03 '15 at 16:23
  • @LarsKotthoff: I saw this question and should have mentioned it as well. How do you then deal with more than two input files? Do you then create an enormous nested structure? – Cleb Sep 03 '15 at 16:28
  • @Tomalak: I used jQuery but not for very advanced stuff. This entire topic is rather new to me so I indeed might ask very basic questions. How would you then deal with a lot of data files? Creating a huge nested structure? Or would it be more reasonable to combine the CSV files and JSON files first to one JSON file? – Cleb Sep 03 '15 at 16:30
  • @Cleb For 2 or more files the best solution is to use queue.js. – Lars Kotthoff Sep 03 '15 at 16:33
  • @Cleb I was asking more if you have jQuery available in *this* project. It has a feature that can make this a little easier. – Tomalak Sep 03 '15 at 16:35
  • @LarsKotthoff: Ok, then I will take a look at this, thanks for the suggestion! If you have time, you could also set up an example as an answer. – Cleb Sep 03 '15 at 16:36
  • @Tomalak: I do have it available, yes. What would such a solution look like? – Cleb Sep 03 '15 at 16:37

2 Answers


Since you said you have jQuery available, we can use its Deferred feature to manage the two asynchronous operations you are looking at.

We are doing this by converting D3's callback-based approach into a promise-based approach.

For that, we set up two helper functions that wrap D3's .csv and .json helpers and return jQuery promises:

d3.csvAsync = function (url, accessor) {
    var result = $.Deferred();

    this.csv(url, accessor, function (data) {
        if (data) {
            result.resolve(data);
        } else {
            result.reject("failed to load " + url);
        }
    });
    return result.promise();
};

d3.jsonAsync = function (url) {
    var result = $.Deferred();

    this.json(url, function (error, data) {
        if (error) {
            result.reject("failed to load " + url + ", " + error);
        } else {
            result.resolve(data);
        }
    });
    return result.promise();
};
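As Tomalak mentions in a later comment, jQuery's promises are not the only option. As a sketch of the same wrapping idea with native ES2015 `Promise` instead of `$.Deferred`, here is a generic helper. `promisifyD3Loader` and `fakeJsonLoader` are hypothetical names; the fake loader only imitates the `(error, data)` callback shape that `d3.json` uses in D3 v3, so the example runs without a server.

```javascript
// Wraps a D3 v3 style loader, loader(url, callback(error, data)),
// so that it returns a native Promise instead of taking a callback.
// promisifyD3Loader is a hypothetical helper, not part of D3.
function promisifyD3Loader(loader) {
  return function (url) {
    return new Promise(function (resolve, reject) {
      loader(url, function (error, data) {
        if (error) {
          reject(new Error("failed to load " + url + ": " + error));
        } else {
          resolve(data);
        }
      });
    });
  };
}

// Hypothetical fake loader: "serves" in-memory files and fails otherwise.
var fakeFiles = {
  "colData.json": { el1: [{ color: "green" }, { color: "purple" }, { color: "brown" }] }
};
function fakeJsonLoader(url, callback) {
  setTimeout(function () {
    if (url in fakeFiles) {
      callback(null, fakeFiles[url]);
    } else {
      callback("404 Not Found", null);
    }
  }, 0);
}

var jsonAsync = promisifyD3Loader(fakeJsonLoader);
```

With D3 v3 on the page, something like `promisifyD3Loader(d3.json.bind(d3))` should behave the same way (untested here). In D3 v5 and later, `d3.csv` and `d3.json` already return promises, so no wrapper is needed there.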

Now we can start both requests in parallel and store the returned promises in variables. We can also use .then() to transform the results on the fly:

var colDataReq = d3.jsonAsync("colData.json");
var xyDataReq = d3.csvAsync("xyData.csv").then(function (data) {
    data.forEach(function (d) {
        d.x = +d.x;
        d.y = +d.y;
    });
    return data;
});
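The `.then()` step above is a plain transformation from strings to numbers, so it can be pulled out into a small function and checked on its own. `toNumbers` is a hypothetical name; the sample rows mirror xyData.csv as a CSV parser would deliver them, with every value as a string.

```javascript
// Converts the string fields x and y of each CSV row to numbers,
// returning new objects instead of mutating the input rows.
function toNumbers(rows) {
  return rows.map(function (d) {
    return { x: +d.x, y: +d.y };
  });
}

// Rows as a CSV parser hands them over: every value is a string.
var rawRows = [
  { x: "0", y: "0.00e+000" },
  { x: "0.6981317", y: "6.43e-001" },
  { x: "1.3962634", y: "9.85e-001" }
];
var xyRows = toNumbers(rawRows);
```

Returning fresh objects drops any other columns the CSV might have; mutating `d.x` and `d.y` in place, as the answer does, keeps them. Either way, whatever the `.then()` callback returns becomes the value that later consumers of `xyDataReq` receive.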

Finally, we use the $.when() utility function to wait on both resources and have them handled by a single callback.

$.when(xyDataReq, colDataReq).done(function (xyData, colData) {
    var combinedData = d3.zip(colData, xyData);

    // now do something with combinedData
}).fail(function (error) {
    console.warn(error);
});
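With native promises, `Promise.all` plays the role of `$.when`. The sketch below includes a small `zip` that mimics what `d3.zip` does (pairs elements by index and stops at the shortest array), only so the example runs without D3. One assumption worth flagging: `d3.zip` works on arrays, whereas colData.json parses to an object with keys `el1`, `el2` and `el3`, so the sketch zips one of those arrays (`colData.el1`) with the CSV rows. The two already-resolved promises are hypothetical stand-ins for `colDataReq` and `xyDataReq`.

```javascript
// Minimal stand-in for d3.zip: pairs up elements by index and
// stops at the length of the shortest input array.
function zip() {
  var arrays = Array.prototype.slice.call(arguments);
  if (arrays.length === 0) return [];
  var n = Math.min.apply(null, arrays.map(function (a) { return a.length; }));
  var result = [];
  for (var i = 0; i < n; i++) {
    result.push(arrays.map(function (a) { return a[i]; }));
  }
  return result;
}

// Hypothetical, already-started requests.
var colDataReq = Promise.resolve({
  el1: [{ color: "green" }, { color: "purple" }, { color: "brown" }]
});
var xyDataReq = Promise.resolve([
  { x: 0, y: 0 },
  { x: 0.6981317, y: 0.643 },
  { x: 1.3962634, y: 0.985 }
]);

// Promise.all waits for both and resolves with their values in the same order.
var combinedReq = Promise.all([colDataReq, xyDataReq]).then(function (results) {
  var colData = results[0];
  var xyData = results[1];
  return zip(colData.el1, xyData); // [[{color}, {x, y}], ...]
});

combinedReq.catch(function (error) {
  console.warn(error);
});
```

Unlike `$.when(...).done(function (xyData, colData) {...})`, whose callback receives one argument per promise, `Promise.all` resolves with a single array, hence the `results[0]` and `results[1]` lookups.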

This way we can avoid nesting (and therefore needlessly serializing) the two requests.
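Because the requests are independent, the same idea scales to any number of files (the "even more datasets" case from the question) without extra nesting. Here is a sketch with native promises that loads a set of named resources in parallel and resolves with one object keyed by name; `loadAllNamed` and `fakeLoad` are hypothetical, and the in-flight counter exists only to show that the requests overlap.

```javascript
// Loads several resources in parallel and resolves with an object
// mapping each name to its data. `specs` maps a name to a function
// returning a promise (for example, a promise-returning loader).
function loadAllNamed(specs) {
  var names = Object.keys(specs);
  var promises = names.map(function (name) {
    return specs[name]();
  });
  return Promise.all(promises).then(function (values) {
    var byName = {};
    names.forEach(function (name, i) {
      byName[name] = values[i];
    });
    return byName;
  });
}

// Hypothetical loader that resolves after a delay and tracks
// how many requests are running at the same time.
var inFlight = 0;
var maxInFlight = 0;
function fakeLoad(value, delayMs) {
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  return new Promise(function (resolve) {
    setTimeout(function () {
      inFlight--;
      resolve(value);
    }, delayMs);
  });
}

var allReq = loadAllNamed({
  xyData: function () { return fakeLoad([{ x: 0, y: 0 }], 50); },
  colData: function () { return fakeLoad({ el1: [{ color: "green" }] }, 50); },
  extra: function () { return fakeLoad("third dataset", 50); }
});
```

All three requests start before any of them finishes, so the batch takes about as long as the slowest request rather than the sum of all of them, which is what nesting the callbacks would cost.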

Also, since the requests are stored in variables, we can simply re-use them without having to change our existing functions. For example, if you wanted to log the contents of one of the requests, you could do this anywhere in your code:

xyDataReq.done(function (data) {
    console.log(data);
});

and it would run as soon as xyDataReq has resolved (or right away, if it already has).
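The reuse works because a settled promise keeps its value: every handler attached to it receives the same result, and a handler attached after the request has finished still runs. Here is a small sketch with a native promise; `fakeRequest` is hypothetical and counts how often the underlying request is issued. One difference to keep in mind: jQuery's `.done()` may call a late handler synchronously, while native `.then()` always defers it to a later tick.

```javascript
var requestCount = 0;

// Hypothetical request: counts how often it is actually issued.
function fakeRequest() {
  requestCount++;
  return new Promise(function (resolve) {
    setTimeout(function () { resolve([{ x: 0, y: 0 }]); }, 0);
  });
}

var xyDataReq = fakeRequest();
var seen = [];

// Two independent consumers of the same request.
xyDataReq.then(function (data) { seen.push("logger got " + data.length + " row(s)"); });
xyDataReq.then(function (data) { seen.push("chart got " + data.length + " row(s)"); });

// A late consumer, attached after the request has already resolved.
setTimeout(function () {
  xyDataReq.then(function (data) { seen.push("late consumer got " + data.length + " row(s)"); });
}, 20);
```

All three handlers see the same data, while `requestCount` stays at 1.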

Another consequence of this approach is that, since we have decoupled loading a resource from using it, we can start loading very early, even before the rest of the page has rendered. This can save additional time.

Tomalak
  • Thanks a lot, I'll test that soon (which might be only tomorrow or over the weekend) and get back to you once questions arise. I often heard that sometimes problems occur when jQuery and D3 are mixed; have you experienced that as well? – Cleb Sep 03 '15 at 17:00
  • I couldn't say, I have never used D3 much so far. I would assume that this might be true if you do half of the UI work with jQuery and the other half with D3. But in this case I am not using any of jQuery's UI features, so there should be no area of overlap. That being said, jQuery's implementation of promises is not the best around, and if you decide that the approach itself is worth pursuing, you might switch to a separate promise implementation to achieve the same thing. I did it with jQuery solely because you said it was already in the project anyway. – Tomalak Sep 03 '15 at 17:05
  • Ok, thanks for the input on this; let's see whether I get it running since I still need to learn a lot about this topic. – Cleb Sep 03 '15 at 17:10

D3.js does not need the data to come from a file: its drawing code works on any JavaScript array or object that is already in memory. So instead of having D3 load the files, you can load and combine the data yourself and then hand the combined object (say, an array of data) to your D3 code.

Let's say we are using jQuery and we also include a helper library called Papa Parse (it makes life easier).

Step 1. Parse your CSV text into an array of objects and store it in a variable A:

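// header: true gives one object per CSV row; dynamicTyping turns numeric strings into numbers
var A = Papa.parse(yourCSV, { header: true, dynamicTyping: true }).data;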

Step 2. Read your JSON data and store it in a variable called B:

var B;
$(document).ready(function () {
    $.getJSON('yourJSON.json', function (json) {
        B = json;
    });
});

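Step 3. Combine datasets A and B into variable C. This must run inside the $.getJSON callback from Step 2 (or at some point after that callback has fired), because B is only assigned once the JSON has arrived. IMPORTANT: You might need to format the CSV data stored in A to look the way you expect before we give it to D3 later.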

var C={};
$.extend(C, A, B);

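Step 4. Give C to D3. Note that d3.json expects a URL to load, not an object, so rather than passing C to d3.json, hand it directly to the function containing your D3 code:

// drawChart is a placeholder name for your own D3 drawing function
function drawChart(data) {
  // Use data here to do stuff
}
drawChart(C);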

I've used the above as a work around in my own projects.
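The IMPORTANT note in Step 3 matters because `$.extend` is a shallow merge of own properties. `Object.assign`, used here only so the sketch runs without jQuery, behaves similarly for this case: if A is still an array of rows, its indices end up as keys "0", "1", ... next to B's `el1`, `el2`, `el3`, which is rarely the shape D3 code expects. Wrapping the rows under a name first keeps both datasets addressable; the key `xy` below is a hypothetical choice.

```javascript
// Parsed CSV rows (A) and the parsed JSON object (B), shortened.
var A = [{ x: 0, y: 0 }, { x: 0.6981317, y: 0.643 }];
var B = { el1: [{ color: "green" }, { color: "purple" }] };

// Merging the raw array spreads its indices into the result.
var naive = Object.assign({}, A, B);
// Object.keys(naive) → ["0", "1", "el1"]

// Giving the rows a name first keeps both datasets addressable.
var C = Object.assign({}, { xy: A }, B);
// C.xy is the row array, C.el1 is the color array
```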

You might also be able to call d3.json within the d3.csv callback, but I haven't tried this before:

d3.csv("A.csv", function(errorA, dataA) {
  d3.json("B.json", function(errorB, dataB) {
    // Use data to do stuff
  });
});
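As Cleb points out in a comment below, nesting gets unwieldy beyond two files, and it also makes the second request wait for the first. The comments suggest queue.js for this; the core idea can be sketched with plain callbacks: start every load at once, count completions, and call a single final callback when all are in. `loadAll` is a hypothetical helper illustrating the idea, not queue.js's actual API.

```javascript
// Starts every loader immediately and calls done(error, results) once:
// after all of them have called back, or as soon as the first one fails.
// Each loader has the D3 v3 style shape: loader(callback(error, data)).
function loadAll(loaders, done) {
  var results = new Array(loaders.length);
  var remaining = loaders.length;
  var finished = false;

  if (remaining === 0) {
    done(null, results);
    return;
  }

  loaders.forEach(function (loader, i) {
    loader(function (error, data) {
      if (finished) return; // ignore callbacks after success or first failure
      if (error) {
        finished = true;
        done(error);
        return;
      }
      results[i] = data; // keep results in the same order as the loaders
      remaining--;
      if (remaining === 0) {
        finished = true;
        done(null, results);
      }
    });
  });
}
```

With D3 v3 loaded, a call might look like `loadAll([function (cb) { d3.csv("A.csv", cb); }, function (cb) { d3.json("B.json", cb); }], function (error, results) { ... })`, where `results[0]` and `results[1]` hold the two datasets.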
Grace
  • Thanks Grace, I will test that tomorrow or over the weekend. The second approach is the one suggested [here](https://stackoverflow.com/questions/21842384/importing-data-from-multiple-csv-files-in-d3), but this might become messy once one wants to load more than two datasets. Thanks for your efforts and welcome to stackoverflow :) – Cleb Sep 03 '15 at 17:04
  • You're welcome! Thanks, @Cleb. Let me know if you run into any problems with the implementation. – Grace Sep 03 '15 at 17:43