Is there an issue with putting all of my code in the argument for d3.csv()?

Question

I am learning d3.js and I have recently been encountering issues with not being able to access variables inside the d3.csv() function. As a workaround, I have been initializing my variables at the beginning of my program to make them all global.

This made me wonder if there is an issue if I were to just put all my code inside the d3.csv function, removing the need to even initialize my variables at the beginning of my code so it would look like:

d3.csv("data.csv", (data) => {
    // all of my code
});

Is there a downside to this (assuming I'm only using one CSV file) or is there some benefit to keeping code that doesn't need the data outside of the d3.csv method?

Each function within your code should do one thing, and do it well. If you put all your code within a callback function, you'll soon run into issues where you have too many nested callbacks - EDToaster, Jul 10 '19 at 20:33

Gerardo Furtado · Accepted Answer · 2019-07-11T05:08:42.860

Note: Since you're asking about the callback of d3.csv, I'm assuming you're using D3 v4 or below, because D3 v5 uses the then method of a promise. However, the rationale is the same.

The most important point is that d3.csv, like all other D3 XHR methods, is asynchronous. That means that everything inside the callback runs only after the CSV has been downloaded and parsed.

//Outside the callback
//Code here runs immediately

d3.csv("example.csv", (data) => {
    //Inside the callback
    //Code here runs only after the CSV was downloaded and parsed
});

//Outside the callback
//Even if these lines come after d3.csv, code here runs before the code inside the callback

By the way, that explains your initial complaint ("... I have recently been encountering issues with not being able to access variables inside the d3.csv() function"). This answer is a good read on that subject.
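
To make that concrete, here is a minimal sketch of the problem. Note that `fakeCsv` is a hypothetical stand-in for d3.csv and is not part of D3: it imitates the callback style but fakes the download with `setTimeout`, so the snippet runs without D3 or a real file. The variable `rows` and the sample row are also made up for illustration.

```javascript
// Hypothetical stand-in for d3.csv (not part of D3): it mimics the callback
// style but fakes the download with setTimeout, so no file or library is needed.
function fakeCsv(url, callback) {
  setTimeout(() => callback([{ name: "a", value: "1" }]), 0);
}

let rows; // declared "globally", as described in the question

fakeCsv("example.csv", (data) => {
  rows = data;
  // By now the (fake) download has finished, so the data is available.
  console.log("inside the callback:", rows.length, "row(s)");
});

// This line runs first, before the callback has fired, so rows is still undefined.
console.log("outside the callback:", rows);
```

The "outside" line is logged first, with `rows` still `undefined`, even though it comes later in the file. Making the variable global does not help: code that needs the data has to run inside the callback, or be called from it.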

With that in mind, we should organise the code so that things that don't depend on the data are created or set immediately. If we put them inside the callback, they wait for the download for no good reason.

In a nutshell, you can put outside the callback things such as (but not limited to):

Selecting/creating the SVG, canvas or HTML containers
Scales (with ranges)
Axes generators
Line generators
Area generators
Stack generators
Pie generators
Histogram generators
Map projections
Hierarchy layouts
Formats (like time formats)
Force simulators
Drag behaviours
Zoom behaviours

None of those things depend on the data. For some of them (like the line, area and stack generators) you'll pass the data once you have it.

Then, inside the callback, you put everything that depends on the data, such as (but not limited to):

Update, enter and exit selections
Scales' domains
Calling axes generators
Setting the simulation's nodes and links
Nests
Passing the data to line generators, area generators, stack generators etc.
Transitions (that depend on the data)
Event listeners (that depend on the data)

As you can see, if you put everything inside the callback, you'll have a bunch of methods that could run immediately but instead sit there, unnecessarily waiting for the data to be downloaded.
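
The split described above can be sketched roughly as follows for a simple line chart. This is only an illustration under several assumptions: D3 v4 (whose d3.csv callback receives `(error, data)`), an `<svg>` element already in the page, and a hypothetical `data.csv` with `date` and `value` columns. The function name `initChart`, the sizes and the date format are made up for the example. D3 is passed in as a parameter so the function does not depend on how the library is loaded.

```javascript
// A sketch, assuming D3 v4. File name, column names and sizes are hypothetical.
function initChart(d3) {
  const width = 600;
  const height = 300;

  // --- Outside the callback: none of this needs the data ---
  const svg = d3.select("svg").attr("width", width).attr("height", height);
  const x = d3.scaleTime().range([40, width - 10]);
  const y = d3.scaleLinear().range([height - 30, 10]);
  const xAxis = d3.axisBottom(x);
  const yAxis = d3.axisLeft(y);
  const line = d3.line().x((d) => x(d.date)).y((d) => y(d.value));
  const parseDate = d3.timeParse("%Y-%m-%d");
  const gX = svg.append("g").attr("transform", "translate(0," + (height - 30) + ")");
  const gY = svg.append("g").attr("transform", "translate(40,0)");
  const path = svg.append("path").attr("fill", "none").attr("stroke", "steelblue");

  // Row conversion function: turns the CSV strings into a Date and a number.
  const row = (d) => ({ date: parseDate(d.date), value: +d.value });

  d3.csv("data.csv", row, (error, data) => {
    if (error) throw error;

    // --- Inside the callback: everything here depends on the data ---
    x.domain(d3.extent(data, (d) => d.date));
    y.domain([0, d3.max(data, (d) => d.value)]);
    gX.call(xAxis);
    gY.call(yAxis);
    path.datum(data).attr("d", line);
  });
}
```

In a page that loads D3 v4 with a script tag, this would be started with `initChart(d3)`. With D3 v5 or later the same split applies, except that `d3.csv("data.csv", row)` returns a promise and the data-dependent part moves into its `then` callback.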

Does this mean that I can create scales with ranges along with axes generators that call those scales outside the `d3.csv()` function and then once the data is parsed go into the `d3.csv()` function and add the domains and call the axes generators? I definitely see how that would allow for more tasks to run in parallel. — J. Hurley, Jul 16 '19 at 16:21
