d3js v5 + Topojson v3 Optimization about joining csv & json

Question

To build maps, I need to join values from a CSV file onto a JSON (TopoJSON) file directly in the code. I load both files asynchronously with a Promise, then use two nested loops and a common key to add the new properties to the JSON features.

for (var i=0; i< fr[1].length;i++){
        var csvId = fr[1][i].codgeo;
        var csvValue1 = parseFloat(fr[1][i].value1);
        var csvValue0 = parseFloat(fr[1][i].value0);
        for (var j=0; j<topojson.feature(fr[0],fr[0].objects.dep_GEN_WGS84_UTF8).features.length;j++){
          var jsonId = topojson.feature(fr[0],fr[0].objects.dep_GEN_WGS84_UTF8).features[j].properties.codgeo;
          if (csvId === jsonId) {
            topojson.feature(fr[0],fr[0].objects.dep_GEN_WGS84_UTF8).features[j].properties.value1 = csvValue1;
            topojson.feature(fr[0],fr[0].objects.dep_GEN_WGS84_UTF8).features[j].properties.value0 = csvValue0;
            break;
          }
        }
}

Everything works, but the map takes a while to appear on the page. Is there a way to reduce the loading time of the map?

Here is a sample of my code: https://plnkr.co/edit/ccwIQzlefAbd53qnjCX9?p=preview

Andrew Reid · Accepted Answer · 2018-04-24T17:55:22.907

I took your Plunker, added some timing points to it, ran it a number of times, and collected some data on where your script spends its time.

Here's a block with the logging.

I am pretty sure the bandwidth where I live is below average and highly variable: the file load time ranged from about 500 milliseconds to about 1,800 milliseconds for me, while everything else was consistent.

Let's take a closer look at the data manipulation stage, which you include in your question:

for (var i=0; i< fr[1].length;i++){
        var csvId = fr[1][i].codgeo;
        var csvValue1 = parseFloat(fr[1][i].value1);
        var csvValue0 = parseFloat(fr[1][i].value0);
        for (var j=0; j<topojson.feature(fr[0],fr[0].objects.dep_GEN_WGS84_UTF8).features.length;j++){
          var jsonId = topojson.feature(fr[0],fr[0].objects.dep_GEN_WGS84_UTF8).features[j].properties.codgeo;
          if (csvId === jsonId) {
            topojson.feature(fr[0],fr[0].objects.dep_GEN_WGS84_UTF8).features[j].properties.value1 = csvValue1;
            topojson.feature(fr[0],fr[0].objects.dep_GEN_WGS84_UTF8).features[j].properties.value0 = csvValue0;
            break;
          }
        }
}

The nested for statement runs approximately 5,151 times by my count; the parent for statement runs 101 times. These counts shouldn't change, as your data is fixed. Why do these cycles take so long? Because you are calling topojson.feature() on every iteration of the for loops.
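As an aside on the counts above, here is a minimal sketch of how a figure like 5,151 can arise. It assumes (this is not confirmed by the question) that the CSV rows and the features are listed in the same order, so the i-th row matches the i-th feature and the break fires after i + 1 inner iterations. The data here is a hypothetical stand-in, not the real files.

```javascript
// Hypothetical stand-in data: 101 ids in the same order in both lists (an assumption).
const ids = Array.from({ length: 101 }, (_, i) => String(i + 1));
const csvRows = ids.map(id => ({ codgeo: id }));
const features = ids.map(id => ({ properties: { codgeo: id } }));

let outerRuns = 0; // iterations of the parent for statement
let innerRuns = 0; // iterations of the nested for statement

for (let i = 0; i < csvRows.length; i++) {
  outerRuns++;
  for (let j = 0; j < features.length; j++) {
    innerRuns++;
    if (csvRows[i].codgeo === features[j].properties.codgeo) {
      break; // stop scanning once the matching feature is found
    }
  }
}

console.log(outerRuns, innerRuns); // 101 5151  (1 + 2 + ... + 101 = 5151)
```

In the original loop, each of those inner iterations evaluates topojson.feature() at least twice (once in the loop condition and once to read jsonId), so the number of conversions is higher still.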

If I isolate this line:

topojson.feature(fr[0],fr[0].objects.dep_GEN_WGS84_UTF8)

We can see that this call alone takes a few milliseconds.
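For anyone who wants to reproduce this kind of measurement, a minimal timing sketch follows. The helper name timeCall is hypothetical, and the workload is a stand-in loop so the snippet runs on its own; in the actual page one would presumably pass a function wrapping the topojson.feature(...) call instead.

```javascript
// Use the high-resolution timer when available, otherwise fall back to Date.now().
const now = (typeof performance !== "undefined" && typeof performance.now === "function")
  ? () => performance.now()
  : () => Date.now();

// Hypothetical helper: runs fn once, logs how long it took, and returns both values.
function timeCall(label, fn) {
  const start = now();
  const result = fn();
  const elapsed = now() - start;
  console.log(label + ": " + elapsed.toFixed(2) + " ms");
  return { result, elapsed };
}

// Stand-in workload (NOT the real conversion), used only to make the sketch runnable.
const timed = timeCall("stand-in conversion", () => {
  let total = 0;
  for (let i = 0; i < 100000; i++) total += i;
  return total;
});
```

A single run is noisy, so averaging over several runs, as done for the figures in this answer, gives a more reliable picture.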

topojson.feature

Returns the GeoJSON Feature or FeatureCollection for the specified object in the given topology. If the specified object is a GeometryCollection, a FeatureCollection is returned, and each geometry in the collection is mapped to a Feature. Otherwise, a Feature is returned. The returned feature is a shallow copy of the source object: they may share identifiers, bounding boxes, properties and coordinates. (from the docs).

So, every time we use topojson.feature, we are essentially converting the TopoJSON to GeoJSON. We don't need to do this inside the for loop. Let's do it once:

  var featureCollection = topojson.feature(fr[0],fr[0].objects.dep_GEN_WGS84_UTF8);

  //Merge csv & json
  //Add properties from csv to json
  for (var i=0; i< fr[1].length;i++){
    var csvId = fr[1][i].codgeo;
    var csvValue1 = parseFloat(fr[1][i].value1);
    var csvValue0 = parseFloat(fr[1][i].value0);
    for (var j=0; j<featureCollection.features.length;j++){
      var jsonId = featureCollection.features[j].properties.codgeo;
      if (csvId === jsonId) {
        featureCollection.features[j].properties.value1 = csvValue1;
        featureCollection.features[j].properties.value0 = csvValue0;
        break;
      }
    }
  }
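The nested loops can also be replaced by a keyed lookup, which is a different technique from the one used above and is offered only as a sketch. It indexes the CSV rows by codgeo in a Map, then makes a single pass over the features. The function name joinCsvToFeatures and the sample data are hypothetical stand-ins for fr[1] and featureCollection.

```javascript
// Hypothetical stand-ins for fr[1] (parsed CSV rows) and the converted feature collection.
const csvRowsSample = [
  { codgeo: "63", value0: "1.5", value1: "2.5" },
  { codgeo: "75", value0: "3", value1: "4" }
];
const featureCollectionSample = {
  type: "FeatureCollection",
  features: [
    { type: "Feature", properties: { codgeo: "75" }, geometry: null },
    { type: "Feature", properties: { codgeo: "63" }, geometry: null },
    { type: "Feature", properties: { codgeo: "2A" }, geometry: null }
  ]
};

// Hypothetical helper: copies value0/value1 from CSV rows onto matching features.
function joinCsvToFeatures(rows, collection) {
  // Index the CSV rows by their key once.
  const byId = new Map();
  rows.forEach(row => byId.set(row.codgeo, row));

  // One pass over the features, with a constant-time lookup per feature.
  collection.features.forEach(feature => {
    const row = byId.get(feature.properties.codgeo);
    if (row) {
      feature.properties.value0 = parseFloat(row.value0);
      feature.properties.value1 = parseFloat(row.value1);
    }
  });
  return collection;
}

joinCsvToFeatures(csvRowsSample, featureCollectionSample);
console.log(featureCollectionSample.features[1].properties); // { codgeo: '63', value0: 1.5, value1: 2.5 }
```

With about a hundred rows the gain over the nested loops is negligible next to the fix above, so this matters mainly for larger datasets. Note one behavioral difference: if several CSV rows share a codgeo, the last one wins here, and every feature with a matching codgeo is updated, whereas the loop with break updates only the first matching feature for each row.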

Of course, we also have to update the rendering code to use the featureCollection variable rather than the topojson.

Let's take a look at timing now:

Here's an updated bl.ock based on the one above, also with timing points.

No, I didn't forget to include a time for manipulation; it just averaged 1.5 milliseconds for me. Yes, the variability in my bandwidth shows, but the time spent on manipulation should be clearly lower regardless of external factors.

Further Enhancements

Preprojection of geometry, see this question/answer.

Simplification of geometry, see mapshaper.org (though I believe you have already done this).

Removal of unnecessary attributes from the csv or topojson: are you really using the population field in the topojson, and do you need both libgeo and libgeo_m in the topojson (e.g. "libgeo":"Puy-de-Dôme","libgeo_m":"PUY-DE-DÔME")?
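As a rough illustration of the last point, the sketch below keeps only a chosen set of properties on each feature. The helper name keepProperties and the sample values are hypothetical, and it operates on GeoJSON-style features; to actually reduce download size it would have to run as an offline preprocessing step on the file that is served, not in the browser after loading.

```javascript
// Hypothetical helper: returns a copy of the collection whose features
// carry only the listed property keys. The input is not modified.
function keepProperties(collection, keys) {
  return {
    type: collection.type,
    features: collection.features.map(feature => {
      const properties = {};
      keys.forEach(key => {
        if (key in feature.properties) properties[key] = feature.properties[key];
      });
      return Object.assign({}, feature, { properties });
    })
  };
}

// Sample feature; the population value is made up for the example.
const before = {
  type: "FeatureCollection",
  features: [{
    type: "Feature",
    properties: { codgeo: "63", libgeo: "Puy-de-Dôme", libgeo_m: "PUY-DE-DÔME", population: 1 },
    geometry: null
  }]
};

const after = keepProperties(before, ["codgeo", "libgeo"]);
console.log(after.features[0].properties); // { codgeo: '63', libgeo: 'Puy-de-Dôme' }
```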
