I am trying to create a variable for putting in dc.js using a custom reduction (reduceAdd, reduceRemove etc) and am having trouble figuring out how to code it.

I first wrote the calculation outside the reduce functions, and now I need to replicate the same logic inside them so that it can drive the charts. The logic and the standalone code are as follows.

Logic: for each unique CONTACT_WEEK (a date), find the maximum WEEK_NUMBER, then sum TOTCOUNT and DECAY_CNT over the rows that have that maximum, and calculate the percentage (DECAY_CNT / TOTCOUNT).

Here is the original code without using crossfilter:

 //Decay % logic
  var dates = d3.map(filter1,function(d) { return d.CONTACT_WEEK;}).keys() ;
  console.log(dates);

  for(var i=0; i<dates.length; i++)
    {
      // rows belonging to this contact week
      var data1 = filter1.filter(function(d) { return d.CONTACT_WEEK == dates[i] ;});
      // latest week number within that contact week
      var max = d3.max(data1, function(d) { return +d.WEEK_NUMBER ;});
      var data2 = data1.filter(function(d) { return +d.WEEK_NUMBER == max ;});

      var sum1 = d3.sum(data2, function(d) { return d.TOTCOUNT ;});
      var sum2 = d3.sum(data2, function(d) { return d.DECAY_CNT ;});
      console.log(sum1);
      var decay = sum2/sum1 * 100 ;
      console.log(decay);

    }

The first step in this is to identify unique values of dates (contact_week) - How do I go about doing this in the reduce functions as it's already a for loop that traverses through the data?

I guess for max etc, we can use reductio or some other logic as mentioned in comments, but I'm not really getting the approach/design to be followed here

Any help in approach/solutions will be highly appreciated.

UPDATE2 :

Trying a new approach using reductio js

Data explanation :

A few columns in my data: contact_week (dates); week_number (numbers from -4 to 6); decay_cnt (integers); totcount (integers); duration (ordinal values: pre, during and post).

Now, I need to calculate a percentage called decay %, which is calculated as follows: for each unique contact_week, find the max of week_number, then for this filtered dataset calculate sum(decay_cnt) / sum(totcount).

This has to be plotted in a bar chart where the x-axis is duration and the y-axis is the decay % metric.

In pursuit of calculating the max of week-numbers of individual dates, I've plotted a bar chart for now, with contact_week as x-axis and max of week_number as the y-axis. How do I get the chart that I need?

Code :

dateDimension2  = ndx.dimension(function(d) {return d.CONTACT_WEEK ;});
decayGroup = reductio().max(function (d) { return d.WEEK_NUMBER; })(dateDimension2.group());


chart2
    .width(500)
    .height(200)
    .x(d3.scale.ordinal())
    //.x(d3.scale.ordinal().domain(["DURING","POST1"]))
    .xUnits(dc.units.ordinal)
    //.xUnits(function(){return 10;})
    //.brushOn(false)
    .yAxisLabel("Decay (in %)")
    .dimension(dateDimension2)
    .group(decayGroup)
    .gap(10)
    .elasticY(true)
    //.yAxis().tickValues([0, 5, 10, 15])
    //.title(function(d) { return d.key + ": " + d3.round(d.value.new_count,2); })
    /*.valueAccessor(function (p) {
    //return p.value.count > 0 ? (p.value.dec_total / p.value.new_count) * 100  : 0;
    return p.value.decay ;
    })*/
    .valueAccessor(function(d) { return d.value.max; })
    .on('renderlet', function(chart) {
        chart.selectAll('rect').on("click", function(d) {
            console.log("click!", d);
        });
    })
    .yAxis().ticks(5);

Any approach/suggestions will be highly appreciated

I think the solution mostly lies in the fake groups/dimensions and reduction js combined approach. Any alternatives are most welcome!

Pravin
  • Please show the reduceAdd/reduceRemove code that is causing the error. The best thing to do would be to create a working example of the issue at jsfiddle or a similar site. – Ethan Jewett Dec 28 '15 at 12:56
  • The good news is the main loop and the first filter should be handled by the crossfilter groups automatically. The bad news is that crossfilter doesn't have min/max type stuff built in, and they're difficult to do efficiently. You end up storing some reference to the rows in each bin. If you search around you'll find various codes for this, e.g. http://stackoverflow.com/a/32925852/676195 - IIUC your problem is just a more complicated reduce on the same kind of data. – Gordon Dec 28 '15 at 20:20
  • [reductio](https://github.com/esjewett/reductio) also has min/max stuff but I don't know if it supports filtering the rows within the bin and reducing on them. Interesting design challenge, @Ethan! – Gordon Dec 28 '15 at 20:24
  • @Gordon You can almost do this with Reductio, but I think not quite. It supports the filter predicates and min/max calculation, but filtering on min/max is the problem. It's just not clear to me exactly what the filtering on min/max means semantically. A working example would probably clarify. – Ethan Jewett Dec 28 '15 at 21:23
  • @EthanJewett : Used reductio js now and updated the question. Do have a look whenever free – Pravin Jan 07 '16 at 07:25
  • I am not sure why this is getting downvoted - maybe because you asked in a "please code this for me" sort of way. It's a legitimate question as there is not much infrastructure for this kind of problem. I've started on a solution but I want to make it general so it will help others. Will post later today. – Gordon Jan 07 '16 at 11:15
  • I apologize that reductio may have been a false lead. I was mostly trying to point out that this was an interesting kind of problem where reductio could conceivably be expanded to help. I don't think fake groups are needed here because the shape of the bins is handled easily by crossfilter. It's just a complex reduce. – Gordon Jan 07 '16 at 11:25

1 Answer

I've just added a FAQ and an example for this kind of problem.

As explained there, the idea is to maintain an array of rows which fall into each bin, since crossfilter doesn't provide access to that yet. Once we've got the actual rows, your calculations are almost the same as you are doing now, except that crossfilter keeps track of the list of weeks for you.

So you can use these functions from the example:

  function groupArrayAdd(keyfn) {
      var bisect = d3.bisector(keyfn);
      return function(elements, item) {
          var pos = bisect.right(elements, keyfn(item));
          elements.splice(pos, 0, item);
          return elements;
      };
  }

  function groupArrayRemove(keyfn) {
      var bisect = d3.bisector(keyfn);
      return function(elements, item) {
          var pos = bisect.left(elements, keyfn(item));
          // guard: pos can equal elements.length when the key is past the end
          if(pos < elements.length && keyfn(elements[pos])===keyfn(item))
              elements.splice(pos, 1);
          return elements;
      };
  }

  function groupArrayInit() {
      return [];
  }
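To see what these reducers do to a single bin, here is a minimal sketch that runs without d3. The `bisectLeft` function is a hand-written stand-in for `d3.bisector(keyfn).left`, and `addRow`, `removeRow` and the sample rows are hypothetical names made up for this illustration. For simplicity it inserts at the left position, whereas the add function above uses `bisect.right`; with unique keys the result is the same.

```javascript
// Stand-in for d3.bisector(keyfn).left: first index whose key is >= the given key.
function bisectLeft(elements, key, keyfn) {
  var lo = 0, hi = elements.length;
  while (lo < hi) {
    var mid = (lo + hi) >>> 1;
    if (keyfn(elements[mid]) < key) lo = mid + 1; else hi = mid;
  }
  return lo;
}

// Insert the row at its sorted position (keys are assumed unique).
function addRow(elements, item, keyfn) {
  elements.splice(bisectLeft(elements, keyfn(item), keyfn), 0, item);
  return elements;
}

// Remove the row with the same key, if it is present in the bin.
function removeRow(elements, item, keyfn) {
  var pos = bisectLeft(elements, keyfn(item), keyfn);
  if (pos < elements.length && keyfn(elements[pos]) === keyfn(item)) {
    elements.splice(pos, 1);
  }
  return elements;
}

var byId = function (r) { return r.ID; };
var bin = [];
addRow(bin, { ID: 3 }, byId);
addRow(bin, { ID: 1 }, byId);
addRow(bin, { ID: 2 }, byId);
removeRow(bin, { ID: 2 }, byId);  // row leaves the current filter
removeRow(bin, { ID: 9 }, byId);  // key not in the bin: nothing happens
var remainingIds = bin.map(byId);
console.log(remainingIds);
```

The bin stays sorted by key at every step, which is what lets removal find a row by bisection instead of scanning the whole array.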

You need to have a unique key in your records so that they can be added and removed reliably. I'll assume that your records have an ID field.
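If your records do not already carry such a field, one option is to number the rows before handing them to crossfilter. This is only a sketch under that assumption: `assignIds` and the sample rows are hypothetical, and any field that is unique and sortable would do just as well.

```javascript
// Hypothetical helper: give every record a sequential, unique ID.
// Numeric IDs sort naturally, which the bisect-based reducers rely on.
function assignIds(records) {
  records.forEach(function (r, i) { r.ID = i; });
  return records;
}

// Made-up sample rows using the column names from the question.
var sampleRows = assignIds([
  { CONTACT_WEEK: "2015-11-02", WEEK_NUMBER: -1 },
  { CONTACT_WEEK: "2015-11-09", WEEK_NUMBER: 0 }
]);
console.log(sampleRows.map(function (r) { return r.ID; }));
```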

Define your week dimension and group like so:

var weekDimension = ndx.dimension(function(d) {return d.CONTACT_WEEK ;}),
    id_function = function(r) { return r.ID; },
    weekGroup = weekDimension.group().reduce(groupArrayAdd(id_function), groupArrayRemove(id_function), groupArrayInit);

Then the most efficient time to calculate your metric is when it's needed, in the value accessor. So you can define your value accessor with the heart of the code you posted in your question.

(Of course, this code is untested because I don't know your data.)

var calculateDecay = function(kv) {
    // kv.value has the array produced by the reduce functions.
    var data1 = kv.value;
    var max = d3.max(data1, function(d) { return +d.WEEK_NUMBER ;});
    // keep only the rows from the latest week in this bin
    var data2 = data1.filter(function(d) { return +d.WEEK_NUMBER === max ;});

    var sum1 = d3.sum(data2, function(d) { return d.TOTCOUNT ;});
    var sum2 = d3.sum(data2, function(d) { return d.DECAY_CNT ;});

    // an empty bin (all rows filtered out) would otherwise give NaN
    return sum1 > 0 ? sum2/sum1 * 100 : 0;
};

chart.valueAccessor(calculateDecay);
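As a way to sanity-check the numbers without dc.js or crossfilter, here is a minimal plain-JavaScript sketch of the same calculation. The sample rows are made up, and `binRows` and `decayPercent` are hypothetical names: `binRows` merely stands in for the per-bin arrays that the reduce functions maintain, keyed here by DURATION, and `decayPercent` mirrors the value accessor using built-in array methods instead of d3.

```javascript
// Made-up rows; column names follow the question, values are illustrative only.
var rows = [
  { ID: 1, DURATION: "PRE",    WEEK_NUMBER: -2, TOTCOUNT: 100, DECAY_CNT: 10 },
  { ID: 2, DURATION: "PRE",    WEEK_NUMBER: -1, TOTCOUNT: 200, DECAY_CNT: 150 },
  { ID: 3, DURATION: "PRE",    WEEK_NUMBER: -1, TOTCOUNT: 200, DECAY_CNT: 50 },
  { ID: 4, DURATION: "DURING", WEEK_NUMBER: 0,  TOTCOUNT: 50,  DECAY_CNT: 5 },
  { ID: 5, DURATION: "DURING", WEEK_NUMBER: 1,  TOTCOUNT: 80,  DECAY_CNT: 20 }
];

// Stand-in for group.all(): one { key, value } pair per bin, value = rows in the bin.
function binRows(rows, keyfn) {
  var bins = {};
  rows.forEach(function (r) {
    var k = keyfn(r);
    (bins[k] = bins[k] || []).push(r);
  });
  return Object.keys(bins).map(function (k) { return { key: k, value: bins[k] }; });
}

// Same logic as the value accessor: latest week in the bin, then DECAY_CNT / TOTCOUNT.
function decayPercent(kv) {
  var binned = kv.value;
  if (binned.length === 0) return 0;
  var max = Math.max.apply(null, binned.map(function (d) { return +d.WEEK_NUMBER; }));
  var latest = binned.filter(function (d) { return +d.WEEK_NUMBER === max; });
  var total = latest.reduce(function (s, d) { return s + (+d.TOTCOUNT); }, 0);
  var decayed = latest.reduce(function (s, d) { return s + (+d.DECAY_CNT); }, 0);
  return total > 0 ? decayed / total * 100 : 0;
}

var decayByDuration = binRows(rows, function (r) { return r.DURATION; })
  .map(function (kv) { return { key: kv.key, decay: decayPercent(kv) }; });
console.log(decayByDuration);
```

With these rows, the PRE bin uses only the two week -1 rows (200 / 400, so 50%), and the DURING bin uses only the week 1 row (20 / 80, so 25%).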
Gordon
  • This is great, it's working with the correct values finally! First of all, thanks for all the effort and the massive help here! However, one simple (or tricky) caveat is left in my question, I want the dimension to be DURATION (not contact_week). X-axis should show duration in ordinal manner. (I was just testing with contact_week before) Data is explained above in a one-liner, duration is just another column (values - during, pre, post etc). Is that possible? I tried changing dimension names, that doesn't help as it's connected to group. Kindly let me know how to do that – Pravin Jan 07 '16 at 15:51
  • I don't see how that affects the essential question. Can't you just define your dimension accordingly, and use the same reduce functions I've described? – Gordon Jan 07 '16 at 15:54
  • Oh, my bad! I was trying something else. It works perfectly and you've solved the question! Thanks a lot, Gordon! – Pravin Jan 07 '16 at 16:00
  • Nice! Glad to help, and nice to have a reason to develop a canonical answer to a frequently asked question. – Gordon Jan 07 '16 at 16:03
  • Well, vote farming is scorned around SO, so I won't do that. But hopefully as people find the question helpful it will regain a positive score. Haters gotta hate, and unfortunately we do get some haters around this tag. – Gordon Jan 08 '16 at 04:36
  • Sure, gordon! Thanks again for all the help. Cheers! – Pravin Jan 08 '16 at 05:12
  • Glad this worked out :-) And based on Gordon's answer I'm pretty sure I can confirm that Reductio can't do this - at least not at the moment. I get the feeling that a lot of questions in this tag get down-voted because people come from the javascript tag and think it's a trivial question regarding Array.map/Array.reduce. I'd recommend actually not putting the javascript tag on Crossfilter questions, unfortunately. :-/ – Ethan Jewett Jan 09 '16 at 17:46
  • @EthanJewett : Yes, I tried reductio for quite a while, but wasn't able to get the right graphs (many loops, issues etc). Removed the javascript tag for now, hope this question helps others out doing the same stuff. Btw, saw both of your discussions on github (on complex reduce), thanks for the help again! – Pravin Jan 11 '16 at 05:59