0

I have my bank transaction data in the following format:

var transactions = {
    food: [
        {
            date: new Date('2016-01-09'),
            amount: 123.45
        },
        {
            date: new Date('2016-01-16'),
            amount: 87.88
        },
        {
            date: new Date('2016-01-23'),
            amount: 99.99
        },
        {
            date: new Date('2016-01-30'),
            amount: 99.99
        }
    ],
    doctor: [
        {
            date: new Date('2016-01-15'),
            amount: 1124.01
        },
        {
            date: new Date('2016-01-16'),
            amount: 656.00
        },
        {
            date: new Date('2016-01-23'),
            amount: 1000.00
        },
    ]
}

That is, an object that looks like {transaction_type: [array of transactions]}.

I would like to group these transactions by date, so in the end I get

var aligned_transactions = [
    {
        date: new Date('2016-01-09'),
        amounts: [123.45]
    },
    {
        date: new Date('2016-01-15'),
        amounts: [1124.01]
    },
    {
        date: new Date('2016-01-16'),
        amounts: [87.88, 656.00]
    },
    {
        date: new Date('2016-01-23'),
        amounts: [99.99, 1000.00]
    },
    {
        date: new Date('2016-01-30'),
        amounts: [99.99]
    }
]

So the amounts are now grouped by date. Of course, in the real setting, there are several hundred transaction types, each with an array of thousands of transactions (roughly 1 million transactions to process). What is the "fastest" way to transform the transactions in this manner? Here, fastest means total time spent doing the transformation. A jsperf result would be awesome.

Note that I have tried several approaches, and have found the "obvious" approach of nested for loops to be quite slow: about 12 seconds on my machine with 1 million total transactions (Chrome, Ubuntu). I'm guessing creating all these new objects is taking its toll.

One approach that was somewhat promising was to "vertically" slice these transaction lists, so that I get a bunch of small arrays, which I then create objects from, and recursively "merge" them together. This was pretty fast, about 6 seconds on my machine with the same 1 million transactions above. Still, I'm hoping there's a much faster way.

Edit:

Here's the nested for loop solution:

function align_data(transaction_types) {
    var i, j, transaction, transactions;
    var timestamps = {};
    for (i = 0; i < transaction_types.length; i++) {
        transactions = transaction_types[i];
        for (j = 0; j < transactions.length; j++) {
            transaction = transactions[j];
            if (timestamps[transaction.date]) {
                timestamps[transaction.date].amounts.push(transaction.amount);
            } else {
                timestamps[transaction.date] = {
                    date: transaction.date,
                    amounts: [transaction.amount]
                };
            }
        }
    }

    var aligned = [];
    for (date in timestamps) {
        if (timestamps.hasOwnProperty(date)) {
            aligned.push(timestamps[date]);
        }
    }

    return aligned;
}
Steve D
  • 373
  • 2
  • 17
  • Merge all arrays, then use Quick Sort by the property `date string` (given that you don't need a Date object, which can be generated at a later time). Btw: Quick Sort is not more efficient that Insertion Sort when it comes to less records (for example in the range 256-512). Most probably you need [this](http://stackoverflow.com/questions/1129216/sort-array-of-objects-by-string-property-value-in-javascript) approach. – Vidul Feb 09 '16 at 02:19
  • What exactly is the code you tried for the "obvious" approach? There shouldn't be much object creation at all in it, so I'm guessing you did something unnecessary and/or inefficient in the inner loop, and fixing that could net enormous performance gains. That, or you're including the setup time of creating your input data as part of the test time. – Douglas Feb 09 '16 at 02:28
  • @RobG: The actual code doesn't generate the dates like this, I was trying to give something that could be played with. – Steve D Feb 09 '16 at 02:30
  • @Douglas: Sure, I'll edit that code into the question. – Steve D Feb 09 '16 at 02:31
  • Hmm. The structure of the code looks about like I thought it should. The potential performance problem that jumps out at me is the use of `transaction.date` as a map key. `transaction.date` is a `Date` object, right? Every time you use it as a key (once in the if and once in the assignment/push for each transaction) it calls `toString()` on it, which is significantly expensive. Try using `transaction.date.getTime()` instead. – Douglas Feb 09 '16 at 02:56
  • Switching `timestamps` to an array instead of object may also help, since the keys will be integer numbers with using `transaction.date.getTime()`. I think the switch to `getTime()` by itself would just change what `toString()` is invoked on, while according to my brief research an integer index in an array will actually skip `toString()` entirely. – Douglas Feb 09 '16 at 03:07

3 Answers3

1

I just tested it, running your code takes about 5-6 seconds on a random set of 1 million records on my computer, while the following took maybe half a second:

function align_data(transaction_types) {
    var i, j, transaction, transactions;
    var timestamps = {};
    for (i = 0; i < transaction_types.length; i++) {
        transactions = transaction_types[i];
        for (j = 0; j < transactions.length; j++) {
            transaction = transactions[j];
            if (timestamps[transaction.date.getTime()]) {
                timestamps[transaction.date.getTime()].amounts.push(transaction.amount);
            } else {
                timestamps[transaction.date.getTime()] = {
                    date: transaction.date,
                    amounts: [transaction.amount]
                };
            }
        }
    }

    var aligned = [];
    for (date in timestamps) {
        if (timestamps.hasOwnProperty(date)) {
            aligned.push(timestamps[date]);
        }
    }

    return aligned;
}

All I changed was indexing timestamps by transaction.date.getTime() instead of transaction.date.

Douglas
  • 5,017
  • 1
  • 14
  • 28
  • Using `getTime()` as an index didn't work for the code I was using, because of anomalies with the way JS treats arrays. *However*, using `Date.getTime().toString()` (instead of the implicit `Date.toString()` in my code above) sped things up about 100x. So thanks! – Steve D Feb 10 '16 at 01:53
0

I think that it is a situation, when you really should try to use WebWorkers.

Denys Konoplin
  • 362
  • 1
  • 9
0

How about using a single accumulator object across all your records, and then turning that into your preferred array format:

function align_data(transactions) {
    var acc = {};
    var d;
    for (var key in transactions) {
        transactions[key].forEach(function (record) { 
            d = record.date.toISOString();
            if (d in acc) {
                acc[d].push(record.amount);
            } else {
                acc[d] = [record.amount];
            }
        });
    }
    var aligned_transactions = [];
    for (var date in acc) {
        aligned_transactions.push({date: new Date(date), amounts: acc[date]});
    }
    return aligned_transactions;
}
jgfооt
  • 914
  • 6
  • 12