-1

I need to iterate over a large set of records and count the number of distinct values in a particular field:

>> const cats = [
    {name: 'Sophie', color: 'black'},
    {name: 'Moo',    color: 'grey'},
    {name: 'Tipper', color: 'black'}
  ]

>> countBy(cats, 'color')
Map('black' => 2, 'grey' => 1) # (precise data structure unimportant)

Hey, that's what lodash countBy is for! But in my application, performance is paramount. In my (very informal) benchmark, this simple code beats lodash by around 50%:

function countBy<R extends object>(records: R[], key: keyof R) {
  const counts = new Map<keyof R, number>()
  for (const i = 0; i < records.length; i++) {
    const value = records[i][key]
    counts.set(value, (counts.get(value) || 0) + 1)
  }
}

(This is perhaps unsurprising since it doesn't need to handle fancier "iteratees" than keyof R.)

Question: is there any way to beat this algorithm, speed-wise?

Sasgorilla
  • 2,403
  • 2
  • 29
  • 56
  • 2
    What is this: `countBy`? That's not JavaScript. – Barmar Mar 30 '23 at 00:51
  • @Barmar typescript – Daniel A. White Mar 30 '23 at 00:51
  • 3
    Are you using JS or TS? Only tag the relevant language! – ikegami Mar 30 '23 at 00:59
  • 1
    I don't think this should be closed. OP is asking specifically for the _fastest_ way. That may depend on the browser engine, but it's not really a duplicate of the question it's marked duplicate of. – Slbox Mar 30 '23 at 01:10
  • Speed is never going to improve much with what you already have. The best is to make a change at the data provider side so the data structure is more in line with what you want to do with it. – trincot Mar 31 '23 at 07:19
  • This is a Javascript question, not a Typescript question. The types disappear at runtime and don't change the algorithm or implementation, but they help clarify what might be the difference between my version and lodash's. – Sasgorilla Apr 03 '23 at 14:45
  • fastest way on **what engine**? – starball Apr 06 '23 at 00:12
  • @Sasgorilla I added an answer. Please have a look, I hope it will work as per your expectation. – Debug Diva Apr 08 '23 at 04:58
  • Do you have enough data in this simple case in order to get good performance statistics? I would assume setup account for a very large part of the execution. Is your expected input data this small? It is impossible to suggest a "fastest algorithm" without good understanding of the expected data. – Automatico Apr 12 '23 at 12:47
  • most people use reduce `const counts = cats.reduce((acc, item) => (acc[item[key]] = (acc[item[key]] || 0) + 1, acc), {})` – epascarello Apr 12 '23 at 15:48

2 Answers2

3

There's not much you can do to avoid the O(n) nature of this. What you can do is:

  • test different variations of the loop (for, for-of, forEach, etc)
  • how you access the requested property
  • how you create new entries when you find a new prop

It will most likely depend on which browser(s) you need to support, or which Node.js/Deno/Bun version you use.

The TypeScript aspect of this won't matter too much in terms of performance, the same function signature can be used regardless of the implementation, but you could tweak it a bit in my mind – you don't really need to use Map:

function countBy<T extends { [K in keyof T]: T[K] }, K extends keyof T>(
  list: T[],
  prop: K
): { [key in T[K]]: number } {
  const acc = {} as { [key in T[K]]: number };
  for (let i = 0; i < list.length; i++) {
    const key = list[i][prop];
    acc[key] = (acc[key] ?? 0) + 1;
  }
  return acc;
}

Memory usage can possibly play a role, so one other thing to try is to alter the input to rely on some form of streaming mechanism. This, however, would require a different function signature and may introduce an asynchronous aspect.

Simon Ingeson
  • 951
  • 7
  • 10
2

It totally depends on the amount of data on which we are going to perform the operation. I think using Lodash or vanilla JS both are efficient way.

But after checking the performance in the JS performance benchmark tool, I found that Vanilla JS code is more performance efficient as compare to Lodash one.

So just for the demo and testing purpose, I used Array.reduce() method to count the color from the cats array.

const cats = [
  {name: 'Sophie', color: 'black'},
  {name: 'Moo',    color: 'grey'},
  {name: 'Tipper', color: 'black'}
];

function countBy(arr, prop) {
    return arr.reduce((total, current) => {
    total[current[prop]] = (total[current[prop]] || 0) + 1;
    return total;
  }, {})
}

const result = countBy(cats, 'color');

console.log(result);

And after compare it with the Lodash solution.

const cats = [
  {name: 'Sophie', color: 'black'},
  {name: 'Moo',    color: 'grey'},
  {name: 'Tipper', color: 'black'}
];

const result = _.countBy(cats, 'color')

console.log(result)
<script src="https://cdnjs.cloudflare.com/ajax/libs/lodash.js/4.17.11/lodash.js"></script>

I found that lodash is slower than Vanilla JS. Here is the screenshot from jsbench.

enter image description here

Debug Diva
  • 26,058
  • 13
  • 70
  • 123