How to create an efficient group by function without mutation?

Question

Is there a way to efficiently implement a group by function without mutation?

Naive implementation:

ReScript playground
JavaScript (see below)

var messages = [
  {insertedAt: "2021-01-10"},
  {insertedAt: "2021-01-12"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-14"},
  {insertedAt: "2021-01-15"},
  {insertedAt: "2021-01-15"},
  {insertedAt: "2021-01-16"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-18"},
  {insertedAt: "2021-01-18"},
]

var messagesGroupedByDate = messages.reduce(function (data, message) {
  if (
    data.some(function (point) {
      return point.date === message.insertedAt;
    })
  ) {
    return data.map(function (point) {
      if (point.date === message.insertedAt) {
        return {
          date: point.date,
          count: (point.count + 1) | 0,
        };
      } else {
        return point;
      }
    });
  } else {
    return data.concat([
      {
        date: message.insertedAt,
        count: 1,
      },
    ]);
  }
}, []);


console.log(messagesGroupedByDate);

For the sake of argument, there's no need to make this more generic. The problem I'm facing is that I'm looping three times:

once with Array.prototype.reduce which is necessary to loop over messages
once with Array.prototype.some to see if the date key already exists in the resulting array
in the case where a date key already exists, we loop again with Array.prototype.map to update a specific element of an array
otherwise, a new array is returned that contains the new element

If there's not really any good way to make this efficient in ReScript, then I can always use raw JavaScript for this function, but I'm curious if it's possible to do this efficiently without mutation.

Nick · Answer 1 · 2021-01-18T23:46:41.610

1

You can group more simply by building an object of counts by date, and then using Object.entries and Array.map to convert that to an array of objects as required:

var messages = [
  {insertedAt: "2021-01-10"},
  {insertedAt: "2021-01-12"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-14"},
  {insertedAt: "2021-01-15"},
  {insertedAt: "2021-01-15"},
  {insertedAt: "2021-01-16"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-18"},
  {insertedAt: "2021-01-18"},
];

var messagesGroupedByDate = Object.entries(
  messages.reduce((data, message) => {
    data[message.insertedAt] = data[message.insertedAt] || 0;
    data[message.insertedAt]++;
    return data;
  }, {})
).map(([date, count]) => ({ date, count }));

console.log(messagesGroupedByDate);

You can also create an object directly from the values in messages and then update the counts by looping over messages:

var messages = [
  {insertedAt: "2021-01-10"},
  {insertedAt: "2021-01-12"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-14"},
  {insertedAt: "2021-01-15"},
  {insertedAt: "2021-01-15"},
  {insertedAt: "2021-01-16"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-18"},
  {insertedAt: "2021-01-18"},
]

const data = Object.fromEntries(messages.map(({ insertedAt: date }) => [ date, 0 ]));

messages.forEach(({ insertedAt: date }) => data[date]++);

const messagesGroupedByDate = Object.entries(data).map(([date, count])=> ({date, count}));

console.log(messagesGroupedByDate);

edited Jan 18 '21 at 23:46

answered Jan 18 '21 at 23:01

Nick

138,499
22
57
95

Thank you for your answer. This answer is cool, but it does use mutation in the reducer – Raphael Rafatpanah Jan 18 '21 at 23:17
The reason this is important is because the data structures I'm working with are immutable – Raphael Rafatpanah Jan 18 '21 at 23:17
@RaphaelRafatpanah - I'm not sure I understand how this is different to the way you manipulate the `data` array in your code? – Nick Jan 18 '21 at 23:19
For example, this line: `data[message.insertedAt] = data[message.insertedAt] || 0;` is mutating the original value of `data[message.insertedAt]` and this is not something that I can do in ReScript since `data` is immutable – Raphael Rafatpanah Jan 18 '21 at 23:20
@RaphaelRafatpanah it's not mutating the value of `data[message.insertedAt]` (it just creates that entry in the `data` object if it doesn't exist) but I'll agree that it is mutating the value of `data`. – Nick Jan 18 '21 at 23:25
Yes, I agree with that. Thanks for correcting – Raphael Rafatpanah Jan 18 '21 at 23:28
1

@RaphaelRafatpanah I've added an alternate solution. I'm curious as to whether that meets your requirements? As I understand it (admittedly from a quick read of the [manual](https://rescript-lang.org/docs/manual/latest/record)) object property values *can* be modified, so `data[date]++` should be fine. – Nick Jan 18 '21 at 23:47
Thank you. If I'm not mistaken, `data[date]++` is still reassigning and therefore mutating. Note, the `++` part can be avoided, but the `data[date] = ...` part is the concern – Raphael Rafatpanah Jan 18 '21 at 23:53
Please excuse the reach, but if you're curious to learn more about this way of programming, I'd recommend this 5-week Coursera course: https://www.coursera.org/learn/programming-languages - it was really, really fun in my opinion – Raphael Rafatpanah Jan 18 '21 at 23:55
@RaphaelRafatpanah they seem to do just that addition in this example in the [manual](https://rescript-lang.org/docs/manual/latest/record#mutable-update) where they set the value of an object's property to the value + 1 (so you could rewrite as `data[date] = data[date] + 1`) – Nick Jan 18 '21 at 23:57
1

@RaphaelRafatpanah looks like an interesting course... – Nick Jan 18 '21 at 23:58
Right, but in the example that you're referring to, the data structure is allowing mutation explicitly, and I'm specifically wondering if there is a way to be efficient while disallowing mutation. The accepted answer seems to prove that a more efficient way is indeed possible – Raphael Rafatpanah Jan 19 '21 at 00:01
1

@RaphaelRafatpanah I understand. I'm not 100% convinced that functions with side-effects are not doing mutation, but we could probably argue that for years :) thanks - i't s been an interesting exercise and discussion. Now unfortunately I need to go to some real work :) but first I must just upvote... – Nick Jan 19 '21 at 00:03
Thank you. To be clear, what I'm really asking for is this: How to do an efficient group by function in a way that the ReScript compiler (aka OCaml compiler) is happy. And the accepted answer's way of doing it makes the compiler happy – Raphael Rafatpanah Jan 19 '21 at 00:06

score 1 · Accepted Answer · answered Jan 18 '21 at 23:19

Just add data to a Map() and then convert to array, then to object. It doesn't mutate anything as per your request.

We may simplify this even more but it is 5:00 AM right now and my brain is asleep now.

var messages = [
  {insertedAt: "2021-01-10"},
  {insertedAt: "2021-01-12"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-14"},
  {insertedAt: "2021-01-15"},
  {insertedAt: "2021-01-15"},
  {insertedAt: "2021-01-16"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-18"},
  {insertedAt: "2021-01-18"},
];

const mapped = new Map();

messages.forEach(message => {
    // if date already seen before, increment the count
    if (mapped.has(message.insertedAt)) {
        const count = mapped.get(message.insertedAt);
        mapped.set(message.insertedAt, count+1);
    } else {
        // date never seen before, add to map with initial count
        mapped.set(message.insertedAt, 1);
    }
});

const msgArr = Array.from(mapped);

const final = msgArr.map(([date, count])=> ({date, count}));

console.log(final);

@RaphaelRafatpanah doesn't this mutate the value of `mapped`? — Nick, Jan 18 '21 at 23:26
@Nick, sure you can argue that `mapped` is being mutated as a result of `mapped.set`, however, this approach does work with immutable data structures since `mapped.set` is a function call that happens to perform a side effect — Raphael Rafatpanah, Jan 18 '21 at 23:32
@Nick maybe a clearer requirement would be a group by function that does not reassign any values directly — Raphael Rafatpanah, Jan 18 '21 at 23:40

customcommander · Answer 3 · 2021-01-18T23:44:38.293

1

You need a map as an intermediate data structure:

{"2021-01-18": 2, /*…*/}

Then you deconstruct it into pairs and remap the pairs into objects:

const count =
  xs =>
    Object.entries(
      xs.reduce((acc, {insertedAt: k}) =>
        (acc[k] = (acc[k] ?? 0) + 1, acc), {}))
          .map(([k, v]) => ({date: k, count: v}));

// similar to count above but 100% pure
const count_pure =
  xs =>
    Object.entries(
      xs.reduce((acc, {insertedAt: k}) =>
        ({...acc, [k]: (acc[k] ?? 0) + 1}), {}))
          .map(([k, v]) => ({date: k, count: v}));


console.log(count(messages));
console.log(count_pure(messages));

<script>
var messages = [
  {insertedAt: "2021-01-10"},
  {insertedAt: "2021-01-12"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-14"},
  {insertedAt: "2021-01-15"},
  {insertedAt: "2021-01-15"},
  {insertedAt: "2021-01-16"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-18"},
  {insertedAt: "2021-01-18"},
]
</script>

It is common when doing this to use the spread operator … but depending on how many items you need to process, this may very quickly turn out to be inefficient. See gist I wrote on the subject https://gist.github.com/customcommander/97eb4b3f1600773db59406d39f3f9cd7

edited Jan 18 '21 at 23:44

answered Jan 18 '21 at 23:25

customcommander

17,580
5
58
84

It looks like the reducer here uses mutation – Raphael Rafatpanah Jan 18 '21 at 23:37
@RaphaelRafatpanah It mutates the reducer `acc` parameter which was set to an empty object to start with. So technically it does but it is a safe mutation to make. This is similar to @Nick answer as far as I can tell, with just a different flavour. – customcommander Jan 18 '21 at 23:39
Agreed. However, in my case, I'm using a language similar to Haskell / Clojure in that does it not allow this kind of mutation, hence the restriction – Raphael Rafatpanah Jan 18 '21 at 23:42
@RaphaelRafatpanah Added a 100% pure alternative. See updated answer. – customcommander Jan 18 '21 at 23:45
If I'm not mistaken `count_pure` still isn't pure since here: `({...acc, [k]: (acc[k] ?? 0) + 1}), {}))` we are reassigning the value of the element with key `k` – Raphael Rafatpanah Jan 18 '21 at 23:46
@RaphaelRafatpanah Nope, an entire new `acc` object is recreated on the fly at each iteration. – customcommander Jan 18 '21 at 23:47
Oh right, good call. Thanks for the answer. I'm still going to go with the currently accepted answer, however, since that's a bit more efficient – Raphael Rafatpanah Jan 18 '21 at 23:49

Praveenkumar · Answer 4 · 2021-05-31T06:59:39.723

Though the question is little old, i thought i will share my code for future readers. The below code is written in rescript and completely immutable because I have used Immutable Map from rescript.

type message = {insertedAt: string}

let messages = [
  {insertedAt: "2021-01-10"},
  {insertedAt: "2021-01-12"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-14"},
  {insertedAt: "2021-01-15"},
  {insertedAt: "2021-01-15"},
  {insertedAt: "2021-01-16"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-18"},
  {insertedAt: "2021-01-18"},
]

// Map Values
// Reduce into an immutable map
// Convert to tuple Array
// Log it

messages
->Belt.Array.reduce(Belt.Map.String.empty, (m, v) =>
  // Here Every set is creating a new map
  m->Belt.Map.String.set(v.insertedAt, m->Belt.Map.String.getWithDefault(v.insertedAt, 0) + 1)
)
->Belt.Map.String.toArray
->Js.log

Run In Rescript Playground. More on Immutable Map in rescript here.

Output:

[ [ '2021-01-10', 1 ],
  [ '2021-01-12', 1 ],
  [ '2021-01-13', 3 ],
  [ '2021-01-14', 1 ],
  [ '2021-01-15', 2 ],
  [ '2021-01-16', 1 ],
  [ '2021-01-17', 3 ],
  [ '2021-01-18', 2 ] ]

Cool, I wasn't aware of `Belt.MapString` and it seems useful for this use case. However, iterating over the array three times seems less efficient than in the accepted answer, so I'll keep that one. +1 for using ReScript though! — Raphael Rafatpanah, May 31 '21 at 00:07
@RaphaelRafatpanah I have reduced the iteration to 2. Actually time complexity is O(2n). O(2n) = O(n). Check this, https://stackoverflow.com/questions/25777714/which-algorithm-is-faster-on-or-o2n. — Praveenkumar, May 31 '21 at 07:01

How to create an efficient group by function without mutation?

4 Answers4