4

Is there a way to efficiently implement a group by function without mutation?

Naive implementation:

var messages = [
  {insertedAt: "2021-01-10"},
  {insertedAt: "2021-01-12"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-14"},
  {insertedAt: "2021-01-15"},
  {insertedAt: "2021-01-15"},
  {insertedAt: "2021-01-16"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-18"},
  {insertedAt: "2021-01-18"},
]

var messagesGroupedByDate = messages.reduce(function (data, message) {
  if (
    data.some(function (point) {
      return point.date === message.insertedAt;
    })
  ) {
    return data.map(function (point) {
      if (point.date === message.insertedAt) {
        return {
          date: point.date,
          count: (point.count + 1) | 0,
        };
      } else {
        return point;
      }
    });
  } else {
    return data.concat([
      {
        date: message.insertedAt,
        count: 1,
      },
    ]);
  }
}, []);


console.log(messagesGroupedByDate);

For the sake of argument, there's no need to make this more generic. The problem I'm facing is that I'm looping three times:

  • once with Array.prototype.reduce which is necessary to loop over messages
  • once with Array.prototype.some to see if the date key already exists in the resulting array
  • in the case where a date key already exists, we loop again with Array.prototype.map to update a specific element of an array
  • otherwise, a new array is returned that contains the new element

If there's not really any good way to make this efficient in ReScript, then I can always use raw JavaScript for this function, but I'm curious if it's possible to do this efficiently without mutation.

Lhooq
  • 4,281
  • 1
  • 18
  • 37
Raphael Rafatpanah
  • 19,082
  • 25
  • 92
  • 158

4 Answers4

1

You can group more simply by building an object of counts by date, and then using Object.entries and Array.map to convert that to an array of objects as required:

var messages = [
  {insertedAt: "2021-01-10"},
  {insertedAt: "2021-01-12"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-14"},
  {insertedAt: "2021-01-15"},
  {insertedAt: "2021-01-15"},
  {insertedAt: "2021-01-16"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-18"},
  {insertedAt: "2021-01-18"},
];

var messagesGroupedByDate = Object.entries(
  messages.reduce((data, message) => {
    data[message.insertedAt] = data[message.insertedAt] || 0;
    data[message.insertedAt]++;
    return data;
  }, {})
).map(([date, count]) => ({ date, count }));

console.log(messagesGroupedByDate);

You can also create an object directly from the values in messages and then update the counts by looping over messages:

var messages = [
  {insertedAt: "2021-01-10"},
  {insertedAt: "2021-01-12"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-14"},
  {insertedAt: "2021-01-15"},
  {insertedAt: "2021-01-15"},
  {insertedAt: "2021-01-16"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-18"},
  {insertedAt: "2021-01-18"},
]

const data = Object.fromEntries(messages.map(({ insertedAt: date }) => [ date, 0 ]));

messages.forEach(({ insertedAt: date }) => data[date]++);

const messagesGroupedByDate = Object.entries(data).map(([date, count])=> ({date, count}));

console.log(messagesGroupedByDate);
Nick
  • 138,499
  • 22
  • 57
  • 95
  • Thank you for your answer. This answer is cool, but it does use mutation in the reducer – Raphael Rafatpanah Jan 18 '21 at 23:17
  • The reason this is important is because the data structures I'm working with are immutable – Raphael Rafatpanah Jan 18 '21 at 23:17
  • @RaphaelRafatpanah - I'm not sure I understand how this is different to the way you manipulate the `data` array in your code? – Nick Jan 18 '21 at 23:19
  • For example, this line: `data[message.insertedAt] = data[message.insertedAt] || 0;` is mutating the original value of `data[message.insertedAt]` and this is not something that I can do in ReScript since `data` is immutable – Raphael Rafatpanah Jan 18 '21 at 23:20
  • @RaphaelRafatpanah it's not mutating the value of `data[message.insertedAt]` (it just creates that entry in the `data` object if it doesn't exist) but I'll agree that it is mutating the value of `data`. – Nick Jan 18 '21 at 23:25
  • Yes, I agree with that. Thanks for correcting – Raphael Rafatpanah Jan 18 '21 at 23:28
  • 1
    @RaphaelRafatpanah I've added an alternate solution. I'm curious as to whether that meets your requirements? As I understand it (admittedly from a quick read of the [manual](https://rescript-lang.org/docs/manual/latest/record)) object property values *can* be modified, so `data[date]++` should be fine. – Nick Jan 18 '21 at 23:47
  • Thank you. If I'm not mistaken, `data[date]++` is still reassigning and therefore mutating. Note, the `++` part can be avoided, but the `data[date] = ...` part is the concern – Raphael Rafatpanah Jan 18 '21 at 23:53
  • Please excuse the reach, but if you're curious to learn more about this way of programming, I'd recommend this 5-week Coursera course: https://www.coursera.org/learn/programming-languages - it was really, really fun in my opinion – Raphael Rafatpanah Jan 18 '21 at 23:55
  • @RaphaelRafatpanah they seem to do just that addition in this example in the [manual](https://rescript-lang.org/docs/manual/latest/record#mutable-update) where they set the value of an object's property to the value + 1 (so you could rewrite as `data[date] = data[date] + 1`) – Nick Jan 18 '21 at 23:57
  • 1
    @RaphaelRafatpanah looks like an interesting course... – Nick Jan 18 '21 at 23:58
  • Right, but in the example that you're referring to, the data structure is allowing mutation explicitly, and I'm specifically wondering if there is a way to be efficient while disallowing mutation. The accepted answer seems to prove that a more efficient way is indeed possible – Raphael Rafatpanah Jan 19 '21 at 00:01
  • 1
    @RaphaelRafatpanah I understand. I'm not 100% convinced that functions with side-effects are not doing mutation, but we could probably argue that for years :) thanks - i't s been an interesting exercise and discussion. Now unfortunately I need to go to some real work :) but first I must just upvote... – Nick Jan 19 '21 at 00:03
  • Thank you. To be clear, what I'm really asking for is this: How to do an efficient group by function in a way that the ReScript compiler (aka OCaml compiler) is happy. And the accepted answer's way of doing it makes the compiler happy – Raphael Rafatpanah Jan 19 '21 at 00:06
1

Just add data to a Map() and then convert to array, then to object. It doesn't mutate anything as per your request.

We may simplify this even more but it is 5:00 AM right now and my brain is asleep now.

var messages = [
  {insertedAt: "2021-01-10"},
  {insertedAt: "2021-01-12"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-14"},
  {insertedAt: "2021-01-15"},
  {insertedAt: "2021-01-15"},
  {insertedAt: "2021-01-16"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-18"},
  {insertedAt: "2021-01-18"},
];

const mapped = new Map();

messages.forEach(message => {
    // if date already seen before, increment the count
    if (mapped.has(message.insertedAt)) {
        const count = mapped.get(message.insertedAt);
        mapped.set(message.insertedAt, count+1);
    } else {
        // date never seen before, add to map with initial count
        mapped.set(message.insertedAt, 1);
    }
});

const msgArr = Array.from(mapped);

const final = msgArr.map(([date, count])=> ({date, count}));

console.log(final);
rahulpsd18
  • 641
  • 4
  • 12
  • @RaphaelRafatpanah doesn't this mutate the value of `mapped`? – Nick Jan 18 '21 at 23:26
  • @Nick, sure you can argue that `mapped` is being mutated as a result of `mapped.set`, however, this approach does work with immutable data structures since `mapped.set` is a function call that happens to perform a side effect – Raphael Rafatpanah Jan 18 '21 at 23:32
  • @Nick maybe a clearer requirement would be a group by function that does not reassign any values directly – Raphael Rafatpanah Jan 18 '21 at 23:40
1

You need a map as an intermediate data structure:

{"2021-01-18": 2, /*…*/}

Then you deconstruct it into pairs and remap the pairs into objects:

const count =
  xs =>
    Object.entries(
      xs.reduce((acc, {insertedAt: k}) =>
        (acc[k] = (acc[k] ?? 0) + 1, acc), {}))
          .map(([k, v]) => ({date: k, count: v}));

// similar to count above but 100% pure
const count_pure =
  xs =>
    Object.entries(
      xs.reduce((acc, {insertedAt: k}) =>
        ({...acc, [k]: (acc[k] ?? 0) + 1}), {}))
          .map(([k, v]) => ({date: k, count: v}));


console.log(count(messages));
console.log(count_pure(messages));
<script>
var messages = [
  {insertedAt: "2021-01-10"},
  {insertedAt: "2021-01-12"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-14"},
  {insertedAt: "2021-01-15"},
  {insertedAt: "2021-01-15"},
  {insertedAt: "2021-01-16"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-18"},
  {insertedAt: "2021-01-18"},
]
</script>

It is common when doing this to use the spread operator but depending on how many items you need to process, this may very quickly turn out to be inefficient. See gist I wrote on the subject https://gist.github.com/customcommander/97eb4b3f1600773db59406d39f3f9cd7

customcommander
  • 17,580
  • 5
  • 58
  • 84
1

Though the question is little old, i thought i will share my code for future readers. The below code is written in rescript and completely immutable because I have used Immutable Map from rescript.

type message = {insertedAt: string}

let messages = [
  {insertedAt: "2021-01-10"},
  {insertedAt: "2021-01-12"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-13"},
  {insertedAt: "2021-01-14"},
  {insertedAt: "2021-01-15"},
  {insertedAt: "2021-01-15"},
  {insertedAt: "2021-01-16"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-17"},
  {insertedAt: "2021-01-18"},
  {insertedAt: "2021-01-18"},
]

// Map Values
// Reduce into an immutable map
// Convert to tuple Array
// Log it

messages
->Belt.Array.reduce(Belt.Map.String.empty, (m, v) =>
  // Here Every set is creating a new map
  m->Belt.Map.String.set(v.insertedAt, m->Belt.Map.String.getWithDefault(v.insertedAt, 0) + 1)
)
->Belt.Map.String.toArray
->Js.log

Run In Rescript Playground. More on Immutable Map in rescript here.

Output:

[ [ '2021-01-10', 1 ],
  [ '2021-01-12', 1 ],
  [ '2021-01-13', 3 ],
  [ '2021-01-14', 1 ],
  [ '2021-01-15', 2 ],
  [ '2021-01-16', 1 ],
  [ '2021-01-17', 3 ],
  [ '2021-01-18', 2 ] ]
Praveenkumar
  • 2,056
  • 1
  • 9
  • 18
  • Cool, I wasn't aware of `Belt.MapString` and it seems useful for this use case. However, iterating over the array three times seems less efficient than in the accepted answer, so I'll keep that one. +1 for using ReScript though! – Raphael Rafatpanah May 31 '21 at 00:07
  • 1
    @RaphaelRafatpanah I have reduced the iteration to 2. Actually time complexity is O(2n). O(2n) = O(n). Check this, https://stackoverflow.com/questions/25777714/which-algorithm-is-faster-on-or-o2n. – Praveenkumar May 31 '21 at 07:01