Update:
Datatable now has a cumcount function in dev :
data[:, [f.value, dt.cumcount()], 'grp']
| grp value C0
| str32 int32 int64
-- + ----- ----- -----
0 | a 2 0
1 | a 3 1
2 | b 1 0
3 | b 2 1
4 | b 5 2
5 | b 9 3
6 | c 2 0
[7 rows x 3 columns]
Old Answer:
datatable
does not have a cumulative count function, in fact there is no cumulative function for any aggregation at the moment.
One way to possibly improve the speed is to use a faster iteration of numpy, where the for loop is done within C, and with more efficiency. The code is from here and modified for this purpose:
from datatable import dt, f, by
import numpy as np
In [244]: def create_ranges(indices):
...: cum_length = indices.cumsum()
...: ids = np.ones(cum_length[-1], dtype=int)
...: ids[0] = 0
...: ids[cum_length[:-1]] = -1 * indices[:-1] + 1
...: return ids.cumsum()
counts = data[:, dt.count(), by('grp', add_columns=False)].to_numpy().ravel()
data[:, f[:].extend({"counts" : create_ranges(counts)})]
| grp value counts
| str32 int32 int64
-- + ----- ----- ------
0 | a 2 0
1 | a 3 1
2 | b 1 0
3 | b 2 1
4 | b 5 2
5 | b 9 3
6 | c 2 0
[7 rows x 3 columns]
The create_ranges function is wonderful (the logic built on cumsum is nice) and really kicks in as the array size increases.
Of course this has its drawbacks; you are stepping out of datatable into numpy territory and then back into datatable; the other aspect is that I am banking on the fact that the groups are sorted lexically; this won't work if the data is unsorted (and has to be sorted on the grouping column).
Preliminary tests show a marked improvement in speed; again it is limited in scope and it would be much easier/better if this was baked into the datatable library.
If you are good with C++, you could consider contributing this function to the library; I and so many others would appreciate your effort.
You could have a look at pypolars and see if it helps with your use case. From the h2o benchmarks it looks like a very fast tool.