5

I'm trying to find an alternative to my formula that calculates the running count of multiple values in a range.

It's returning the correct results, but the issue is it uses the volatile function OFFSET(), which I would like to avoid.

Here is the formula:

=SCAN(0,A2:A13,
LAMBDA(a,b,
IF(OFFSET(b,-1,0)=b,
a+1,1)
)
)

I have tried using INDEX() and ROW(), but I cannot replicate what OFFSET() does.

How can I alter the formula?

enter image description here

Statto
  • 410
  • 3
  • 9
  • It's not clear to me how you want to calculate the value (i.e. what does A or B equal ?) – ApplePie Mar 17 '23 at 22:53
  • 3
    Probably something like: `=SCAN(0,A1:A12=A2:A13,LAMBDA(a,b,IF(b,a+1,1)))` but not behind a pc to check if correct. Btw, this is a running count that would reset if the value changes for those wondering. – JvdV Mar 17 '23 at 23:15
  • 1
    See this https://stackoverflow.com/questions/74389341/running-count-is-slow-in-google-sheets for an entertaining discussion of this area in Google Sheets which should also be applicable to Excel including case where you don't want to reset e.g. in row 7 where A is repeated after an intervening block of B's. – Tom Sharpe Mar 18 '23 at 09:27

3 Answers3

5

Running Count

  • The trick is to compare the column with a version of the column shifted one row down. I'm using VSTACK with "" to add a new (empty) first row and DROP with -1 to get rid of the last row.
=LET(Data,A2:A13,
SCAN(0,Data=DROP(VSTACK("",Data),-1),
    LAMBDA(a,b,IF(b,a+1,1))))
VBasic2008
  • 44,888
  • 5
  • 17
  • 28
  • That works, but does it matter what the first row is? Why did you use `!|!`? – Statto Mar 18 '23 at 00:03
  • To ensure non-equality i.e. to get 1 in the first row of the result. Being a little bit paranoid. Just use something that can't be in the first row of the data. – VBasic2008 Mar 18 '23 at 00:04
  • I have done a few tests, and it doesn't seem to matter what it is. I've just put `""` and the results are the same. – Statto Mar 18 '23 at 00:12
  • 1
    Yeah, sorry, I'm stupid. It's either 0+1 or 1 for the first row, so you can use anything. – VBasic2008 Mar 18 '23 at 00:14
4

In short, you'd need a way to iterate an array of truthy/falsey values that would indicate the Previous value == Current value. If you are working with cell-references this is easy:

=SCAN(0,A1:A12=A2:A13,LAMBDA(a,b,IF(b,a+1,1)))

You'll end up with a running count of consecutive values that would reset accordingly.

For good measure; you can avoid the use of INDIRECT() when using SCAN(). Refer to each element's respective row/column if need be as per this example.

JvdV
  • 70,606
  • 8
  • 39
  • 70
  • My only issue with that formula is the referenced ranges `A1:A12=A2:A13` are not the same. This would be a minor inconvenience when referencing a table column, for instance. – Statto Mar 17 '23 at 23:44
  • You were literally using cell references in your sample, hence the answer @Statto. The other answer got the array alternative down if need be. – JvdV Mar 18 '23 at 00:03
4

Another way:

=LET(r,A2:A13,
SCAN(1,SEQUENCE(ROWS(r)),LAMBDA(x,y,IF(y=1,1,IF(INDEX(r,y)=INDEX(r,y-1),x+1,1)))))

While this is a very dynamic solution (you could even use r,FILTER(A:A,A:A<>"")), when used in large data sets this calculates slow (As VBasix2008 pointed out in the comments).

If the range is repeated within the INDEX parts of the formula (as suggested by Tom Sharpe) it is calculates fast again:

=LET(r,A2:A13,
SCAN(1,SEQUENCE(ROWS(r)),LAMBDA(x,y,IF(y=1,1,IF(INDEX(A2:A13,y)=INDEX(A2:A13,y-1),x+1,1)))))

It seams that accessing the array in memory is significantly slower than accessing the actual range within INDEX

P.b
  • 8,293
  • 2
  • 10
  • 25
  • 3
    This seems the cleanest way to me, although the formula may be a bit longer than others. – Tom Sharpe Mar 18 '23 at 09:31
  • 1
    The idea is that you only need to input the range once – P.b Mar 18 '23 at 09:49
  • 1
    Clean or not, it expectedly takes forever on a large data set. It takes 30s for 10K records, while the two previously posted solutions take less than 2s for 1M records. – VBasic2008 Mar 18 '23 at 10:03
  • 1
    It's only a couple of seconds or so for me, but I agree it's appreciably slower than the other two methods. I guess that can only be because of the use of index, but I'm surprised that adds so much to the time. In the answer commented above, the same method seemed to do pretty well in Google Sheets. – Tom Sharpe Mar 18 '23 at 10:12
  • @TomSharpe I think the reason is `SCAN`. I've had no problems with `INDEX` so far. I was lately shedding tears while using `REDUCE` with `VSTACK`. So I'm definitely keeping away from those two (`SCAN`, `REDUCE`) whenever possible. But they do wonders though. BTW, my sample data was `=CHAR(RANDBETWEEN(65,66))` in column `A` (don't forget to copy/paste values). I mean, there's a huge difference between 30s and a couple of seconds. – VBasic2008 Mar 18 '23 at 10:30
  • 1
    Hm. =LET(r,A1:A10000, SCAN(1,SEQUENCE(ROWS(r)),LAMBDA(x,y,IF(y=1,1,IF(INDEX(A1:A10000,y)=INDEX(A1:A10000,y-1),x+1,1))))) is much faster. – Tom Sharpe Mar 18 '23 at 10:44
  • 1
    @TomSharpe It takes 3s for 1M of my sample data. How to explain this? It looks like you were right. – VBasic2008 Mar 18 '23 at 11:01
  • 2
    interesting. I didn't test it on a large set when posted, but trying both my version and the one @TomSharpe suggested, makes a huge difference. I added this to my answer with a possible explanation why, though I'm not sure that's causing the difference. In the end it does not make sense to me. – P.b Mar 18 '23 at 16:15
  • 1
    Yes, it seems that, when passing a *range* defined within the `LET` function as the first parameter to `INDEX`, that range is resolved to its equivalent *array*. I assume multiple repetitions of such a resolution are costly in terms of performance. – Jos Woolley Mar 19 '23 at 07:23