3

I am trying to combine the output of many dynamic functions into one column.

I have come across this problem before in various forms.

Currently I am trying to take comma separated data and create a single column of all values.

data in column A, and desired output in C1

I can create the output in C1 with =VSTACK(TEXTSPLIT(A1,,","), TEXTSPLIT(A2,,","), TEXTSLIT....) but this is manual and does not allow for dynamic changing of data in A1 in length.

I am looking for a dynamic formula. I can't seem to get VSTACK and TEXTSPLIT to play nicely together dynamically.

JvdV
  • 70,606
  • 8
  • 39
  • 70
JimminyJim
  • 136
  • 10

5 Answers5

5

Building on discussions/concerns about speed/limits formed around other answers, maybe the whole issue with speed is the use of LAMBDA() which essentially iterates a given array. Maybe we can avoid this and the limits of TEXTJOIN() at the same time. My two cents to counter this:

enter image description here

Formula in B1:

=LET(x,A1:A4,TOCOL(TEXTAFTER(","&TEXTBEFORE(x&",",",",SEQUENCE(,MAX(LEN(x)-LEN(SUBSTITUTE(x,",",))+1)),,,NA()),",",-1),3))

EDIT: Did some benchmarking against 10000 rows with comma-seperated data == 'a,b,c', meaning splitting and stacking would result in 30000 rows of data. For this test I used this source-code where I'd tell excel to calculate manually and wait while Excel is busy:

Rows/Function MAKEARRAY() REDUCE() & VSTACK() TEXTBEFORE/AFTER()
10000 52.68 sec 34.82 sec 0.09 sec
100000 crash crash 0.59 sec

I tried benchmarking against 100000 rows which proven too much to handle for LAMBDA() related functions causing Excel to freeze/crash. It worked for the above mentioned function in 0.59 seconds which supprised me tbh.

I also found there is just a slight, almost unnoticable/neglectable, setback when simplifying the above to:

=LET(x,A1:A4,TOCOL(TEXTAFTER(","&TEXTBEFORE(x&",",",",SEQUENCE(,MAX(LEN(x))),,,NA()),",",-1),3))

Or even, as per @DavidLeal:

=LET(x,A1:A4,TOCOL(TEXTBEFORE(TEXTAFTER(","&x,",",SEQUENCE(,MAX(LEN(x))))&",",","),3))
JvdV
  • 70,606
  • 8
  • 39
  • 70
  • 2
    Wonderful. Don't we always try to avoid using Lambdas? But we can't always find the right solution. But you can. About 2s for 1M results, 2 to 5 single characters. Thanks. I'll kick off. – VBasic2008 Mar 31 '23 at 07:55
  • 2
    @VBasic2008, thanks for testing this and the kind response. I've tried to do some benchmarking myself too. – JvdV Mar 31 '23 at 08:56
  • 1
    @JvdV Sir, it helps a lot when you put up so much for a query. – Mayukh Bhattacharya Mar 31 '23 at 13:57
  • 1
    @JvdV I was testing for several scenarios both formulas and I get the same result, would you share more details about the differences? Thanks – David Leal Mar 31 '23 at 21:52
  • 1
    Hi @DavidLeal, what do you mean with same results? Speedwise? – JvdV Mar 31 '23 at 21:55
  • 1
    @JvdV I haven't found any differences in terms of result from both formula. I tested several variation of the input sample data. My understanding from your comment is that the second formula may not work in all cases.I mean in terms of the output – David Leal Mar 31 '23 at 23:19
  • 1
    For me testing both lambda related answers on a 100000 rows of value a,b,c did result in freezing and eventually crashing. Not the case for you then ? @DavidLeal – JvdV Mar 31 '23 at 23:26
  • 1
    Sorry for the confusion @JvdV I meant both formulas in **your** answer and your comment: *I also found there is just a slight, almost unnoticable/neglectable, setback when simplifying the above to*. Which make me think the second formula doesn't work in some cases – David Leal Apr 01 '23 at 00:33
  • 1
    @DavidLeal. I could t find any mishaps (yet). Which let me to believe that `TOCOL()` must be taking the bulk of the little time it takes. Notice there really is just a difference of two characters on the sample data used when simplified formula is applied. I recon that when characters between commas are lengthier this will start to increase (a bit). – JvdV Apr 01 '23 at 06:24
  • 2
    Thanks @JvdV I was not able to find any differences in terms of characters returned. I am also wondering in which scenario matters the input argument `NA()` you put. I found out the following, produces the same result: `=LET(x,A1:A4,TOCOL(TEXTBEFORE(TEXTAFTER(","&x,",",SEQUENCE(,MAX(LEN(x))))&",",","),3))` it is sorter, because it avoids reverse search, so there is no need for an extra input argument. I uses first `TEXTAFTER` then `TEXTBEFORE`, the other way around. I would say the performance is the same – David Leal Apr 01 '23 at 14:57
  • 2
    @DavidLeal, seems to work fine indeed. I've not checked performance but good to hear you see near no difference. I'll edit this in. – JvdV Apr 03 '23 at 08:27
4

Perhaps try this way, and it would work for any number of rows:

enter image description here


• Formula used in cell C1

=DROP(REDUCE("",A1:A4,LAMBDA(x,y,VSTACK(x,TEXTSPLIT(y,,",")))),1)

Also, the formula credit wholly goes to JvdV Sir, without any doubt.


Here is the example query where I specifically came to know about it:

How to split texts from dynamic range?


Edit:


We can also wrap in one more function viz. TOCOL() to avoid the blank cells, I have specifically used 3 the optional parameter to [ignore] because it ignores blanks as well as any errors if any. Therefore the formula will be.

enter image description here


• Formula used in cell C1

=DROP(REDUCE("",TOCOL(A1:A4,3),LAMBDA(x,y,VSTACK(x,TEXTSPLIT(y,,",")))),1)

Mayukh Bhattacharya
  • 12,541
  • 5
  • 21
  • 32
  • 1
    Thanks for this. I've learnt a whole bunch of new tools from this answer - DROP, REDUCE, LAMBDAs are all new to me and will be very useful for other applications. I will wait and see if a less complex solution is provided (without LAMBDAs). – JimminyJim Mar 31 '23 at 01:28
  • Yes there was a solution posted but its deleted I see, now, however it depends upon the number of characters you will be having, and can be resolved using `TEXTJOIN()` & `TEXTSPLIT()` --> `TEXTJOIN()` has a character so its best to avoid, as we wont be working with just few rows data, but more. – Mayukh Bhattacharya Mar 31 '23 at 01:30
  • 1
    Could you please clarify what you mean by "TEXTJOIN() has a character so its best to avoid," – JimminyJim Mar 31 '23 at 01:33
  • Sorry, I meant to say `Character Limitations` --> i.e. `If the resulting string exceeds 32767 characters (cell limit), TEXTJOIN returns the #VALUE! error` this has been tried and tested and have been found, it really doesnt works beyond those character limits – Mayukh Bhattacharya Mar 31 '23 at 01:40
  • On the other hand, `REDUCE` with `VSTACK` has proven to be slow on a larger dataset so I always try to find a different solution. Try your formula with 10k values. On my machine, I have to wait 10s for the result, while my `TEXTJOIN` version spits it out in a split second: `=LET(Data,A2:A10001,TEXTSPLIT(TEXTJOIN(",",,TOCOL(Data,3)),,","))`. Just a reminder. – VBasic2008 Mar 31 '23 at 02:27
  • 1
    Sir, I agree it may be slow on a larger dataset, but since the query is related with `VSTACK()` & `TEXTSPLIT()` hence shared the solution, also if one working on a larger data set will naturally use either `VBA` or `Power Query`. But like i said, with 20K data `TEXTJOIN()` will return the error and `REDUCE()` & `VSTACK()` won't – Mayukh Bhattacharya Mar 31 '23 at 02:36
3

To make the formula dynamic, so you can add/subtract rows, try:

=DROP(REDUCE("",A1:A1000,LAMBDA(a,b,VSTACK(a,IFERROR(TEXTSPLIT(b,,","),"")))),1)    

where the cell reference is larger than what you might conceivably fill to.

enter image description here

However, this algorithm will take a while to run on large datasets (eg. 20,000+ rows) so you may want a Power Query solution instead. Let me know if that is the case.

Or, which would be my preference, have your range of strings in a Table and use a structured reference to refer to the range.

Then use a formula like:

=DROP(REDUCE("",Strings[CSS],LAMBDA(a,b,VSTACK(a,TEXTSPLIT(b,,",")))),1)

*(Note that so long as there are no empty cells in the range, you don't need the IFERROR function we had in the first example.

enter image description here

Ron Rosenfeld
  • 53,870
  • 7
  • 28
  • 60
3

Another user posted a solution but has removed it. I saw it before it was removed and think it was elegant and simple. I will remove my answer if the answerer reposts.

In C1 the formula

=TEXTSPLIT(TEXTJOIN(",",TRUE,A1:A4),,",")

works.

JvdV
  • 70,606
  • 8
  • 39
  • 70
JimminyJim
  • 136
  • 10
  • 3
    `TEXTJOIN()` with `TEXTSPLIT()` works but its best to avoid because `TEXTJOIN()` has character limit, kindly note that, also if the formula is short it doesnt means it will always gives to right output, most chances are they will render `FALSE POSITIVES` after certain number of rows. I could have shared the above solution but i have avoided, since i have specifically tested them with large numbers of data. – Mayukh Bhattacharya Mar 31 '23 at 01:32
  • 2
    Thankyou. I found this from Microsoft: If the resulting string exceeds 32767 characters (cell limit), TEXTJOIN returns the #VALUE! error. I will leave the answer here for reference for others, but have marked your answer as correct – JimminyJim Mar 31 '23 at 01:37
  • 1
    Sorry, I am dealyed in responding, I was explaining someone the same issues with `TEXTJOIN()` in some other forum, they were using `20000` characters and found no issue, but when they used beyond `32767` they started to get the error. – Mayukh Bhattacharya Mar 31 '23 at 01:42
3

I haven't done any performance testing, though if, as suggested by some, it turns that out that the REDUCE/VSTACK recursive set-up is too slow over large datasets, it might be worth considering an alternative using MAKEARRAY:

=LET(
    ζ,A1:A4,
    TOCOL(
        MAKEARRAY(
            ROWS(ζ),
            MAX(LEN(ζ)-LEN(SUBSTITUTE(ζ,",",""))+1),
            LAMBDA(α,β,INDEX(TEXTSPLIT(INDEX(ζ,α,1),",",,1),β))
        ),
        2
    )
)
Jos Woolley
  • 8,564
  • 2
  • 4
  • 9