3

I'm trying to create a data set to upload into tableau, and am essentially trying to have the numbers and the words in the top left here-all multiplied (copied) into one column. So essentially Tan is 10, so I need it listed out on ten rows for Column L. Same thing for 'Grey' and so on. I've been trying different functions but things don't seem to be working. I've tried the repeat function but it only lists things horizontally and fully concatenated instead of one by one vertically and ALL within the same column.

enter image description here

Would anyone know a way I could this in excel so I can essentially automate this without having to go and copy things ONE by one?

I tried to use the repeat function and it wouldn't work since it produced things horizontally and jumbled up instead of one by one. I can do this all manually but I essentially have to do this about 50 times, so any type of formula or automation would help and I've been scouring the internet and ChatGPT to configure something but no such luck yet.

Scott Craner
  • 148,073
  • 10
  • 49
  • 81
  • Is this a one time or a recurring task? – cybernetic.nomad Mar 27 '23 at 21:21
  • 1
    I think it has been solved many times here using `Excel Formulas`, `VBA` & ofcourse using `Power Query`, have you tried searching `StackOverflow`? Try: `=LET(_data,TRANSPOSE(A1:J2), num_repeat,DROP(_data,,1), values,TAKE(_data,,1), XLOOKUP(SEQUENCE(SUM(num_repeat)), SCAN(0,num_repeat,LAMBDA(a,b,a+b)), values,,1))` – Mayukh Bhattacharya Mar 27 '23 at 22:07

2 Answers2

2

Repeat in Column From Row

  • It looks similar to Mayukh Bhattacharya's solution in the comments but it is many, many times faster on a large data set. I tried it with over 1M records using the full rows 1:1 and 2:2 with the formula =RANDBETWEEN(1,125) (in row 2; don't forget to copy/paste values) and it finished in a bit more than one second.
=LET(d,A1:D1,r,A2:D2,
TOCOL(INDEX(d,,MATCH(SEQUENCE(,SUM(r)),
    HSTACK(1,SCAN(1,r,LAMBDA(a,b,a+b))),1))))

enter image description here

VBasic2008
  • 44,888
  • 5
  • 17
  • 28
  • Nice solution, it even works in the weird case the that some repetition value is zero. The only comment is that the third `MATCH` argument is not really necessary, because it is the default value. – David Leal Mar 28 '23 at 13:26
  • 1
    Thanks for your input. That's a nice catch, totally forgot about it. What do you think about the `SCAN` sequence i.e. the `11` is redundant (inaccurate)? It works either way. Keep it, or drop it? – VBasic2008 Mar 28 '23 at 13:37
  • 1
    `SCAN` produces the sequence `3, 6, 10, 11`. Then I stack `1` to the left so `MATCH` can work correctly. The 11 though is redundant. Usually, when you stack to one side, you drop on the other. BTW, `XMATCH` (or `XLOOKUP`; see Mayukh's solution in OP's comments) works without the stacked `1` but with the `1` parameter and the initial `SCAN` value `0`. But unfortunately, it's too slow. Note that both functions (`XMATCH` and `XLOOKUP`) work perfectly well with exact matches but there is a horrible lag when working with approximate matches in large datasets. – VBasic2008 Mar 28 '23 at 14:24
  • I see your point, `SCAN` generates a sequence identifying the beginning of the interval for each, so it is true `11` is not required, but this is part of the recurrence. I guess there is a way to avoid it, instead of identifying the start, the end is as follows: `=LET(d,A1:D1,r,A2:D2,TOCOL(INDEX(d,, XMATCH(SEQUENCE(,SUM(r)), SCAN(0,r,LAMBDA(a,b,a+b)),1))))` but I doesn't work with `MATCH` and you prefer to use `MATCH` over `XMATCH`. Under this approach, you don't need `HSTACK`, so it is shorter. I tested your example and mine. Please let me know. Thanks – David Leal Mar 28 '23 at 15:10
  • In this case, the output of `SCAN` is already sorted in ascending order, maybe it is more efficient. Have you tried? Maybe use the **search_mode** input argument and set it to `2` to do a binary search. – David Leal Mar 28 '23 at 15:23
  • I tested it with 16K columns and both solutions (using `MATCH` and `XMATCH`) work pretty fast (immediate), but my solution using `VSTACK/REDUCE` is slow. The output is about 90K rows. – David Leal Mar 28 '23 at 16:56
  • 1
    @DavidLeal Try with `=RANDBETWEEN(1, 125)` in row `2` to be able to generate more than 1M records (copy/paste values). Test only the `MATCH` version, to prove that it is fast for the whole row. Then test `XMATCH` using e.g. only columns `A:BBB`. It should take about 10s. – VBasic2008 Mar 28 '23 at 19:54
  • true, I tested on my end, but have you tested setting `search_mode=2` for `XMATCH`? for an output of `1M` rows, the performance for both of them is the same, a fraction of a second. I tested for your scenario. For the first row: `BYCOL(1:1, LAMBDA(x, RANDBETWEEN(1,1000)))` and for the second row: `BYCOL(1:1, LAMBDA(x, RANDBETWEEN(1,125)))` and then copy and paste the values. I am using the following formula for `XMATCH`: `=LET(d,1:1,r,2:2,TOCOL(INDEX(d,,XMATCH(SEQUENCE(,SUM(r)), SCAN(0,r,LAMBDA(a,b,a+b)),1,2))))` **That would solve the mystery, :-), same performance, but shorter** – David Leal Mar 28 '23 at 20:41
  • 1
    @DavidLeal Tested it right now. You're right. What does this parameter actually do (in a practical sense)? – VBasic2008 Mar 28 '23 at 20:54
  • [Binary search](https://en.wikipedia.org/wiki/Binary_search_algorithm), which is faster than any other kind of search, but the input **lookup_array** needs to be sorted for [XMATCH](https://support.microsoft.com/en-us/office/xmatch-function-d966da31-7a6b-4a13-a1c6-5a33ed6a0312) requirement and binary search requirement too, which is our case. It is `O(log n)` – David Leal Mar 28 '23 at 21:22
1

If you don't have a large input, that would reach the limit of the number of cell characters: 32,767 after TEXTJOIN call. You can try the following:

=TOCOL(TEXTSPLIT(TEXTJOIN(";",,REPT(A1:B1&",",A2:B2)),",",";",1),2)

Otherwise, you can try the following:

=LET(in, A1:B2,DROP(REDUCE("", SEQUENCE(COLUMNS(in)), LAMBDA(ac,x,
 VSTACK(ac, REPT(INDEX(in,1,x), SEQUENCE(INDEX(in,2,x),,1,0))))),1))

Here is a sample output for the second approach: output

Adjust the input (in) to your specific range. It uses DROP/REDUCE/VSTACK pattern, check my answer to the following question: how to transform a table in Excel from vertical to horizontal but with different length

Update

Even though the second formula doesn't have the limitation of the first one, it shows a slow performance (my guess iterating over VSTACK is not fast enough, vote for the following suggestion to Microsoft if you like) compared to the answer provided by @VBasic2008 for huge data input of about 16K columns generating an output of more than 1M rows (check the comment section of his answer for more information). In such cases, it is better to use his approach or the following variation if you want to use XMATCH setting the fourth input argument of XMATCH: search_mode to 2 (binary search). It can be applied because the input argument: lookup_array is sorted in ascending order by the way it is generated by SCAN. Check XMATCH documentation for more information.

=LET(d,A1:B1,r,A2:B2,TOCOL(INDEX(d,,XMATCH(SEQUENCE(,SUM(r)),
 SCAN(0,r,LAMBDA(a,b,a+b)),1,2))))
David Leal
  • 6,373
  • 4
  • 29
  • 56