1

Each array is listed horizontally, has n columns and a variable number of rows. Every array has the same n number of columns. There is a blank column between each array.

enter image description here

They can be combined using vstack(array1,array2,etc...) but this requires knowing how many arrays there are. Is there a way to vstack all of the arrays where the number of arrays is determined by whether there is a column heading populated in the first column of each array?

So if cell A1 (column heading) is not blank, there is an array in A:C. If E1 is not blank, then there is an array in E:G, etc... Since M1 is empty in the above example data, there are no more arrays. The number of rows varies with each array but is not more than 1000. The data within the array could be anything including some empty cells. However each row of data has at least one populated cell.

Replacing the n/a and errors with blanks is OK too.

Using only an excel formula, no VBA please. Possible solutions could be reduce() or a recursive lambda function, which is probably the answer, splitting the large range of data into the smaller arrays one at a time and stacking them until there are no smaller arrays left to stack. Working to find it as a solution.

If there are 3 arrays, then all three should be vstacked. If there are 10, then all ten, etc. The final array should not change the order of the rows or be data dependent. The column headings will be there and will be the same.

ciso
  • 2,887
  • 6
  • 33
  • 58

2 Answers2

1

Try:

=LET(
    ζ, A1:AQ1000,
    ξ, LAMBDA(φ, TAKE(WRAPROWS(TOCOL(IF(ζ <> "", φ, NA()), 2), 10), , 1)),
    SORTBY(WRAPROWS(TOCOL(ζ, 1), 10), ξ(COLUMN(ζ)), 1, ξ(ROW(ζ)), 1)
)

Amend the last column referenced (AQ here) as required.

Note that this set-up may fail if any of the 'blanks' are in fact null strings (""), e.g., as a result of formulas in those cells.

Jos Woolley
  • 8,564
  • 2
  • 4
  • 9
  • 1
    @ciso Apologies, you're correct. Have posted a new solution. – Jos Woolley May 20 '23 at 05:39
  • So everyone understands the limitation, if one cell has an empty value, the entire resulting array is out of sync and incorrect. RISKY to use as is. A better answer is needed.. I modified the question to be clearer about this and that there can be n columns per array and each array has the same n number of columns. – ciso May 21 '23 at 14:48
  • Can you clarify whether any of those 'blank' cells could actually contain a null string ("") or not? – Jos Woolley May 21 '23 at 15:05
  • Yes they can. Thanks for looking at it again. – ciso May 21 '23 at 15:07
1

Here a possible solution, that works for more than one empty columns, and the group of columns doesn't have to be three. It doesn't transform the input data to remove cell with errors (#N/A, #DIV/0!, etc.) and it considers each group can have a different number of rows.

=LET(in,A1:K11, h,TAKE(in,1), data,DROP(in,1), m,COLUMNS(in),
  idx, FILTER(SEQUENCE(,m), LEN(h)), hUx,UNIQUE(FILTER(h,LEN(h)),1),
  cData,CHOOSECOLS(data,idx), cols,COLUMNS(idx),size,COLUMNS(hUx), gr,cols/size,
  LastNonEmptyRow, LAMBDA(x,XMATCH(2,1/(x<>""),-1,-1)),
  skip, MAX(TOROW(BYCOL(cData, LAMBDA(x, LastNonEmptyRow(x))),2)),
  wrap, WRAPROWS(TOCOL(cData),size),
  sort, SORTBY(wrap,TOCOL(SEQUENCE(ROWS(wrap)/gr) + SEQUENCE(,gr,0,skip))),
  filter, BYROW(sort, LAMBDA(x, AND(IFERROR(x="",FALSE)))),
  VSTACK(hUx,FILTER(IF(sort=0,"",sort), NOT(filter))))

Here is the output: output of the first scenario

The input range (in) can have empty rows, it will be cleaned up at the end. It only requires to include in the input range in all non empty rows for each groups. The formula only depends on a single range (in, easier to maintain it), the rest of the information is taken from in via TAKE and DROP functions.

The name idx, identifies non empty columns, looking for empty cells in the header (h). Since we are using LEN it covers empty cells as a result of a formula which returns an empty string (for example =""). The name hUx, has the columns of our interest (in our sample: {"a","b","c"}). The name cData (clean data) has the input data with the empty columns (group delimiter) removed. The name cols, the total number of columns (after removing empty columns). The name size the number of columns of each group header (i.e. 3). . The name gr, represents the number of column group we have (i.e. 3). The name wrap, wraps the input only considering non empty columns by size columns.

The data of wrap, doesn't have the information sorted in the way we want it, because we want to stack all the rows of the same group of columns one after another and WRAPROWS does it by row. The second input argument of SORTBY ensures the rows are sorted properly:

TOCOL(SEQUENCE(ROWS(wrap)/gr) + SEQUENCE(,gr,0,skip))

Note: Maybe safer to avoid repeated values in the sequence to be generated replace skip by gr*skip, this will ensure the sequence doesn't overlap, so far it works in both of the scenario tested, but just in case. It may also work just to use ROWS(in) as skip value which doesn't require additional calculation, but I came up to this later.

To obtain the skip amount, we calculate the maximum number of non empty rows in cData, via user LAMBDA function LastNonEmptyRow iterating over all input columns via BYCOL. To ensure MAX doesn't return an error we need to remove error values from the output of BYCOL, we use for that TOROW with the input argument ignore=2.

To find the row of the last non empty value in a column we take the idea from here: Get value of last non-empty cell, see section Position of the last value. We just adapted it to use XMATCH which allows reverse search (more efficient in case there are more valid values than empty rows for a large dataset), instead of LOOKUP (we don't want the cell value, instead the position) and ROW (which is row specific). This approach covers most of the cases, since we have cells with error values and considering also the case of empty strings as a result of a formula:

 LastNonEmptyRow, LAMBDA(x,XMATCH(2,1/(x<>""),-1,-1))

The sort name has our desired order. Now we need to remove empty rows. We use for that FILTER. The condition: AND(IFERROR(x="",FALSE) uses IFERROR to consider cell with error values. We can also uses: AND(IFERROR(LEN(x),1)=0). Therefore filter name is TRUE for empty rows.

Finally, we use VSTACK to return the expected result concatenating the header with the calculated data. The condition: IF(sort=0,"",sort) ensures to transform zeros into empty cells. This step is not required, if you configure Excel properly on how to treat empty cells. Check my answer to the following question: How can I return blank cells as it is instead of printing as Zeros while using filter function.

Here is the output for the second test scenario provided by the OP returning the correct answer now (my first approach didn't consider the skip amount, and failed for this scenario): output of the second scenario

I haven't tested all the scenario, please test it and let me know. Thanks

Update

The following is a simplified version, based on the Note above:

=LET(in,A1:K11, h,TAKE(in,1), data,DROP(in,1),lh,LEN(h),
  idx, FILTER(SEQUENCE(,COLUMNS(in)),lh), hUx,UNIQUE(FILTER(h,lh),1),
  cData,CHOOSECOLS(data,idx), size,COLUMNS(hUx), gr,COLUMNS(idx)/size,
  wrap, WRAPROWS(TOCOL(cData),size),
  sort, SORTBY(wrap,TOCOL(SEQUENCE(ROWS(wrap)/gr) + SEQUENCE(,gr,0,ROWS(in)))),
  filter, BYROW(sort, LAMBDA(x,AND(IFERROR(x="",FALSE)))),
  VSTACK(hUx,FILTER(IF(sort=0,"",sort), 1-filter)))
David Leal
  • 6,373
  • 4
  • 29
  • 56
  • 1
    Thanks @MayukhBhattacharya if you check I use your idea in other questions here: `SEQUENCE(ROWS(wrap)/gr) + SEQUENCE(,gr,0,gr)` – David Leal May 23 '23 at 13:29
  • 1
    Oh yes I just saw, thanks really nice solution, I am researching more to understand & learn. Good One Sir =) Sir one question how do you check the performance of formulas, that you mention sometimes in your answers? Do you use `Fast_Excel` app? – Mayukh Bhattacharya May 23 '23 at 13:36
  • 1
    Thanks @MayukhBhattacharya, no I do a simple check using `NOW()`, using an idea I took from this [gist](https://github.com/microsoft/advanced-formula-environment/blob/main/examples/Lib.md) from AndrewDGordon from Advanced Formula Environment Add-ins. I adapted it. Check the function `PERF_MSTEST` from my [gist](https://gist.github.com/dlealv/554bb5d76a007413d9ca74f38db0b8d1). It uses this idea of "thunk the range" via a `LAMBA` function, so you can invoke with your function like this: `lib.PERF_MSTEST(LAMBDA(funToTest(input arguments)))` – David Leal May 23 '23 at 14:25
  • 1
    @ciso I found the issue, I will update my answer, the problem is with `SEQUENCE(,gr,0,gr))`, the steps (now `gr`) should be maximum number of rows among all groups, because this is the distance we need to skip. – David Leal May 23 '23 at 16:28
  • 1
    @ciso check my updated answer which works for both scenario you shared. I think it can be improved instead of using the maximum number of non empty rows, the actual maximum for each group, but I would need to test it. Please let me know the current version works for your needs. There is always the way of using `REDUCE/VSTACK`, but we know it is not an efficient approach, so my solution tries to avoid this iteration process. – David Leal May 23 '23 at 17:42