TLDR and self guided - Here's the example workbook.
Yes, there is a way to join arrays in pre-office 2016. I know this has been answered by ImaginaryHuman above, but I have another way, it returns an array, and it's a little easier to read (IMHO). I'm going to break out evolutions of the formula so that you can find one that fits your use case. I've highlighted the use cases in bold so you can find yours quickly. I know this is rather verbose, but I am the kind of person who likes to know how a solution works, so I'm going to try to give you the same courtesy.
The formula relies on nested IF
statements and INDEX
/CHOOSE
structures. It works with ranges, named ranges, and even table columns. All of my examples show four ranges, hence three IF
statements, but this can be strung up to (I think) 64 ranges if you care for that many nested IF
statements.
For these examples, the data ranges are A3:B6
, A9:B11
, A14:B19
, and A22:B32
. The resulting array formula is put in the range E3:E26
and finished with a Ctrl+Shift+Enter
to make it an array formula. Your data can go wherever you like - you are not tied to these ranges - just substitute your ranges appropriately.
If your data is in contiguous ranges:
=IF(ROW()-ROW(E3)<ROWS(A3:A6),INDEX(A3:B6,ROW()-ROW(E3)+1,COLUMN()-COLUMN(E3)+1),
IF(ROW()-ROW(E3)<ROWS(A3:A6)+ROWS(A9:A11),INDEX(A9:B11,ROW()-ROW(E3)-ROWS(A9:A11),COLUMN()-COLUMN(E3)+1),
IF(ROW()-ROW(E3)<ROWS(A3:A6)+ROWS(A9:A11)+ROWS(A14:A19),INDEX(A14:B19,ROW()-ROW(E3)-ROWS(A3:A6)-ROWS(A9:A11)+1,COLUMN()-COLUMN(E3)+1),
INDEX(A22:B32,ROW()-ROW(E3)-ROWS(A3:A6)-ROWS(A9:A11)-ROWS(A14:A19)+1,COLUMN()-COLUMN(E3)+1))))
How it works:
- The
IF
statement makes sure that we are in the first range by subtracting the current row from the top of the output range in cell E3
and comparing it to the number of cells in the first input range of A3:B6
.
- The INDEX statement chooses an item from the first input range of
A3:B6
, given a row and column offset calculated from cell E3
.
- If the row is not in the first range it moves on to the next
IF
statement, which repeats the process but compares the current row of the array to the length of the first two ranges. The process repeats for any further nested IF
statements.
If your data is not in contiguous ranges, you need a column showing what range the data originally came from, or both:
=IF(ROW()-ROW(E3)<ROWS(A3:A6),INDEX(CHOOSE({1,2,3},{1},A3:A6,B3:B6),ROW()-ROW(E3)+1,COLUMN()-COLUMN(E3)+1),
IF(ROW()-ROW(E3)<ROWS(A3:A6)+ROWS(A9:A11),INDEX(CHOOSE({1,2,3},{2},A9:A11,B9:B11),ROW()-ROW(E3)-ROWS(A3:A6)+1,COLUMN()-COLUMN(E3)+1),
IF(ROW()-ROW(E3)<ROWS(A3:A6)+ROWS(A9:A11)+ROWS(A14:A19),INDEX(CHOOSE({1,2,3},{3},A14:A19,B14:B19),ROW()-ROW(E3)-ROWS(A3:A6)-ROWS(A9:A11)+1,COLUMN()-COLUMN(E3)+1),
INDEX(CHOOSE({1,2,3},{4},A22:A32,B22:B32),ROW()-ROW(E3)-ROWS(A3:A6)-ROWS(A9:A11)-ROWS(A14:A19)+1,COLUMN()-COLUMN(E3)+1))))
How it works:
- All the principles for the
IF
and INDEX
statements remain the same as above.
- A
CHOOSE
statement is added which allows you to select non-contiguous columns of data or a static array with whatever identifier you want for each range. In this case, I went with numbers (1,2,3,4).
- The
CHOOSE
statement can have as many columns as you like - just change the first argument to {1,2,3,4}
for four columns and add your fourth column as the last argument. Do the same for any subsequent columns (i.e. {1,2,3,4,5}
and add your fifth column as the last argument.
If you have horizontal data instead of vertical data, you can use TRANSPOSE
to make the previous example work. Just nest the TRANSPOSE
function inside the CHOOSE
function like this:
CHOOSE({1,2,3},{1},TRANSPOSE(A3:C3),TRANSPOSE(A4:C4)
You can clean up the formula significantly with named ranges or tables. This example builds on the previous one allowing data not in contiguous ranges and provides an identifier column showing where the data came from:
=IF(ROW()-ROW(E3)<ROWS(Table1),INDEX(CHOOSE({1,2,3},{1},Table1[Column1],Table1[Column2]),ROW()-ROW(E3)+1,COLUMN()-COLUMN(E3)+1),
IF(ROW()-ROW(E3)<ROWS(Table1)+ROWS(Table2),INDEX(CHOOSE({1,2,3},{2},Table2[Column1],Table2[Column2]),ROW()-ROW(E3)-ROWS(Table1)+1,COLUMN()-COLUMN(E3)+1),
IF(ROW()-ROW(E3)<ROWS(Table1)+ROWS(Table2)+ROWS(Table3),INDEX(CHOOSE({1,2,3},{3},Table3[Column1],Table3[Column2]),ROW()-ROW(E3)-ROWS(Table1)-ROWS(Table2)+1,COLUMN()-COLUMN(E3)+1),
INDEX(CHOOSE({1,2,3},{4},Table4[Column1],Table4[Column2]),ROW()-ROW(E3)-ROWS(Table1)-ROWS(Table2)-ROWS(Table3)+1,COLUMN()-COLUMN(E3)+1))))
If that isn't enough, you can do more housekeeping for readability by creating some named values. The first thing that can be done is to define at what row we start getting data from each table. For this example, I have named these Table2_UL
, Table3_UL
, and Table4_UL
. Their code formula in the name manager looks like this:
Table2_UL
: =ROWS(Table1)
Table3_UL
: =Table2_UL+ROWS(Table2)
Table4_UL
: =Table3_UL+ROWS(Table3)
As you can see, each one builds upon the last so its output is dynamic. We now have a much more readable formula:
=IF(ROW()-ROW(E3)<Table2_UL,INDEX(CHOOSE({1,2,3},{1},Table1[Column1],Table1[Column2]),ROW()-ROW(E3)+1,COLUMN()-COLUMN(E3)+1),
IF(ROW()-ROW(E3)<Table3_UL,INDEX(CHOOSE({1,2,3},{2},Table2[Column1],Table2[Column2]),ROW()-ROW(E3)-Table2_UL+1,COLUMN()-COLUMN(E3)+1),
IF(ROW()-ROW(E3)<Table4_UL,INDEX(CHOOSE({1,2,3},{3},Table3[Column1],Table3[Column2]),ROW()-ROW(E3)-Table3_UL+1,COLUMN()-COLUMN(E3)+1),
INDEX(CHOOSE({1,2,3},{4},Table4[Column1],Table4[Column2]),ROW()-ROW(E3)-Table4_UL+1,COLUMN()-COLUMN(E3)+1))))
But that's not enough for me. I want to get rid of all those nasty references to ROW()
and COLUMN()
. We can do that by defining two more values in the name manager that keep track of our current row and column for us:
Output_CC
: =COLUMN()-COLUMN(Sheet1!E3)+1
Output_CR
: =ROW()-ROW(Sheet1!E3)+1
Finally, we have something that is near human readable:
=IF(Output_CR-1<Table2_UL,INDEX(CHOOSE({1,2,3},{1},Table1[Column1],Table1[Column2]),Output_CR,Output_CC),
IF(Output_CR-1<Table3_UL,INDEX(CHOOSE({1,2,3},{2},Table2[Column1],Table2[Column2]),Output_CR-Table2_UL,Output_CC),
IF(Output_CR-1<Table4_UL,INDEX(CHOOSE({1,2,3},{3},Table3[Column1],Table3[Column2]),Output_CR-Table3_UL,Output_CC),
INDEX(CHOOSE({1,2,3},{4},Table4[Column1],Table4[Column2]),Output_CR-Table4_UL,Output_CC))))
If we really want to take it all the way, we can turn our CHOOSE
statements into named values as well. Just do the following for each of your input tables in the Name Manager, making sure to give each a unique name:
Table1_IN
: =CHOOSE({1,2,3},{1},Table1[Column1],Table1[Column2])
Now we can read the formula really easy:
=IF(Output_CR-1<Table2_UL,INDEX(Table1_IN,Output_CR,Output_CC),
IF(Output_CR-1<Table3_UL,INDEX(Table2_IN,Output_CR-Table2_UL,Output_CC),
IF(Output_CR-1<Table4_UL,INDEX(Table3_IN,Output_CR-Table3_UL,Output_CC),
INDEX(Table4_IN,Output_CR-Table4_UL,Output_CC))))
Again, though, that is not enough because you cannot turn on the filter and sort arrays A-Z. You get the error "You can't change part of an array." There is a workaround, though! It requires a helper column and duplicating your output. It can be duplicated to a plain old range or into a table. To allow you to both sort and filter your data, create a helper column to the left of the array output, in this case, starting in D3
. If your data does not need to be ranked (like all text columns), create static numbering (1, 2, 3, 4, etc). In this example, column G
contains a number to be ranked. If it does need to be ranked enter the following formula in D3
and drag it down:
=RANK.EQ(G3,G$3:G$26,0)+COUNTIF(G$3:G3,G3)-1
Change the final argument to 1
if you need an ascending ranking instead. You now have an out of order ranking if your data was ranked or an unsortable array with a static number next to it if not. Now we duplicate the data into a range or table. In column I
, starting at I3
, create static numbering as long as the dataset (ie 1, 2, 3, 4). Now to the right in cell J3
enter a VLOOKUP
that refers to the data in the source array:
=VLOOKUP($I3,$D$3:$G$26,COLUMNS($I$3:J3),FALSE)
Drag the formula down and then drag it right. You can now sort and filter your data just as if it was a normal range.