Excel formula section to build an array from table data dynamically

Question

I do not know if this is at all possible using a single cell formula. Excel 2010 compatible (and upwards) formula would be desired. The purpose is to use a formula such as

{=SUM(INDEX(built_Array;N(IF(1;ROW(INDIRECT(x1 &":"& x2)))))}

where x1 and x2 are refs to cells containing a start index and an end index compatible with the built array.

The "built array" part of the Excel formula should be constructed from data in two tables elsewhere: the identifier on the tables to be used to select the actual contents (with lookup excel funcs). One table includes the number of repeat values and the other the actual values.

For example:

	P1	P2	P3		P1	P2	P3
i01	2	4		i01	20.0	20.6
i02	3			i02	10.0
i03	2	7	9	i03	30.0	30.4	30.2
i04	4	2		i04	15.0	15.1
i05	5			i05	10.0

Hence the built array for i03 would be

{30.0;30.0;30.4;30.4;30.4;30.4;30.4;30.4;30.4;30.2;30.2;30.2;30.2;30.2;30.2;30.2;30.2;30.2}

and for i04 would be

{15.0;15.0;15.0;15.0;15.1;15.1}

Then, the result from the array (ctrl-shift-enter) formula above for i04 taking the 3rd to the 5th value would be 45.1 and for i04 taking the 2nd to 3rd value would be 30.

What I am finding difficulty with is to build the array from the table within the formula, especially as each identifier might have a different number of components.

I have the hunch that this should be feasible, in other programming languages iteration or recursivity would have been used but I wanted to explore this way rather than reverting to VBA (if there is a conceptual reason for this to be impossible in Excel formulae I would be very interested as well, in case I should change the approach as I am interested in formula results).

So you basically need to dynamically sum those values? Therefore and input of i04 should return 90.2? — Evil Blue Monkey, Aug 24 '21 at 18:23
@EvilBlueMonkey no - he needs to return the value `y` repeating `x` number of times where `y` is from the right table and `x` is from the left table — urdearboy, Aug 24 '21 at 18:53
Ye, but after that aren't you summing a part of the built_Array from x1 to x2? So with an input of i04 1 to 6 the output should be 90.2, right? i04 2 to 3 would return 30, right? — Evil Blue Monkey, Aug 24 '21 at 19:06
I came up with a possible solution (to direcly calculate the total sum as delimited, not to create the array) but it will take me some days to post it properly explained. — Evil Blue Monkey, Aug 30 '21 at 21:30

user1938578 · Accepted Answer · 2021-09-13T14:28:52.440

A complete working answer to the question have been found after some research (the most helpful in stackoverflow and contributors suggestions, thanks).

METHOD 1 (preferred)

The single cell CSE (array) formula below uses several variations of the following

=((ROW(INDIRECT("1:"& e1))>=lb1)*(ROW(INDIRECT("1:"& e1))<=ub1))

where e1 is the number of elements in the 1-column array and lb1 is the lower bound for which a 1 is required and ub1 is the upper bound, out of the bounds elements' value is 0.

OP data needs to be trivially converted to cumulative indices for proposed formula

        P1  P2  P3          P1      P2      P3
i01     2   6   6   i01     20.0    20.6    
i02     3   3   3   i02     10.0        
i03     2   9   18  i03     30.0    30.4    30.2
i04     4   6   6   i04     15.0    15.1    
i05     5   5   5   i05     10.0

The following is a CSE formula to be applied over a range with the same number of columns as existing columns (sample above: 3 consecutive columns) and, then after inserting the formula in the formula bar, use ctrl-shift-enter.

Formula

{=TRANSPOSE(MMULT(TRANSPOSE(((ROW(INDIRECT("1:"&e1))>=IF(COLUMN(set_of_pos)-COLUMN(INDEX(set_of_pos;1;0)=0;0;OFFSET(set_of_pos;0;-1))+1)*(ROW(INDIRECT("1:"&e1))<=set_of_pos))*((ROW(INDIRECT("1:"&e1))>=x2+1)*(ROW(INDIRECT("1:"&e1))<=x1)));--(ROW(INDIRECT("1:"&e1))>0)))}

where e1 is the number of elements in the 1-column array (it's also the value at last position), x1 and x2 are refs to cells containing a start index and an end index; set_of_pos is the RANGE that tells the last position of each in each section (for example, for i03: 2,9,18)

PARTS ((ROW(INDIRECT("1:"&e1))>=x2+1)*(ROW(INDIRECT("1:"&e1))<=x1))) creates an (in-formula) array of a consecutive number of 1s (from booleans) bounded by 0s representing the indices that need to be kept or registered. (Dimensions e1 × 1.)

((ROW(INDIRECT("1:"&e1))>=IF(set_of_pos=1;0;OFFSET(set_of_pos;0;-1))+1)*(ROW(INDIRECT("1:"&e1))<=set_of_pos)) creates an (in-formula) array of dimensions e1 × columns(set_of_pos) in which each section cell expands the value into a stack of 1s. [In this part, for each column, upper bound is the actual value, and lower bound is the value to the left plus 1 - because set_of_pos is a RANGE, this is done for each column in set_of_pos as long as CSE is used to enter formula; nb: if it is the the leftmost cell, 0 is used instead.]

Then both parts are operated with *, thus obtaining an (in-formula) array of dimensions e1 × columns(set_of_pos) with the positions satisfying being in the section stack AND in the set of indices to be kept (conveniently separated in columns).

We need to TRANSPOSE this result obtaining an (in-formula) array of dimensions columns(set_of_pos) × e1. MMULT (and all-1s e1 × 1 defined by --(ROW(INDIRECT("1:"&e1))>0) enables us to count the number of 1s for each column in a compressed array with the quantity in a different column [columns(set_of_pos) × 1]. [NB, we need to include double negative "--" to convert booleans into numbers as MMULT requires number array or else it throws an error - booleans are not accepted in MMULT, as per function specs.]

An additional transpose is required to put the array horizontally [1 × columns(set_of_pos)].

Now, final result is obtained by =SUMPRODUCT(previous_result;set_of_val) where previous_result is the result from above and set_of_val is the RANGE with the values.

Having a way to preserve which stacks are modified serves other purposes (spent stacks can be computed) and obtaining the final result is a trivial application of function SUMPRODUCT. The number of columns or sections is not hardcoded - it is set by choosing the set_of_pos inside the formula range AND selecting the appropriate number of consecutive columns when applying the (CSE) array formula.

METHOD 2 For the sake of completeness, a second method has been explored. If that is thought to be less resource intensive, I would certainly like to know. My hunch is that it is more resource intensive albeit with less typing, but I have not been able to test it.

Basically, a string with all the values is built into a cell (for example, using REPT and pipes |, for i03 we would have "30.0|30.0|30.4|30.4|30.4|30.4|30.4|30.4|30.4|30.2|30.2|30.2|30.2|30.2|30.2|30.2|30.2|30.2|")

A working solution (CSE is not required - SUMPRODUCT takes care of the three 1-column arrays built adequately)

=SUMPRODUCT(((ROW(INDIRECT("1:"& e1))<=x1)*(ROW(INDIRECT("1:"&e1))>=x2+1))*TRIM(MID(SUBSTITUTE(s1;"|";REPT(" ";LEN(s1)));(ROW(INDIRECT("1:"& e1))-1)*LEN(s1)+1;LEN(s1))))

where x1, x2 and e1 are refs to cells containing a start index and an end index and the number of total elements (as defined above) and s1 is the string with all values in order.

PARTS

(ROW(INDIRECT("1:"& e1))<=x1)*(ROW(INDIRECT("1:"&e1))>=x2+1))

is essentially present in METHOD 1: an (in-formula) array of a consecutive number of 1s (from booleans) bounded by 0s representing the indices that need to be kept. (Dimensions e1 × 1.)

TRIM(MID(SUBSTITUTE(s1;"|";REPT(" ";LEN(s1)));(ROW(INDIRECT("1:"& e1))-1)*LEN(s1)+1;LEN(s1)))

The formula takes the string (s1) and replaces each separator with number of spaces equal to the length of the string, then places different sections on each element of the array created by ROW(INDIRECT("1:"& e1))

[A more general solution would have been to obtain the number of elements by getting the difference between the string and the string with delimiters taken out, but the number of elements is required in my little project and defined under the label e1. The technique is explained elsewhere, succinctly, places sections of of the length of the original string starting at the sequence number given by ROW(INDIRECT("1:"& e1)) times the length of the original string and then removing all surrounding spaces with TRIM.]

Acknowledgements and references

Thanks to answers and comments, it has been possible to locate two stackoverflow questions which have relevant comments:

https://stackoverflow.com/questions/25316094/split-a-string-cell-in-excel-without-vba-e-g-for-array-formula/25316426[enter link description here][1] and Is there a way to concatenate two arrays in Excel without VBA?

Both are a good read. Method 1 has been developed partly from scratch but Method 2 have been greatly improved by careful read, in particular, for converting a string into an in-formula array. ROW(INDIRECT()) has been chosen over SEQUENCE() as compatibility with Excel 2010 was desired.

I have been privately requested to use the final version as the formula edit is minor. It might make the formula slightly less clear but it requires mostly the same amount of explanation. Now, it is updated. — user1938578, Sep 06 '21 at 08:34

Evil Blue Monkey · Answer 2 · 2021-09-21T13:10:42.367

To achieve our goal we need to create a series of array that, conveniently "mixed", will result in the series of values we have to sum. The whole process is graphically depicted here:

On the left we have the source data and the user's input. The source data (the starting arrays) are assumed to be 2 ranges each made of contiguous cells (areas). A part from these 2 ranges, all the other data will be expressed/calculated via names. These are the names needed:

Names	ReferenceToR1C1
Target	=Sheet1!R4C12 <<<this should return the value to be searched (example: "i03")
MLR_Mtrx_Addr	=Sheet1!R5C12 <<<this should return a string reporting the address of the multipliers array (example: "B5:D9")
MLD_Mtrx_Addr	=Sheet1!R6C12 <<<this should return a string reporting the address of the multiplied array (example: "G5:I9")
From	=Sheet1!R7C12 <<<this should return the starting value from which the formula will sum on (example: 2)
To	=Sheet1!R8C12 <<<this should return the ending value to which the formula will sum on (example: 11)
Array01	=(((ROW(INDIRECT("1:"&MLR_Max_Val)))^1)+(COLUMN(INDIRECT("C1:C"&MLR_Col_Cnt,FALSE))^1-1)*MLR_Max_Val)
Array02	=((ROW(INDIRECT("1:"&MLR_Max_Val)))^1*(COLUMN(INDIRECT("C1:C"&MLR_Col_Cnt,FALSE))^0))
Array03	=Array01/((Array02<=INDIRECT(MLR_Tar_Rel_Row_Addr)))
Array04	=AGGREGATE(15,6,Array03,From)
Array05	=AGGREGATE(15,6,Array03,To)
Array06	=IFERROR((Array03>=Array04)(Array03<=Array05)INDIRECT(MLD_Tar_Rel_Row_Addr),0)
MLD_Tar_Rel_Row	=MATCH(Target,OFFSET(INDIRECT(MLD_Mtrx_Addr),0,-1,,1),0)
MLD_Tar_Rel_Row_Addr	=SUBSTITUTE(CELL("address",OFFSET(INDIRECT(MLD_Mtrx_Addr),MLD_Tar_Rel_Row-1,0,1,1))&":"&CELL("address",OFFSET(INDIRECT(MLD_Mtrx_Addr),MLD_Tar_Rel_Row-1,COLUMNS(INDIRECT(MLD_Mtrx_Addr))-1,1,1)),"$","")
MLR_Col_Cnt	=COLUMNS(INDIRECT(MLR_Mtrx_Addr))
MLR_Fin_Val_Cnt	=SUM(INDIRECT(MLR_Tar_Rel_Row_Addr))
MLR_Max_Val	=MAX(INDIRECT(MLR_Tar_Rel_Row_Addr))
MLR_Tar_Rel_Row	=MATCH(Target,OFFSET(INDIRECT(MLR_Mtrx_Addr),0,-1,,1),0)
MLR_Tar_Rel_Row_Addr	=SUBSTITUTE(CELL("address",OFFSET(INDIRECT(MLR_Mtrx_Addr),MLR_Tar_Rel_Row-1,0,1,1))&":"&CELL("address",OFFSET(INDIRECT(MLR_Mtrx_Addr),MLR_Tar_Rel_Row-1,COLUMNS(INDIRECT(MLR_Mtrx_Addr))-1,1,1)),"$","")

Note
The first five names are still ranged-related. These are the user's input data. The user should modify their ReferenceToR1C1 accordingly to his/her needs as specified by the side notes.

This code should allow us to rapidly create all the said names:

Sub SubAddNames()
    
    With ActiveWorkbook.Names
        .Add name:="Target", RefersToR1C1:="=Sheet1!R4C12"
        .Add name:="MLR_Mtrx_Addr", RefersToR1C1:="=Sheet1!R5C12"
        .Add name:="MLD_Mtrx_Addr", RefersToR1C1:="=Sheet1!R6C12"
        .Add name:="From", RefersToR1C1:="=Sheet1!R7C12"
        .Add name:="To", RefersToR1C1:="=Sheet1!R8C12"
        .Add name:="Array01", RefersToR1C1:="=(((ROW(INDIRECT(""1:""&MLR_Max_Val)))^1)+(COLUMN(INDIRECT(""C1:C""&MLR_Col_Cnt,FALSE))^1-1)*MLR_Max_Val)"
        .Add name:="Array02", RefersToR1C1:="=((ROW(INDIRECT(""1:""&MLR_Max_Val)))^1*(COLUMN(INDIRECT(""C1:C""&MLR_Col_Cnt,FALSE))^0))"
        .Add name:="Array03", RefersToR1C1:="=Array01/((Array02<=INDIRECT(MLR_Tar_Rel_Row_Addr)))"
        .Add name:="Array04", RefersToR1C1:="=AGGREGATE(15,6,Array03,From)"
        .Add name:="Array05", RefersToR1C1:="=AGGREGATE(15,6,Array03,To)"
        .Add name:="Array06", RefersToR1C1:="=IFERROR((Array03>=Array04)*(Array03<=Array05)*INDIRECT(MLD_Tar_Rel_Row_Addr),0)"
        .Add name:="MLD_Tar_Rel_Row", RefersToR1C1:="=MATCH(Target,OFFSET(INDIRECT(MLD_Mtrx_Addr),0,-1,,1),0)"
        .Add name:="MLD_Tar_Rel_Row_Addr", RefersToR1C1:="=SUBSTITUTE(CELL(""address"",OFFSET(INDIRECT(MLD_Mtrx_Addr),MLD_Tar_Rel_Row-1,0,1,1))&"":""&CELL(""address"",OFFSET(INDIRECT(MLD_Mtrx_Addr),MLD_Tar_Rel_Row-1,COLUMNS(INDIRECT(MLD_Mtrx_Addr))-1,1,1)),""$"","""")"
        .Add name:="MLR_Col_Cnt", RefersToR1C1:="=COLUMNS(INDIRECT(MLR_Mtrx_Addr))"
        .Add name:="MLR_Fin_Val_Cnt", RefersToR1C1:="=SUM(INDIRECT(MLR_Tar_Rel_Row_Addr))"
        .Add name:="MLR_Max_Val", RefersToR1C1:="=MAX(INDIRECT(MLR_Tar_Rel_Row_Addr))"
        .Add name:="MLR_Tar_Rel_Row", RefersToR1C1:="=MATCH(Target,OFFSET(INDIRECT(MLR_Mtrx_Addr),0,-1,,1),0)"
        .Add name:="MLR_Tar_Rel_Row_Addr", RefersToR1C1:="=SUBSTITUTE(CELL(""address"",OFFSET(INDIRECT(MLR_Mtrx_Addr),MLR_Tar_Rel_Row-1,0,1,1))&"":""&CELL(""address"",OFFSET(INDIRECT(MLR_Mtrx_Addr),MLR_Tar_Rel_Row-1,COLUMNS(INDIRECT(MLR_Mtrx_Addr))-1,1,1)),""$"","""")"
    End With
    
End Sub

Once the names are correctly set, the final formula will be this one:

=IF(OR(INT(From)<>From,INT(To<>To),To>MLR_Fin_Val_Cnt,From>To),#VALUE!,SUM(IFERROR(Array06,0)))

It will works only as array formula, therefore we need to copy, select the desired cell, paste the formula and then press Ctrl+Shift+Enter.

Thank you Evil Blue Monkey. I appreciate very much your effort. I had a working solution myself that I was planning to post soon. Your solution looks very good, but I believe I have managed to build arrays dynamically using booleans. =((ROW(INDIRECT("1:"& e1))>=lb1)*(ROW(INDIRECT("1:"& e1))<=ub1)), where e1 is the number of elements in the 1-column array and lb1 is the lowest index for which a 1 is required and ub1 is the highest position for 1, the remaining array elements are filled with 0s. — user1938578, Sep 02 '21 at 09:59

score 0 · Answer 3 · answered Aug 24 '21 at 19:52

Here's how I would solve that:

=SUMPRODUCT((B2:D6*B10:D14)*(RIGHT(A2:A6)>=RIGHT(X1))*(RIGHT(A2:A6)<=RIGHT(X2)))

SUMPRODUCT() is going to allow us to evaluate as an array, '=SUM()' + CSE is an equally valid solution.

(B2:D6*B10:D14) is our array. I wasted a lot of time trying to force the array to evaluate {20, 20, 20.6, 20.6, 20.6, 20.6} before I realized we were just going to total the whole row. I won't go into details, but if you're trying to do that start here and use the REPT function (https://exceljet.net/formula/text-split-to-array).

(RIGHT(A2:A6)>=RIGHT(X1))*(RIGHT(A2:A6)<=RIGHT(X2)), since the last character is a unique ascending number I just had it evaluate TRUE() if the end of the reference in X1 is less than or equal to the rows' numbers. And of course the inverse for X2.

The Boolean TRUE and FALSE values can have algebra applied to them, with TRUE as 1 and FALSE as 0. So I just multiplied them across to discard any values outside of the search area.

Believe formula does not do it, especially if the constraints apply to P1-P3 which are arbitrary names (RIGHT(A2:A6) seems to me that). Two approaches proposed well worth review. 1-REPT Excel function enables a String approach but I would avoid hard coding max num of components. 2-SUMPRODUCT approach, it will be a matter of converting an array (for i03 {2;7;9}) into another depending on the start-finish, if 3rd to 12th, it would be {0;7;3} and this can be achieved by using x1 an x2 for sure, a general formula with ranges that can be changed preferred. Apologizes if I misunderstood. — user1938578, Aug 25 '21 at 10:50
@user1938578, I think you're on the right track, and this is pushing the limits of what is possible. But that's my favorite kind of question, something that pushes my limits to learn something new. I misunderstood the question, I'll leave the original "answer" for posterity. If I find something better later I'll throw it on top with an edit. — I like Excel very much, Sep 01 '21 at 15:50

user1938578 · Answer 4 · 2021-09-18T11:54:49.650

Further reading on issue have prompted a version avoiding volatile functions. If a non-volatile Method 1 working formula is desired, it can be achieved by replacing volatile INDIRECT with non-volatile INDEX reference form function.

The general expression

((ROW(INDIRECT("1:"& n1))>=lb1)*(ROW(INDIRECT("1:"& n1))<=ub1))

converts to

((ROW($A$1:INDEX($A:$A;n1;0))>=lb1)*(ROW($A$1:INDEX($A:$A;n1;0))<=ub1))

where n1 is the number of elements in the 1-column array and lb1 is the lower bound for which a 1 is required and ub1 is the upper bound.

OFFSET, the only other volatile function, is used once (to obtain lower bounds from the cell to the left). It can be replaced with INDEX array form function:

IF(COLUMN(set_of_pos)-COLUMN(INDEX(set_of_pos;0;1))=0;0;OFFSET(set_of_pos;0;-1))+1

converts to

IF(COLUMN(set_of_pos)-COLUMN(INDEX(set_of_pos;0;1))=0;0;INDEX(set_of_pos;0;COLUMN(set_of_pos)-COLUMN(INDEX(set_of_pos;0;1)))+1

[equivalently, creating an array out of the general grid (-1 being the offsetting code),

IF(COLUMN($A$1:INDEX($1:$1;0;COLUMNS(set_of_pos)))=1;0;INDEX(set_of_pos;0;COLUMN($A$1:INDEX($1:$1;0;COLUMNS(set_of_pos)))-1))+1

]

where set_of_pos is the horizontal RANGE that tells the last position in each section of the original post (hence, the formula section gives 0+1 when it is the first -leftmost- cell and 1 + value of cell to the left and thus the lower bound of the section is obtained).

After selecting an horizontal range of number of sections cells, the formula needs to be entered with Ctrl-Shift-Enter

{=TRANSPOSE(MMULT(TRANSPOSE(((ROW($A$1:INDEX($A:$A;e1;0))>=IF(COLUMN(set_of_pos)-COLUMN(INDEX(set_of_pos;0;1))=0;0;INDEX(set_of_pos;0;COLUMN(set_of_pos)-COLUMN(INDEX(set_of_pos;0;1)))+1)*(ROW($A$1:INDEX($A:$A;e1;0))<=set_of_pos))*((ROW($A$1:INDEX($A:$A;e1;0))>=x2+1)*(ROW($A$1:INDEX($A:$A;e1;0))<=x1)));--(ROW($A$1:INDEX($A:$A;e1;0))>0)))}

where e1 is the number of elements (or a reference to a cell containing that) in the 1-column array, x1 and x2 are references to cells containing a start index and an end index; set_of_pos is the RANGE that tells the last position in each section as defined in the answer with volatile functions. (e1 can be obtained through the expression INDEX(set_of_pos;0;COLUMNS(set_of_pos)), given source data.)

[It might be worth noting that since all INDEX functions in the formula use 1-dimensional range references, the 0-valued arguments can be omitted as INDEX treats a single additional argument for 1-dimensions correctly.]

Non-volatile approach for ROW(INDIRECT()) found on Is there a way to concatenate two arrays in Excel without VBA? [Search for "Without the volatile", a comment by Scott Craner].

Excel formula section to build an array from table data dynamically

4 Answers4