1

As an example, here is my Table: Input data

My objective with this data is to sum the Values in the Amount columns which fit specific Project, Section and Date criteria.

As you can see there are 2 sets of Project|Section|Amount, in the final product there will be as many as 30, rendering sumifs quite a pain to write and also to reuse for diferent project and section criteria.

The explanation for this format is the each invoice can be applied to several project/sections. Which is shown in the first row since the 2 amounts combine to be equal to the total in the same row (1000+200=1200)

Expected result:

Expected output

In my expected result the formula would check every Project-Section-Amount dataset, and for eache ntry would check if the first column is of the expected month, if project and section matches the given values, and add them up.

I have tried googling diferent functions and formulas, a combination of sumproduct with sumifs and indexes or offsets, but have had no success.

David Leal
  • 6,373
  • 4
  • 29
  • 56

1 Answers1

1

Updated the response, based on the latest OP's feedback. Maybe there are shorter ways of doing it (it is not a simple task anyway). In the meantime you can try the following approach:

=LET(in,A1:J5, h, TAKE(in,1), data, DROP(in,1),
 SOMs, EOMONTH(1*TAKE(data,,1),-1)+1, uxSOMs,UNIQUE(SOMs),
 QRY, LAMBDA(lk, CHOOSECOLS(data, FILTER(SEQUENCE(,COLUMNS(h)), h=lk))),
 prjs, QRY("Project"), sects, QRY("Section"), amnts, QRY("Amount"),
 n, COLUMNS(prjs), seq,SEQUENCE(n),dates,IF(TOROW(seq),SOMs),
 wdata, WRAPROWS(TOROW(CHOOSECOLS(HSTACK(dates, prjs,sects,amnts),
  TOROW(HSTACK(seq, n+seq, 2*n+seq, 3*n + seq)))),4),
 f, FILTER(wdata, (INDEX(wdata,,2)<>"") + (INDEX(wdata,,3)<>"")),
 prjSec, UNIQUE(CHOOSECOLS(f,2,3)),
 HCALC, LAMBDA(set,
  HSTACK(TAKE(DROP(set,,1),1,2), MMULT(SEQUENCE(,ROWS(set),,0),
    IF(TAKE(set,,1)=TOROW(uxSOMs), TAKE(set,,-1),0)))),
 REDUCE(HSTACK({"Project","Section"}, TOROW(uxSOMs)),
  SEQUENCE(ROWS(prjSec)), LAMBDA(ac,s,
  VSTACK(ac, HCALC(FILTER(f, (INDEX(f,,2)=INDEX(prjSec,s,1)) 
    * (INDEX(f,,3)=INDEX(prjSec,s,2))))))))

Here is the output: output

We have a single name (in) that depends on the input range, the rest of the variables are deduced from in: h, the header, data, the input data without the header. SOMs, the start date of the month from the input dates.

To extract project (prjs), sections (sects) and amounts (amnts) columns, we defined a user LAMBDA function QRY, to avoid repeating the same calculation for each of them, so we call QRY with the corresponding parameter to get each of them.

The main goal is to accommodate the input in a way it is easier to do the calculation by a unique combination of project and section and unique months. We plan to use REDUCE/VSTACK pattern(1). This transformed input data is represented by the name f. For this sample data this is how f will look like:

6/1/2023    OCL CA  1000
6/1/2023    OL  LAP 200
6/1/2023    OL  LAP 2000
6/1/2023    OL  LAP 3000
7/1/2023    OL  LAP 600

where each column represents: date, project, section, and amount. Having this input data. We can iterate over the unique combination of project and section and summarize the amount for each unique month. We do this via REDUCE/VSTACK pattern. Where on each iteration it generates for a given combination of projects and sections the totals for unique months, via the user LAMBDA function: HCALC.

To get f, it first generates as many columns as the dataset we have (n) for dates, repeating the date column, this information is stored in dates name via: IF(TOROW(seq),SOMs), i.e, n-columns with repeated dates. Then it stacks horizontally the group of dates, projects, sections, and amounts via HSTACK(dates, prjs,sects,amnts). To select the columns in a specific order we use the following:

TOROW(HSTACK(seq, n+seq, 2*n+seq, 3*n + seq) 

Now we are ready via WRAPROWS of 4 (date, project, section, and amount), to have the data in the way we wanted (wdata). Now it just needs to remove rows with no project or section information and that is the FILTER call to get finally f. We use this filter to focus the calculation where there is information only.

Having f, we can then get the unique combination of projects and sections (prjSec).

The logic for doing the calculation is inside HCALC. The input set, is the f data filtered by one of the unique combinations of prjSec. We use MMULT to identify where the dates are equal to the unique start of the months (uxSOM):

IF(TAKE(set,,1)=TOROW(uxSOMs), TAKE(set,,-1),0)

The previous formula generates on each column the corresponding amount for unique months. We have as many columns as unique months we have. Now it just needs to do the multiplication via MMULT to sum by column the amounts. It uses a row array of 1s to do the multiplication: SEQUENCE(,ROWS(set),,0). Finally, we use HSTACK to add horizontally to the MMULT result, the project and the corresponding section (TAKE(DROP(set,,1),1,2)).

Finally, we use REDUCE/VSTACK with the accumulator (ac) initialized with the header of the output. We iterate over the number of unique pair of combinations of project and section: SEQUENCE(ROWS(prjSec)), invoking on each iteration HCALC filtered by the corresponding project and section on each iteration.

Notes:

  1. The MMULT calculation can be replaced with BYCOL, but usually MMULT is more efficient.
  2. The previous formula doesn't depend on a specific position for the key columns, it finds such columns looking in the header (h).

(1): how to transform a table in Excel from vertical to horizontal but with different length

David Leal
  • 6,373
  • 4
  • 29
  • 56
  • Thank you for the help, I appreciate this so much. It seems like the results is only reading data from the first dataset? It doesnt seem to get the '200' Amount from the second dataset. There is also no project, section or date filtering, but that can be added. This is still great though, giving me new functions to use. Thank you. – Santiago Casero Jul 04 '23 at 10:03
  • @SantiagoCasero it does take into account `200` because the project and the section are not the same compared to the first data set. If in the second dataset the project and section are the same to the first dataset then it will be taken into account. What it does is by row sums the amount with the same project and section. Please clarify what do you want exactly. It is not clear to me now. Please share in the question the expected output an why. Thanks. – David Leal Jul 04 '23 at 14:47
  • @SantiagoCasero I added a second solution assuming you are looking to sum totals based on lookup values – David Leal Jul 04 '23 at 15:35
  • I see the misunderstanding here, I am not searching for values where one dataset matches the first one, I am searching for values in any of the datasets that match specific values. Hope that helps. – Santiago Casero Jul 04 '23 at 18:52
  • @SantiagoCasero check my solution under the section. Sum amount per row from data set where project and section are the same to the first dataset. If that is not the case, please update your question adding more clarity. Thanks – David Leal Jul 05 '23 at 01:50
  • I have updated my post with screenshots of what I expect (I didnt originally since it wouldnt let me embebed them) – Santiago Casero Jul 05 '23 at 10:18
  • @SantiagoCasero check my answer now, I updated it based on your last update, I hope it works, it is long task, because it requires several transformation in order to get the data in the convenient format. I see you have uploaded the screenshot, it is good to understand sometimes the problem, but as a best practice, please use table markdown, so we can easy copy the input data. The information that was missing initially was the sample of the output data you provided now. Please if it works, **accept and/or upvote** my answer. Thanks – David Leal Jul 05 '23 at 18:11
  • 1
    This works so well, thank you so much =) – Santiago Casero Jul 06 '23 at 10:37