
I am working on a file with thousands of lines and need to find the minimum corresponding value in one column based on duplicate values in another column.

| Brand  | Model      | Price |
|--------|------------|-------|
| Toyota | Tundra     | 20360 |
| Toyota | Corolla    | 28502 |
| Ford   | F-350      | 36555 |
| Ford   | F-150      | 28963 |
| Ford   | Expedition | 27585 |
| Tesla  | Model 3    | 25812 |
| Mazda  | Mazda3     | 22036 |
| Mazda  | Mazda6     | 25875 |
| Mazda  | CX9        | 58200 |

What I need is for the formula to look at duplicate values in column A, regardless of whether it's a single value or three duplicates, and then return the lowest cost.

Final result of what I'm looking for

| Brand  | Model      | Price |
|--------|------------|-------|
| Toyota | Tundra     | 20360 |
| Ford   | Expedition | 27585 |
| Tesla  | Model 3    | 25812 |
| Mazda  | Mazda3     | 22036 |
  • Does this answer your question? [Excel - extracting top 5 values](https://stackoverflow.com/questions/58639470/excel-extracting-top-5-values) – Solar Mike Apr 03 '23 at 17:24
  • Hmm. Perhaps sort by Brand and then Price descending. Then write a formula that indicates when the Brand is not equal to the row above it. Those are the ones you want to keep. – Steven Rumbalski Apr 03 '23 at 17:27

4 Answers


I guess it's worth adding an older-style COUNTIFS approach:

=FILTER(A2:C10,COUNTIFS(A2:A10,A2:A10,C2:C10,"<"&C2:C10)=0)
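
To see what the criterion does, it can help to evaluate the COUNTIFS part on its own. This is an illustrative helper of mine (assuming the sample data sits in A2:C10 as above), not part of the original answer; enter it in D2 and fill down:

=COUNTIFS($A$2:$A$10,A2,$C$2:$C$10,"<"&C2)

It counts how many same-brand rows have a strictly lower price, so a result of 0 marks the cheapest row of each brand, which is exactly what the FILTER keeps.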


Will it be slow if OP has 'thousands of lines'? Yes, probably.

Tom Sharpe
  • Definitely worth it. Other answers will be just as slow I suppose (if not slower). – JvdV Apr 04 '23 at 07:26
  • Tried it on 10K rows, mine takes a second or so, yours is instantaneous. – Tom Sharpe Apr 04 '23 at 07:57
  • Who would have thought, `LAMBDA()` keeps surprising me. Mostly slow, quick at times =) – JvdV Apr 04 '23 at 07:59
  • I am surprised too. I tested it for 10K rows as well: Tom (0.77s), JvdV (0.06s), and mine (21s). I am using a `LAMBDA()` too (`MAP`), but it gives the worst performance. – David Leal Apr 06 '23 at 13:47
  • Just out of curiosity, I tested this one: `=LET(in,A2:C10,A,TAKE(in,,1),C,TAKE(in,,-1), ux, UNIQUE(A),m, MINIFS(C,A,ux),FILTER(in,1-ISNA(XMATCH(A&"_"&C,ux&"_"&m))))`; it is the fastest one. I tested it for `100K` rows: JvdV (0.54s) and this solution (0.24s); the `LAMBDA` solution is incredibly fast anyway. Be aware of possible false positives due to concatenation, though not really an issue given the input data's string pattern. I didn't test the other solutions, because their performance was already lower for fewer rows. – David Leal Apr 06 '23 at 15:04

One option:


Formula in E1:

=LET(x,TOROW(UNIQUE(A2:A10)),FILTER(A2:C10,BYROW((A2:A10=x)*(C2:C10=MINIFS(C2:C10,A2:A10,x)),LAMBDA(z,SUM(z)))))
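
One way to read this (my expanded rewrite, not part of the original answer) is to name the intermediate steps with LET; the result is the same:

=LET(b,A2:A10,p,C2:C10,x,TOROW(UNIQUE(b)),
 hit,(b=x)*(p=MINIFS(p,b,x)),
 FILTER(A2:C10,BYROW(hit,LAMBDA(z,SUM(z)))))

(b=x) is a rows-by-brands matrix flagging which brand each row belongs to, and (p=MINIFS(p,b,x)) flags where a price equals a brand's minimum, so their product is 1 only where a row is the cheapest of its own brand; the BYROW sum collapses that matrix into the 0/1 include vector FILTER needs.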
JvdV

Cheapest Car By Make


Here's a REDUCE version. Hopefully, there aren't too many makes.
=LET(d,A2:C10,bc,1,pc,3,
    b,INDEX(d,,bc),u,UNIQUE(b),
DROP(REDUCE("",SEQUENCE(ROWS(u)),LAMBDA(rr,r,
    VSTACK(rr,TAKE(SORT(FILTER(d,b=INDEX(u,r),""),pc),1)))),1))

The Variables

  • d - the data (range)
  • bc - the make column index
  • pc - the price column index
  • b - the make column
  • u - the unique makes
  • rr - the current Reduce result
  • r - the current Reduce row

The Flow

  • For each make, the data is FILTERed by the UNIQUE make at the row INDEX supplied from the SEQUENCE function, SORTed, and only the first row is TAKEn.
  • Each row is VSTACKed to the bottom of the previous ones (the REDUCEd result), and finally the initial erroring row is DROPped.
VBasic2008
  • This was fast when tested on 10K rows (I tried a similar approach with the sort outside the reduce and the result was dismal, as we found previously). – Tom Sharpe Apr 04 '23 at 09:02

Here is another option. It is based on ranking by price within each group (brand), formula 1:

=LET(in,A2:C10,A,TAKE(in,,1),C,TAKE(in,,-1),
 r, MAP(A,C,LAMBDA(x,y,SUM((A=x)*(C<y))+1)),FILTER(in,r=1))

Note: For testing purposes, I intentionally changed the price of the Ford Expedition to match Mazda's lowest price and also interchanged Mazda3 with Tundra, to make sure the order of the brands is not relevant and that the formula ranks within each group properly.

For illustrative purposes, here is the output of r (rank), so we can see how it works:
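
For reference, evaluating r against the unmodified sample data from the question (my own illustration; the original screenshot used the altered data described above) gives:

| Brand  | Model      | Price | r |
|--------|------------|-------|---|
| Toyota | Tundra     | 20360 | 1 |
| Toyota | Corolla    | 28502 | 2 |
| Ford   | F-350      | 36555 | 3 |
| Ford   | F-150      | 28963 | 2 |
| Ford   | Expedition | 27585 | 1 |
| Tesla  | Model 3    | 25812 | 1 |
| Mazda  | Mazda3     | 22036 | 1 |
| Mazda  | Mazda6     | 25875 | 2 |
| Mazda  | CX9        | 58200 | 3 |

FILTER(in, r=1) then keeps exactly the four rows where r is 1.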

The good thing about this approach is that with minimal changes you can change the filter criteria. For example, to get the two lowest prices by brand, just filter on r<=2 instead of r=1, as sketched below.
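
A minimal sketch of that variant (my adaptation of formula 1, assuming the same A2:C10 layout):

=LET(in,A2:C10,A,TAKE(in,,1),C,TAKE(in,,-1),
 r, MAP(A,C,LAMBDA(x,y,SUM((A=x)*(C<y))+1)),FILTER(in,r<=2))

Only the final condition changes from r=1 to r<=2; brands with a single row still pass, since their only rank is 1.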

If you want to rank by maximum price instead, change (C<y) to (C>y), which is additional flexibility this approach has; see the sketch below.
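
A sketch of the maximum-price variant (again my adaptation, same assumed layout):

=LET(in,A2:C10,A,TAKE(in,,1),C,TAKE(in,,-1),
 r, MAP(A,C,LAMBDA(x,y,SUM((A=x)*(C>y))+1)),FILTER(in,r=1))

Here r=1 marks each brand's most expensive row instead.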

Performance

See the comments on VBasic2008's and Tom Sharpe's answers. For a large dataset of around 10K rows, the above solution (formula 1) shows lower performance (54 secs) compared to JvdV's and Tom's solutions (a fraction of a second), JvdV's being the fastest. I didn't consider VBasic2008's solution because we know from other questions/answers that REDUCE/VSTACK has a significant time cost. Then I found the following approach (formula 2):

=LET(in,A2:C10,A,TAKE(in,,1),C,TAKE(in,,-1), ux, UNIQUE(A),m, 
 MINIFS(C,A,ux),FILTER(in,1-ISNA(XMATCH(A&"_"&C,ux&"_"&m))))

which is easy to understand and very fast. Concatenation may produce false positives, which is why it is good practice to use a delimiter (_) that is not part of the dataset's character set, as illustrated below. Check JvdV's answer (comment section) to this question: Finding pairs of cells in two columns.
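
A quick illustration of that false-positive risk, using hypothetical values that are not from the question's data: without a delimiter, "Ford"&112 and "Ford1"&12 both produce the key "Ford112", so XMATCH could match the wrong pair; with the delimiter, "Ford"&"_"&112 gives "Ford_112" while "Ford1"&"_"&12 gives "Ford1_12", which stay distinct.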

Later I found it is possible to avoid concatenation as follows, with a performance result similar to JvdV's solution (but not better than the XMATCH approach), formula 3:

=LET(in,A2:C10,A,TAKE(in,,1),C,TAKE(in,,-1),ux,UNIQUE(A),m,MINIFS(C,A,ux),
 FILTER(in,MMULT(N(A=TOROW(ux))*N(C=TOROW(m)),SEQUENCE(ROWS(m),,,0))=1))

Note: If you look carefully at this approach, it ends up being JvdV's approach, but using MMULT instead of BYROW, which may explain why both have similar performance. I just added the comparison =1, but it is not really necessary.

Here is the testing result I got for 100K rows:

| formula 2 (XMATCH) | JvdV (BYROW) | formula 3 (MMULT) |
|--------------------|--------------|-------------------|
| 0.24 secs          | 0.54 secs    | 0.53 secs         |

It is worth mentioning that even using BYROW, a LAMBDA function that is usually more time-consuming, the time is only about double that of the XMATCH approach, so it is still really fast.

David Leal
  • This is also a good solution, but I couldn't make it work and had to use `Thunks`, since I have two different datasets in different sheets. – Mayukh Bhattacharya Jun 26 '23 at 11:48
  • @MayukhBhattacharya share your Excel file so I can take a look at it. Thanks – David Leal Jun 26 '23 at 12:26
  • @MayukhBhattacharya I added my solution adapted to your sample data in `F12`. I don't see any difference from the solution in `F2`, other than different sorting (I don't change the order of the input data) and the empty price, which is treated as zero and thus lower, for Paracetamol Tablet. – David Leal Jun 26 '23 at 13:10
  • Alright, checking now. – Mayukh Bhattacharya Jun 26 '23 at 13:11