3

I am currently trying to check if a number in a comma-separated string is within a number interval. What I am trying to do is to check if an area code (from the comma-separated string) is within the interval of an area.

The data: AREAS

Area interval Name Number of locations
1000-1499 Area 1 ?
1500-1799 Area 2 ?
1800-1999 Area 3 ?

GEOLOCATIONS

Name Areas List
Location A 1200, 1400
Location B 1020, 1720
Location C 1700, 1920
Location D 1940, 1950, 1730

The result I want here is the number of unique locations in the "Areas list" within the area interval. So Location D should only count ONCE in the 1800-1999 "area", and the Location A the same in the 1000-1499 location. But location B should count as one in both 1000-1499 and one in 1500-1799 (because a number from each interval is in the comma-separated string in "Areas list"):

Area interval Name Number of locations
1000-1499 Area 1 2
1500-1799 Area 2 3
1800-1999 Area 3 2

How is this possible?

I have tried with a COUNTIFS, but it doesnt seem to do the job.

denlau
  • 916
  • 2
  • 9
  • 21

4 Answers4

5

Here is one option using FILTERXML():

enter image description here

Formula in C2:

=SUM(FILTERXML("<x><t>"&TEXTJOIN("</s></t><t>",,"1<s>"&SUBSTITUTE(B$7:B$10,", ","</s><s>"))&"</s></t></x>","//t[count(.//*[.>="&SUBSTITUTE(A2,"-","][.<=")&"])>0]"))

Where:

  • "<x><t>"&TEXTJOIN("</s></t><t>",,"1<s>"&SUBSTITUTE(B$7:B$10,", ","</s><s>"))&"</s></t></x>" - Is the part where we construct a valid piece of XML. The theory here is that we use three axes here. Each t-node will be named a literal 1 to make sure that once we return them with xpath we can sum the result. The outer x-nodes are there to make sure Excel will handle the inner axes correctly. If you are curious to know how this xml-syntax looks at the end, it's best to step through using the 'Evaluate Formula' function on the Data-tab;
  • //t[count(.//*[.>="&SUBSTITUTE(A2,"-","][.<=")&"])>0]")) - Basically means that we collect all t-nodes where the count of child s-nodes that are >= to the leftmost number and <= to the rightmost number is larger than zero. For A2 the xpath would look like //t[count(.//*[.>=1000][.<=1499])>0]")) after substitution. In short: //t - Select t-nodes, where count(.//* select all child-nodes where count of nodes that fullfill both requirements [.>=1000][.<=1499] is larger than zero;
  • Since all t-nodes equal the number 1, the SUM() of these t-nodes equals the amount of unique locations that have at least one area in its Areas List;
  • Important to note that FILTERXML() will result into an error if no t-nodes could be found. That would mean we need to wrap the FILTERXML() in an IFERROR(...., 0) to counter that and make the SUM() still work correctly.

Or, wrap the above in BYROW():

enter image description here

Formula in C2:

=BYROW(A2:A4,LAMBDA(a,SUM(FILTERXML("<x><t>"&TEXTJOIN("</s></t><t>",,"1<s>"&SUBSTITUTE(B$7:B$10,", ","</s><s>"))&"</s></t></x>","//t[count(.//*[.>="&SUBSTITUTE(a,"-","][.<=")&"])>0]"))))
JvdV
  • 70,606
  • 8
  • 39
  • 70
  • This is so complicated. Do you know a (video)Tutorial where the first solution is explained? Not even the filterxml function I understand the way you use it. It seems you have two containers inside. – WeAreOne Aug 31 '22 at 06:46
  • 1
    @Prema, I don't know about a video, but maybe [this](https://stackoverflow.com/q/61837696/9758194) older post of mine would help you understand better. But yes, there are two axes here. One relation from parent `x` to child `t` which has it's own childnodes `s`. If it helps I could try to explain more in depth in the post itself? – JvdV Aug 31 '22 at 06:49
  • @Prema, I edited the post to explain a little bit more. – JvdV Aug 31 '22 at 07:10
  • 1
    Thank you for the link and the explanation. It is a lot to go thorough, but worth it, because it is so useful. Thank you! – WeAreOne Aug 31 '22 at 09:33
2

Using MMULT and TEXTSPLIT:

=LET(rng,TEXTSPLIT(D2,"-"),
    tarr,IFERROR(--TRIM(TEXTSPLIT(TEXTJOIN(";",,$B$2:$B$5),",",";")),0),
    SUM(--(MMULT((tarr>=--TAKE(rng,,1))*(tarr<=--TAKE(rng,,-1)),SEQUENCE(COLUMNS(tarr),,1,0))>0)))

enter image description here

Scott Craner
  • 148,073
  • 10
  • 49
  • 81
2

I am in very distinguished company but will add my version anyway as byrow probably is a slightly different approach

=LET(range,B$2:B$5,
lowerLimit,--@TEXTSPLIT(E2,"-"),
upperLimit,--INDEX(TEXTSPLIT(E2,"-"),2),
counts,BYROW(range,LAMBDA(r,SUM((--TEXTSPLIT(r,",")>=lowerLimit)*(--TEXTSPLIT(r,",")<=upperLimit)))),
SUM(--(counts>0))
)

enter image description here

Tom Sharpe
  • 30,727
  • 5
  • 24
  • 37
1

Here the ugly way to do it, with A LOT of helper columns. But not so complicated

F4= =TRANSPOSE(FILTERXML("<m><r>"&SUBSTITUTE(B4;",";"</r><r>")&"</r></m>";"//r"))
F11=    =TRANSPOSE(FILTERXML("<m><r>"&SUBSTITUTE(A11;"-";"</r><r>")&"</r></m>";"//r"))
F16=    =SUM(F18:F21)
F18=    =IF(SUM(($F4:$O4>=$F$11)*($F4:$O4<=$G$11))>0;1;"")
G18=    =IF(SUM(($F4:$O4>=$F$12)*($F4:$O4<=$G$12))>0;1;"")
H18=    =IF(SUM(($F4:$O4>=$F$13)*($F4:$O4<=$G$13))>0;1;"")

enter image description here

WeAreOne
  • 1,310
  • 1
  • 2
  • 13