Counting if part of string is within interval

Question

I am currently trying to check if a number in a comma-separated string is within a number interval. What I am trying to do is to check if an area code (from the comma-separated string) is within the interval of an area.

The data: AREAS

Area interval	Name	Number of locations
1000-1499	Area 1	?
1500-1799	Area 2	?
1800-1999	Area 3	?

GEOLOCATIONS

Name	Areas List
Location A	1200, 1400
Location B	1020, 1720
Location C	1700, 1920
Location D	1940, 1950, 1730

The result I want here is the number of unique locations in the "Areas list" within the area interval. So Location D should only count ONCE in the 1800-1999 "area", and the Location A the same in the 1000-1499 location. But location B should count as one in both 1000-1499 and one in 1500-1799 (because a number from each interval is in the comma-separated string in "Areas list"):

Area interval	Name	Number of locations
1000-1499	Area 1	2
1500-1799	Area 2	3
1800-1999	Area 3	2

How is this possible?

I have tried with a COUNTIFS, but it doesnt seem to do the job.

Can 'Area List' contain concatenated string longer than just two areas? — JvdV, Aug 30 '22 at 19:09
@TomSharpe Yeah of course - that was a mistake from my side. Sorry! I have made an edit accordingly :) — denlau, Aug 30 '22 at 19:28
I'm struggling to understand why you have 3 locations in area 2. There are only two numbers that are in the range 1500-1799 which are 1720 and 1700, meaning locations B and C, just two unique locations.....right? — JvdV, Aug 30 '22 at 19:45
@JvdV You are absolutely right, that was a mistake in the data. Now I have changed it. I have 1730 in Location D, which is why I expect 3. :) — denlau, Aug 30 '22 at 20:00
@RonRosenfeld - I am using latest version from Microsoft Office 365. I am not exactly sure which version that is. :) — denlau, Aug 30 '22 at 20:01
Sorry, yet another question - is the data currently in tables or just in columns? — Tom Sharpe, Aug 30 '22 at 20:04
@TomSharpe I think it is just in columns - so "pure" vanilla Excel setup, but I can change it if necessary. :) — denlau, Aug 30 '22 at 20:06

JvdV · Answer 1 · 2022-08-31T07:23:08.417

Here is one option using FILTERXML():

Formula in C2:

=SUM(FILTERXML("<x><t>"&TEXTJOIN("</s></t><t>",,"1<s>"&SUBSTITUTE(B$7:B$10,", ","</s><s>"))&"</s></t></x>","//t[count(.//*[.>="&SUBSTITUTE(A2,"-","][.<=")&"])>0]"))

Where:

"<x><t>"&TEXTJOIN("</s></t><t>",,"1<s>"&SUBSTITUTE(B$7:B$10,", ","</s><s>"))&"</s></t></x>" - Is the part where we construct a valid piece of XML. The theory here is that we use three axes here. Each t-node will be named a literal 1 to make sure that once we return them with xpath we can sum the result. The outer x-nodes are there to make sure Excel will handle the inner axes correctly. If you are curious to know how this xml-syntax looks at the end, it's best to step through using the 'Evaluate Formula' function on the Data-tab;
//t[count(.//*[.>="&SUBSTITUTE(A2,"-","][.<=")&"])>0]")) - Basically means that we collect all t-nodes where the count of child s-nodes that are >= to the leftmost number and <= to the rightmost number is larger than zero. For A2 the xpath would look like //t[count(.//*[.>=1000][.<=1499])>0]")) after substitution. In short: //t - Select t-nodes, where count(.//* select all child-nodes where count of nodes that fullfill both requirements [.>=1000][.<=1499] is larger than zero;
Since all t-nodes equal the number 1, the SUM() of these t-nodes equals the amount of unique locations that have at least one area in its Areas List;
Important to note that FILTERXML() will result into an error if no t-nodes could be found. That would mean we need to wrap the FILTERXML() in an IFERROR(...., 0) to counter that and make the SUM() still work correctly.

Or, wrap the above in BYROW():

Formula in C2:

=BYROW(A2:A4,LAMBDA(a,SUM(FILTERXML("<x><t>"&TEXTJOIN("</s></t><t>",,"1<s>"&SUBSTITUTE(B$7:B$10,", ","</s><s>"))&"</s></t></x>","//t[count(.//*[.>="&SUBSTITUTE(a,"-","][.<=")&"])>0]"))))

This is so complicated. Do you know a (video)Tutorial where the first solution is explained? Not even the filterxml function I understand the way you use it. It seems you have two containers inside. — WeAreOne, Aug 31 '22 at 06:46
@Prema, I don't know about a video, but maybe [this](https://stackoverflow.com/q/61837696/9758194) older post of mine would help you understand better. But yes, there are two axes here. One relation from parent `x` to child `t` which has it's own childnodes `s`. If it helps I could try to explain more in depth in the post itself? — JvdV, Aug 31 '22 at 06:49
Thank you for the link and the explanation. It is a lot to go thorough, but worth it, because it is so useful. Thank you! — WeAreOne, Aug 31 '22 at 09:33

score 2 · Answer 2 · answered Aug 30 '22 at 20:11

2

Using MMULT and TEXTSPLIT:

=LET(rng,TEXTSPLIT(D2,"-"),
    tarr,IFERROR(--TRIM(TEXTSPLIT(TEXTJOIN(";",,$B$2:$B$5),",",";")),0),
    SUM(--(MMULT((tarr>=--TAKE(rng,,1))*(tarr<=--TAKE(rng,,-1)),SEQUENCE(COLUMNS(tarr),,1,0))>0)))

answered Aug 30 '22 at 20:11

Scott Craner

148,073
10
49
81

score 2 · Answer 3 · answered Aug 30 '22 at 20:18

I am in very distinguished company but will add my version anyway as byrow probably is a slightly different approach

=LET(range,B$2:B$5,
lowerLimit,--@TEXTSPLIT(E2,"-"),
upperLimit,--INDEX(TEXTSPLIT(E2,"-"),2),
counts,BYROW(range,LAMBDA(r,SUM((--TEXTSPLIT(r,",")>=lowerLimit)*(--TEXTSPLIT(r,",")<=upperLimit)))),
SUM(--(counts>0))
)

score 1 · Answer 4 · answered Aug 30 '22 at 20:42

Here the ugly way to do it, with A LOT of helper columns. But not so complicated

F4= =TRANSPOSE(FILTERXML("<m><r>"&SUBSTITUTE(B4;",";"</r><r>")&"</r></m>";"//r"))
F11=    =TRANSPOSE(FILTERXML("<m><r>"&SUBSTITUTE(A11;"-";"</r><r>")&"</r></m>";"//r"))
F16=    =SUM(F18:F21)
F18=    =IF(SUM(($F4:$O4>=$F$11)*($F4:$O4<=$G$11))>0;1;"")
G18=    =IF(SUM(($F4:$O4>=$F$12)*($F4:$O4<=$G$12))>0;1;"")
H18=    =IF(SUM(($F4:$O4>=$F$13)*($F4:$O4<=$G$13))>0;1;"")

Counting if part of string is within interval

The data: AREAS

GEOLOCATIONS

4 Answers4