How to split a search string into parts, then check parts against a database

Question

Here's what I'm dealing with:

We have a database of machines and their part lists are specified using strings. For example, one machine might be specified with the string &XXX&YYY-ZZZ, meaning the machine contains parts XXX and YYY and not ZZZ.
We use &XXX to specify that a part exists in a machine, and -XXX to specify that a part does not exist in a machine.
It's also possible that a part is not listed (i.e. not specified whether or not it exists in the machine). For example I might only have &XXX&YYY (ZZZ is not specified).
Additionally, the codes can be in any order, for example I might have &XXX&YYY-ZZZ or &XXX-ZZZ&YYY.

In order to search for machines, I get a string like this: &XXX-YYY/&YYY&ZZZ (/ is an OR operator), meaning "I want to find all machines that either a) contain XXX and do not contain YYY, or b) contain both YYY and ZZZ.

I'm having trouble parsing the string based on the variable ordering, possibility that parts may not be shown, and handling of the / operator. Note, we use Microsoft 365.

Looking for some suggestions!

When I search for &XXX-YYY/&YYY&ZZZ, I should return the following machines:

Machine	Result
`&XXX-YYY&ZZZ`	`TRUE` (because `XXX` exists and `YYY` does not exist)
`&XXX-YYY-ZZZ`	`TRUE` (because `XXX` exists and `YYY` does not exist)
`&XXX&YYY&ZZZ`	`TRUE` (because `YYY` exists and `ZZZ` exists)
`&XXX&ZZZ`	`FALSE` (because `YYY` is specified in the search, but this machine doesn't specify it)
`&ZZZ&YYY`	`TRUE` (showing that parts can be in any order)

it seems you want to define a parser in Excel and it is not intended for that, there are tools/languages more appropriate to build a such a parser. If it is something simple it can be done. For example, the tokens we can consider are only `XXX`, `YYY`, `ZZZ`, if not, is there any pattern to identify them? — David Leal, Nov 18 '22 at 20:04
I provided a solution based on my understanding of the problem to solve. The solution doesn't relay on token patterns (it can be any string prefixed by the operation specified), but rather on the operation type, to identify each element. Please check it and let me know. Thanks — David Leal, Nov 19 '22 at 01:20

David Leal · Answer 1 · 2022-11-23T04:28:16.737

You can try it in cell C2 with the following formula:

=LET(query, A2, queries, TEXTSPLIT(query,, "/"), input, B2:B7, 
  qryNum, ROWS(queries),
  SPLIT, LAMBDA(txt,LET(str, SUBSTITUTE(SUBSTITUTE(txt, "&",";1_"),
    "-",";0_"), TEXTSPLIT(str,,";",TRUE))),
  lkUps, DROP(REDUCE("", queries, LAMBDA(acc,qry, HSTACK(acc, SPLIT(qry)))),,1),
  MAP(input, LAMBDA(txt, LET(str, SPLIT(txt),
    out, REDUCE("", SEQUENCE(qryNum, 1), LAMBDA(acc,idx, 
    LET(cols, INDEX(lkUps,,idx), qry, FILTER(cols, cols<>""), 
      matches, SUM(N(ISNUMBER(XMATCH(str, qry)))), 
      result, IF(ROWS(qry)=matches,1,0),IF(acc="", result, MAX(acc, result))
    ))), IF(out=1, TRUE, FALSE)
  )))
)

and here is the corresponding output:

Assumptions:

String values (operation and part) should be unique, i.e. the case &XXX-YYY&XXX is not considered, because &XXX is duplicated.

Explanation

The main idea is to transform the input information in a way we can do comparisons at array level via XMATCH. The first thing to do is to identify each OR condition in the search string because we need to test each one of them against the Input column. The name queries is an array with all the OR conditions.

We can transform the string inputs in a way we can split the string into an array. SPLIT is a user LAMBDA function that does that:

SUBSTITUTE(SUBSTITUTE(txt, "&",";1_"),"-",";0_"), TEXTSPLIT(str,,";",TRUE)))

What it does is convert for example the input: &XXX-YYY&ZZZ into the following array:

1_XXX
0_YYY
1_ZZZ

We change the original operations &,- into 1,0 just for convenience, but you can keep the original operation value, it is irrelevant for the calculation. It is important to set the fourth TEXTSPLIT input argument to TRUE to ensure no empty rows are generated.

The name lkUps is an array with all the OR conditions organized by column for query. In the format we want, for example:

1_XXX   1_YYY
0_YYY   1_ZZZ

Note: For creating lkUps we use the pattern: DROP/REDUCE/HSTACK, for more information about it, check the answer to the question: how to transform a table in Excel from vertical to horizontal but with different length provided by @DavidLeal.

Now we have all the elements we need to build the recurrence. We use MAP to iterate over all Input column values. For each element (txt) we transform it to the format of our convenience via SPLIT user LAMBDA function and name it str.

We use REDUCE function inside MAP to iterate over all columns of lkUps to check against str. We use SEQUENCE(qryNum, 1) as input of REDUCE to be able to iterate over each lkUps column (qry).

Now we are going to use the above variables in XMATCH and name the variable matches as follows:

SUM(N(ISNUMBER(XMATCH(str, qry))))

If all values from qry were found in str then we have a match. If that is the case the item of the SUM will be 1, otherwise 0. Therefore the SUM for the match case should be of the same size as qry.

Because we include in the XMATCH both parts and operations (1,0), we ensure that not just the same parts are found, but also their corresponding operations are the same. The order of the parts is not relevant, XMATCH ensures it.

The REDUCE recurrence keeps the maximum value from the previous iteration (previous OR condition). We just need at least one match among all OR conditions. Therefore once we finish all the recurrence, if the result value of REDUCE is 1 at least one match was found. Finally, we transform the result into a TRUE/FALSE.

Note: For a large list of operations instead of using the above approach of two SUBSTITUTE calls. The SPLIT function can be defined as follow:

LAMBDA(txt,tks, LET(seq, SEQUENCE(COLUMNS(tks),1),
  out, REDUCE("", seq, LAMBDA(acc,idx, LET(str, IF(acc="", txt, acc),
    SUBSTITUTE(str, INDEX(tks,1,idx), INDEX(tks,2,idx))))),
  TEXTSPLIT(out,,";",TRUE)))

and the input tks (tokens) can be defined as follow: {"&","-";"1_", "0_"}, i.e. in the first row old values and in the second row the new values.

How to split a search string into parts, then check parts against a database

1 Answers1

Explanation