Find all possible combinations of strings that match a given pattern in JS

Question

So I have a dictionary where each key is mapped to an array of letters:

tCategories = { "T": ["t","d","th"],
                "P": ["p","t","k","q"],
                "N": ["m","n"] };

And an input string that contains a handful of patterns delimited by commas, e.g. "aT,Ps,eNe,NP", where a substring that is a valid key of tCategories acts a stand-in for any of the letters in tCategories[key].

What I'm trying to figure out is how to find every combination of each pattern listed in the input string and put them all in an array. So e.g. the expected output for foo("aT,Ps,eNe,NP") would be ["at","ad","ath","ps","ts","ks","qs","eme","ene","mp","mt","mk","mq","np","nt","nk","nq"].

My first instinct would either be to call String.split(",") on the input string to deal with each substring separately, or else iterate via for (var key in tCategories) { input.replace(new RegExp(key, "g"), "["+tCategories[key].join("|")+"]" }, or something... but I just can't seem to find a useful pathway between those and the expected output. It would involve... what, basically implementing the distributive property but for letters instead of numbers? How do I do this?

do you have some example of given data which is actually not working? how does longer keys look like, and what is the corresponding data for it? and what result do you expect? what about lower an uppercase letters? do they have a meaning? — Nina Scholz, Nov 03 '21 at 13:23
Can a key in the category dictionary be a substring of another key? { "approx": [], "approximate": [] } — NSaran, Nov 04 '21 at 01:54

Nina Scholz · Answer 1 · 2021-11-04T07:40:51.260

For the original answer see below.

Updated Answer

This answer works with a recursion and collects groups, like

a[Ps,T]

which creates a new category (Ps-T) by replacing the brackets and commas and takes the result of

Ps,T
ps ts ks qs t d th

This works as well for nested brackets. The order of relacements works from inside to the outer brackets.

With this change, it is necessary to accept longer keys than only one character. Now it searches for the longest key to the smallest key. If no key exists, it takes a single letter for the cartesian preparation.

function convert(string, tCategories) {
    const cartesian = (a, b) => a.reduce((r, v) => r.concat(b.map(w => [].concat(v, w))), []);

    let change;

    do {
        change = false;
        string = string.replace(/\[([^\[\]]+)\]/g, (_, p) => {
            const key = `(${p.replace(/,/g, '-')})`;
            tCategories[key] = convert(p, tCategories);
            change = true;
            return key;
        });
    } while (change);

    return string
        .split(/,/)
        .map(s => {
            const
                keys = Object.keys(tCategories).sort((a, b) => b.length - a.length),
                result = [];

            while (s.length) {
                const sub = keys.find(k => s.startsWith(k));
                if (sub) {
                    result.push(tCategories[sub]);
                    s = s.slice(sub.length);
                } else {
                    result.push([s[0]]);
                    s = s.slice(1);
                }
            }
            while (result.length < 2) result.push(['']);
            return result;
        })
        .map(a => a.reduce(cartesian).map(a => a.join('')))
        .flat();
}

const
    tCategories = { T: ["t", "d", "th"], P: ["p", "t", "k", "q"], N: ["m", "n"], Approx: ["j", "w"] };

console.log(convert('Ps,T', { ...tCategories }));
console.log(convert('a[Ps,T]', { ...tCategories }));
console.log(convert('a[Ps,T[aPbcApprox]],eNe,NP', { ...tCategories }));
console.log(convert("V,eNe,a[Ps,T],,ApproxT", { ...tCategories }));

.as-console-wrapper { max-height: 100% !important; top: 0; }

Original Answer

You could split the string by comma, replace the groups with their arrays and replace a single character with the characters in an array, get the cartesian product, join the inner arrays and get the array with the result.

Finally flat the array.

const 
    cartesian = (a, b) => a.reduce((r, v) => r.concat(b.map(w => [].concat(v, w))), []),
    foo = string => string
        .split(',')
        .map(s => Array.from(s, c => tCategories[c] || [c]))
        .map(a => a.reduce(cartesian).map(a => a.join('')))
        .flat(),
    tCategories = { T: ["t", "d", "th"], P: ["p", "t", "k", "q"], N: ["m", "n"] };

console.log(...foo("aT,Ps,eNe,NP"));

I still have can't figure out how `cartesian` works and I had to define `Array.prototype.flat` (I guess it's not in vanilla JS?) but I think I understand the rest of it and it works like a charm, thanks. — Arcaeca, Jan 18 '21 at 20:05
with `cartesian` as callback for reduce, you get an array of arrays with the cartesian product. please have a look here: https://stackoverflow.com/a/50631472/1447675 — Nina Scholz, Jan 18 '21 at 20:13
The updated answer does not satisfy the 2nd edge case: an input string with an empty subpattern, e.g. convert("V,eNe,a[Ps,T],,ApproxT"), still throws `Uncaught TypeError: Reduce of empty array with no initial value` at `Array.reduce` instead of giving an empty string in the output array. — Arcaeca, Nov 04 '21 at 04:37
@Arcaeca, please see edit. empty strings return an empty strings in the result set. — Nina Scholz, Nov 04 '21 at 08:31

Ahmed Hany · Answer 2 · 2021-11-04T05:17:00.677

This is an update regarding the bounty by @Arcaeca who asked for 3 things:

1- The line .map(s => Array.from(s, c => tCategories[c] || [c])) does not replace a key of tCategories with its corresponding value when key.length > 1.

2- Passing an input string with an empty subpattern (i.e. substring delimited by ","), e.g. "aT,Ps,eNe,,NP", causes the function to throw: TypeError.

3- It's a new feature, I tried to add was the ability to define "nonce" categories on the spot by enclosing them in square brackets [ ], e.g. the input string "a[Ps,T]" should yield the same output as "aPs,aT"

My Answer (From @Nina Scholz Answer)

I will start with the third requirement since it's completely new, so to make it easy I will make another function to parse the given string and check if it has square brackets multiplication, then resolve it, e.g. the input "a[Ps,T]", the output would be "aPs,aT" e.g the input "a[T, X]d", the output would be "aTd, aXd" I will call this clean(). You can enhance this function as you want.

const clean = string => {
    while (true) {
        const patterns = [
            /(\w+)\[([\w+\,]*)\](\w+)*/,
            /(\w+)*\[([\w+\,]*)\](\w+)/
        ]
        let match = null
        for (const i in patterns) {
            match = patterns[i].exec(string)
            if (match) {
                break;
            }
        }
        if (!match) {
            break
        }
        const newString = [match[1] ? [match[1]] : [''], match[2].split(',').map(v => v.replace(',', '')), match[3] ? [match[3]] : ['']].reduce(cartesian).map(a => a.join('')).join(',')
        string = string.replace(match[0], newString)
    }
    return string
};

Backing to the first two requirements, I made this modification

const foo = string => Object.keys(tCategories)
    .reduce((a, b) => a.replaceAll(b, `?${b}?`), string)
    .split(',')
    .map(v => v.split('?').map(t => tCategories[t] || [[t]]))
    .map(a => a.reduce(cartesian).map(a => a.join('')))
    .flat()

What I did is, I went through each key of tCategories then checked if my string contains that key, if yes then put a placeholder around it to make it easy to identify it, in my example, I chose ?, and got rid of Array.from method. now our function supports keys whose length > 1, and also empty subpatterns.

Full Example

let tCategories = { T: ["t", "d", "th"], P: ["p", "t", "k", "q"], N: ["m", "n"], KK: ['a', 'b'] };

const cartesian = (a, b) => a.reduce((r, v) => r.concat(b.map(w => [].concat(v, w))), []);

const clean = string => {
    while (true) {
        const patterns = [
            /(\w+)\[([\w+\,]*)\](\w+)*/,
            /(\w+)*\[([\w+\,]*)\](\w+)/
        ]
        let match = null
        for (const i in patterns) {
            match = patterns[i].exec(string)
            if (match) {
                break;
            }
        }
        if (!match) {
            break
        }
        const newString = [match[1] ? [match[1]] : [''], match[2].split(',').map(v => v.replace(',', '')), match[3] ? [match[3]] : ['']].reduce(cartesian).map(a => a.join('')).join(',')
        string = string.replace(match[0], newString)
    }
    return string
};

const foo = string => Object.keys(tCategories)
    .reduce((a, b) => a.replaceAll(b, `?${b}?`), string)
    .split(',')
    .map(v => v.split('?').map(t => tCategories[t] || [[t]]))
    .map(a => a.reduce(cartesian).map(a => a.join('')))
    .flat()

console.log(...foo(clean('aT,Ps,eNe,NP,,KK[z,c,f]')))

This solution is giving me some... odd... results where the input includes something directly after a closing paranthesis. foo("[Approx,k]T") returns ['[j', '[w', 'k]t', 'k]d', 'k]n'] - the brackets are being included in the combinations? - and foo("a[Ps,T[b,c]]d") returns ['aps', 'ats', 'aks', 'abs', 'ads', 'ags', 'atb', 'adb', 'anb', 'atcd', 'adcd', 'ancd'] - the pattern implies all combinations should end in "d". Any idea how to fix these? Otherwise works pretty well, including nested brackets. — Arcaeca, Nov 04 '21 at 04:31
ok your problem with `clean()`, but I don't understand the first example `foo(clean("[Approx,k]T"))` returns `['[j', '[w', 'k]t', 'k]d', 'k]n']`, How???, please write what you expect not the output of my code to not confuse. otherwise, I got your second example `foo(clean('a[Ps,T[b,c]]d'))` should return `['aps', 'ats', 'aks', 'abs', 'ads', 'ags', 'atb', 'adb', 'anb', 'atcd', 'adcd', 'ancd']`, it's very clear to me — Ahmed Hany, Nov 04 '21 at 04:53
The expected output of `foo(clean("[Approx,k]T"))` would be `['jt','jd','jth','wt','wd','wth','kt','kd','kth']`. Also, I moved the `clean(...)` call into the body of `foo`, so that the first thing `foo(string)` does is automatically call `clean(string)`. That way I just have to call `foo(...)` instead of `foo(clean(...))` every time. Sorry if that was confusing. — Arcaeca, Nov 04 '21 at 05:04
Also to clarify: The expected output of `foo(clean('a[Ps,T[b,c]]d'))` would be `['apsd', 'atsd', 'aksd', 'aqsd', 'atbd', 'adbd', 'athbd', 'atcd', 'adcd', 'athcd']`. Most of the combinations in the current output are missing the final 'd'. — Arcaeca, Nov 04 '21 at 05:12

NSaran · Accepted Answer · 2021-11-04T17:55:29.720

Original Question:

   const tCategories = {
      "T": ["t","d","th"],
      "P": ["p","t","k","q"],
      "N": ["m","n"],
    };
    
    // Think matrix like multiplication
    function multiply(twoDArray1, twoDArray2) {
      const product = [];
      for (let i = 0; i < twoDArray1.length; i++) {
        for (let j = 0; j < twoDArray2.length; j++) {
          product.push([...twoDArray1[i], twoDArray2[j]]);
        }
      }
      return product;
    }
    
    function stringHasCategories(inputString) {
      for (let i = 0, ch = inputString.charAt(0); i < inputString.length; i++, ch = inputString.charAt(i)) {
        if (tCategories[ch]) {
          return true;
        }
      }
      return false;
    }
                        
    function expandCategories(inputString) {
      if (!stringHasCategories(inputString)) {
        return inputString;
      }
      let output = [[]];
      for (let i = 0, ch = inputString.charAt(0); i < inputString.length; i++, ch = inputString.charAt(i)) {
         if (tCategories[ch]) {
           output = multiply(output, tCategories[ch]);
         } 
         else {
           output.forEach((op) => op.push(ch));
         }
      }
      output.forEach((op, i) => { output[i] = op.join(''); });
      return output;
    }
                        
    function foo(inputString = "aT,Ps,eNe,NP") {
      return inputString
        .split(',')
        .map(expandCategories)
        .flat();
    }
    
    console.log(foo());

For Updated Question:

https://gist.github.com/EarthyOrange/1f9ca9ae606b61d435fef484bbf96945

>I want to know what the expectation is if one key is a substring of another. As in, if `tCategories` has both a key "A" and "Approx"? Then default to the longest key matched, so that, say, if tCategories["A"] = ["a","b","c"], the expected output of foo("Approx") would still be ["j,w"], not ["approx","bpprox","cpprox"]. But unless I've implemented it wrong in my test environment, "Approx" is still being treated like a string literal, with the input foo("[Approx,k]T") returning ['Approxt', 'Approxd', 'Approxn', 'kt', 'kd', 'kn'], so it doesn't satisfy the 1st edge case. — Arcaeca, Nov 04 '21 at 04:26
so this almost works as expected, but it throws `Uncaught TypeError: op.join is not a function` if the input string ends in a key of length > 1 - try e.g. `foo("Approx")` vs. `foo("Approxk")` and see. However, I assume that can be fixed. Yours and @AhmedHany 's solutions both work very well, but yours is _substantially_ faster for a large number of function calls: if I call a complicated input string like `foo("a[Ps,T[b,c]]d,aT,Ps,eNe,,NP,[Approx,P]T,ApproximateApprox")` 10,000 times, it takes me about ~500 ms with your solution vs. ~6000 ms for Ahmed, so I'm giving you the bounty. — Arcaeca, Nov 05 '21 at 00:23
Great catch! I have updated the gist to fix that case. Thanks for the bounty :) — NSaran, Nov 06 '21 at 21:34

Find all possible combinations of strings that match a given pattern in JS

3 Answers3

Updated Answer

Original Answer