5

I am still playing around for one project with matching words.

Let assume that I have a given string, say maxmuster . Then I want to mark this part of my random word maxs which are in maxmuster in the proper order, like the letters are.

I wil give some examples and then I tell what I already did. Lets keep the string maxmuster. The bold part is the matched one by regex (best would be in php, however could be python, bash, javascript,...)

maxs

Mymaxmuis

Lemu

muster

Of course also m, u, ... will be matched then. I know that, I am going to fix that later. However, the solution, I though, should not so difficult, so I try to divide the word in groups like this:

/(maxmuster)?|(maxmuste)?|(maxmust)?|(maxmus)?|(maxmu)?|(maxm)?|(max)?|(ma)?|(m)?/gui

But then I forgot of course the other combinations, like:

(axmuster)(xmus) and so on. Did I really have to do that, or exist there a simple regex trick, to solve this question, like I explained above?

Thank you very much

Allan Karlson
  • 453
  • 11
  • 23
  • `maxs` and `muster` should be matched? I would have thought `m` is required, then `ma`, `max`, `maxm`, etc. Not an error that will cause false positives but `(ma)?|(ma)?` is a duplicate. – chris85 Apr 27 '17 at 18:46
  • This is not regex only task. Though you can use regex+programming to solve this. – Rahul Apr 27 '17 at 18:57
  • This matches single characters, which is not what @Allan Karlson wants, as far as I understand. My first thought was `(m?a?x?m?u?s?t?e?r?)`, but that matches `mtr`, for example. – Imanuel Apr 27 '17 at 19:15
  • yes exactly, that does not work, you understand me right :). @chris85, sorry that was a typo wrong from my site, I wanted to write only on time "(ma)?" – Allan Karlson Apr 27 '17 at 19:29

3 Answers3

2

Sounds like you need string intersection. If you don't mind non regex idea, have a look in Wikibooks Algorithm Implementation/Strings/Longest common substring PHP section.

foreach(["maxs", "Mymaxmuis", "Lemu", "muster"] AS $str)
  echo get_longest_common_subsequence($str, "maxmuster") . "\n";

max
maxmu
mu
muster

See this PHP demo at tio.run (caseless comparison).


If you need a regex idea, I would join both strings with space and use a pattern like this demo.

(?=(\w+)(?=\w* \w*?\1))\w

It will capture inside a lookahead at each position before a word character in the first string the longest substring that also matches the second string. Then by PHP matches of the first group need to be sorted by length and the longest match will be returned. See the PHP demo at tio.run.

function get_longest_common_subsequence($w1="", $w2="")
{
  $test_str = preg_quote($w1,'/')." ".preg_quote($w2,'/');

  if(preg_match_all('/(?=(\w+)(?=\w* \w*?\1))\w/i', $test_str, $out) > 0)
  {
    usort($out[1], function($a, $b) { return strlen($b) - strlen($a); });
    return $out[1][0];
  }
}
bobble bubble
  • 16,888
  • 3
  • 27
  • 46
  • wow that looks great, that is exactly what it should do, thank you very much! – Allan Karlson Apr 27 '17 at 19:34
  • ok I just see, but I guess I should fix that, that I have to make it somehow case insensitive. – Allan Karlson Apr 27 '17 at 19:43
  • You're welcome @AllanKarlson might want to use `strtolower` on all the strings before comparing if desired. – bobble bubble Apr 27 '17 at 19:44
  • Yes I just tested it but then I have to convert the uppercase letter back. I know there is something with `strtoupper()`. I hope it works then but however the words are matched that is the important part :D – Allan Karlson Apr 27 '17 at 19:47
  • 1
    @AllanKarlson I think you can do the comparsion case insensitive by switching `if ($string_1[$i] === $string_2[$j])` to eg `if (strtolower($string_1[$i]) === strtolower($string_2[$j]))` just play around a bit :) good luck – bobble bubble Apr 27 '17 at 19:50
  • 1
    @AllanKarlson Just [another demo](https://eval.in/784024) with caseless comparison. – bobble bubble Apr 27 '17 at 20:00
  • @AllanKarlson Years later - while writing an answer to another question - I got a simple regex idea and wonder if that also would work. However I updated my answer and added it :) – bobble bubble Dec 20 '19 at 15:15
  • Great, this look very nice. Yes, years later, this project is solved, however I will keep that in mind :) – Allan Karlson Dec 21 '19 at 10:19
1

TL;DR

Using Regular Expressions:

longestSubstring(['Mymaxmuis', 'axmuis', 'muster'], buildRegexFrom('maxmuster'));

Full snippet


Using below regex you are able to match all true sub-strings of string maxmuster:

(?|((?:
    m(?=a)
    |(?<=m)a
    |a(?=x)
    |(?<=a)x
    |x(?=m)
    |(?<=x)m
    |m(?=u)
    |(?<=m)u
    |u(?=s)
    |(?<=u)s
    |s(?=t)
    |(?<=s)t
    |t(?=e)
    |(?<=t)e
    |e(?=r)
    |(?<=e)r
)+)|([maxmuster]))

Live demo

You have to cook such a regex from a word like maxmuster so you need a function to call it:

function buildRegexFrom(string $word): string {
    // Split word to letters
    $letters = str_split($word);
    // Creating all side of alternations in our regex
    foreach ($letters as $key => $letter)
        if (end($letters) != $letter)
            $regex[] = "$letter(?={$letters[$key + 1]})|(?<=$letter){$letters[$key + 1]}";
    // Return whole cooked pattern
    return "~(?|((?>".implode('|', $regex).")+)|([$word]))~i";
}

To return longest match you need to sort results according to matches length from longest to shortest. It means writing another piece of code for it:

function longestSubstring(array $array, string $regex): array {
    foreach ($array as $value) {
        preg_match_all($regex, $value, $matches);
        usort($matches[1], function($a, $b) {
            return strlen($b) <=> strlen($a);
        });
        // Store longest match being sorted
        $substrings[] = $matches[1][0];
    }

    return $substrings;
}

Putting all things together:

print_r(longestSubstring(['Mymaxmuis', 'axmuis', 'muster'], buildRegexFrom('maxmuster')));

Outputs:

Array
(
    [0] => maxmu
    [1] => axmu
    [2] => muster
)

PHP live demo

Community
  • 1
  • 1
revo
  • 47,783
  • 14
  • 74
  • 117
0

Here is my take on this problem using regex.

<?php
$subject="maxmuster";
$str="Lemu";

$comb=str_split($subject); // Split into single characters.
$len=strlen($subject);

for ($i=2; $i<=$len; $i++){
    for($start=0; $start<$len; $start++){
        $temp="";
        $inc=$start;
        for($j=0; $j<$i; $j++){
            $temp=$temp.$subject[$inc];
            $inc++;
        }
        array_push($comb,$temp);
    }
}

echo "Matches are:\n";
for($i=0; $i<sizeof($comb); $i++){
    $pattern = "/".$comb[$i]."/";
    if(preg_match($pattern,$str, $matches)){
        print_r($matches);  
    };
}
?>

And here is an Ideone Demo.

Rahul
  • 2,658
  • 12
  • 28
  • Although it's executing on Ideone it's throwing an `Uninitialized string offset: 9` on my localhost. Not able to figure out why. – Rahul Apr 27 '17 at 20:43
  • 1
    thanks but I also habe a offset error, however I will look into that. – Allan Karlson Apr 28 '17 at 06:47
  • @AllanKarlson: I am getting the same. But it works fine on Ideone. If you correct it let me know. – Rahul Apr 28 '17 at 07:28