7

I was wondering if there was a way to do pattern matching in Octave / matlab? I know Maple 10 has commands to do this but not sure what I need to do in Octave / Matlab. So if a number was 12341234123412341234 the pattern match would be 1234. I'm trying to find the shortest pattern that upon repetiton generates the whole string.

Please note: the numbers (only numbers will be used) won't be this simple. Also, I won't know the pattern ahead of time (that's what I'm trying to find). Please see the Maple 10 example below which shows that the pattern isn't known ahead of time but the command finds the pattern.

Example of Maple 10 pattern matching:

ns:=convert(12341234123412341234,string);

             ns := "12341234123412341234"

StringTools:-PrimitiveRoot(ns);

             "1234"

How can I do this in Octave / Matlab? Ps: I'm using Octave 3.8.1

Robert Seifert
  • 25,078
  • 11
  • 68
  • 113
Rick T
  • 3,349
  • 10
  • 54
  • 119
  • 1
    Nobody is going to look up the Maple example. Please provide the relevant information in the question. For example, you should add that the command should find the pattern for you... – kkuilla Mar 10 '15 at 12:55
  • is the length of the pattern known and fixed? and does the input string only contains the pattern or also other characters? – Robert Seifert Mar 10 '15 at 14:37
  • @thewaywewalk No, the length of the pattern isn't known. The string is only numbers. Please note: the numbers (only numbers will be used) won't be this simple nor will I know it's pattern ahead of time (that's what I'm trying to find). – Rick T Mar 10 '15 at 14:41
  • 1
    my question is: could the string be `1234912348123471234612345` and would the desired pattern to find also be `1234`? – Robert Seifert Mar 10 '15 at 14:58
  • @thewaywewalk yes the desired pattern to find would still be 1234 – Rick T Mar 10 '15 at 15:02
  • 1
    Well, it's still not clear. If there are multiple patterns possible, which one you want detected: the longest one? the pattern with the most occurences? what would be the minimal length then? Imagine `1234123412341234123` than you would find `1234` four times, but `123` five times and actually `1234123412341234123` is also a pattern. Please provide a much better example, which covers all special cases. – Robert Seifert Mar 10 '15 at 15:06
  • @thewaywewalk sorry if I wasn't clear yes I want the longest pattern possible. So if a number was 12341234123412341234 the pattern match would be 1234. I'll type it out in words instead of just showing the numbers example and add it to the question – Rick T Mar 10 '15 at 15:15
  • 3
    But the longest pattern possible of '12341234123412341234 ' **is** '12341234123412341234 ' and the second longest is '1234123412341234123 ' - so if you state the your real input is not as simple as your example, please post your real input :) – Robert Seifert Mar 10 '15 at 15:23
  • @thewaywewalk the whole question is about not knowing the pattern of numbers and finding one. I gave an example along with the answer with numbers and with written text. Your asking me to tell you all possible combinations there might be...if I knew that I would be working on wall street :-) – Rick T Mar 10 '15 at 15:34
  • I think the point @thewaywewalk is making is that, if we allow other words to delimit the pattern you are looking for, the solution is ambiguous: do we match only `1234`, or do we match all of `1234912348123471234612345`? How is the program supposed to know which is the better result? – eigenchris Mar 10 '15 at 15:54
  • @eigenchris yes we only match 1234. How to match it with just 1234, I wish I knew how maple 10 does it with just 2 lines of code. – Rick T Mar 10 '15 at 15:59
  • @RickT Please be aware that the you will lose your reps for this question if there are more than 10 edits so don't do any more edits. – kkuilla Mar 10 '15 at 16:55
  • @kkuilla I've stopped editing but if you take a look others are making really good changes :-) – Rick T Mar 10 '15 at 17:06
  • @RickT It was just to make you aware (in case you didn't know) that one more edit and you will lose the reps.. – kkuilla Mar 10 '15 at 17:10

4 Answers4

11

To find the shortest pattern that upon repetition generates the whole string, you can use regular expressions as follows:

result = regexp(str, '^(.+?)(?=\1*$)', 'match');

Some examples:

>> str = '12341234123412341234';
>> result = regexp(str, '^(.+?)(?=\1*$)', 'match')
result = 
    '1234'

>> str = '1234123412341234123';
>> result = regexp(str, '^(.+?)(?=\1*$)', 'match')
result = 
    '1234123412341234123'

>> str = 'lullabylullaby';
>> result = regexp(str, '^(.+?)(?=\1*$)', 'match')
result = 
    'lullaby'

>> str = 'lullaby1lullaby2lullaby1lullaby2';
>> result = regexp(str, '^(.+?)(?=\1*$)', 'match')
result = 
    'lullaby1lullaby2'
Luis Mendo
  • 110,752
  • 13
  • 76
  • 147
  • Was just about to write this. +1. – rayryeng Mar 10 '15 at 15:59
  • Nice. I was having trouble constructing a regex for this. – eigenchris Mar 10 '15 at 16:01
  • 1
    @eigenchris So was I :-) Credit for all my answers about `regexp` should go to [this page](http://www.regular-expressions.info/)! BTW I borrowed your examples – Luis Mendo Mar 10 '15 at 16:03
  • 1
    and why is the result for `str = '12341235123412351234';` **not** `'1234`? Regarding one of my comments it should. I ***really*** don't get that question. And I tried really hard. – Robert Seifert Mar 10 '15 at 16:09
  • @thewaywewalk Because `1234` repeated doesn't generate `str`. That `str` doesn't have a repeating pattern, other than the whole string itself – Luis Mendo Mar 10 '15 at 16:19
  • @RickT The edit you made to the question states that you are looking for the `longest pattern possible`? You have accepted this answer even though it gives you the shortest. Your edit isn't clear. What is it you are actually after? Longest or shortest? – kkuilla Mar 10 '15 at 16:21
  • 1
    I think that while the OP's problem statement is ambiguous, this is probably the answer they are looking for. It meets the specification of the equivalent Maple command [`PrimativeRoot()`](http://www.maplesoft.com/support/help/maple/view.aspx?path=StringTools%2FPrimitiveRoot). – eigenchris Mar 10 '15 at 16:23
  • @eigenchris The Maple link really clarifies the issue. The OP should edit the answer with this – Luis Mendo Mar 10 '15 at 16:25
  • So the OP **is** looking for the minimum length. Could someone with enough reps edit the question to reflect that. And please, format it properly. – kkuilla Mar 10 '15 at 16:30
  • @LuisMendo I finally agree that your answer is an equivalent of the Maple function. But without looking that up, it wasn't clear to me, that the pattern must be repeated directly. – Robert Seifert Mar 10 '15 at 16:31
  • @thewaywewalk Yes, the original question left a lot to imagination – Luis Mendo Mar 10 '15 at 16:33
  • 1
    @LuisMendo we really should introduce the Mentalist award ;) – Robert Seifert Mar 10 '15 at 16:33
  • @kkuilla I've edited to reflect that. As for the format, well, I wouldn't know how to do it without rewriting it too much... – Luis Mendo Mar 10 '15 at 16:34
  • @LuisMendo Thanks. Get rid of the bold stuff. I really have to get to 2k :-) – kkuilla Mar 10 '15 at 16:35
  • @thewaywewalk Hahaha. Pity that we haven't (yet) found a way to automatically assign a question the "mentalism needed" trait – Luis Mendo Mar 10 '15 at 16:35
  • 1
    @kkuilla Done. You are right, it looks better now. I've only used cursive for the two key sentences – Luis Mendo Mar 10 '15 at 16:39
  • 1
    @LuisMendo Title? `Finding the shortest repetitive pattern in a string`? – kkuilla Mar 10 '15 at 16:42
  • @LuisMendo Brilliant. That was a lot of edits... :-) – kkuilla Mar 10 '15 at 16:44
  • 1
    @kkuilla Perfect title. You'll make a very good editor soon :-) – Luis Mendo Mar 10 '15 at 16:48
  • Umm, this is a *little* faster than my answer (only by a factor of > x30), but is this really really *MATLAB*? ;-) – Buck Thorn Mar 10 '15 at 17:28
  • 2
    The OP strikes again into our hearts and minds to confuse us.... And out from the ashes of all of this confusion is born a mentalist whose sole purpose is to undo the confusion brought into this world. Several others are begotten from this very event thus creating... The league of mentalists. Go mentalists go! – rayryeng Mar 10 '15 at 17:57
  • 1
    On another note, I now know what positive and negative lookahead/lookbehind are in regex's. Thanks @LuisMendo! – eigenchris Mar 10 '15 at 18:23
  • @TryHard Hahaha, regular expressions are a world of their own! – Luis Mendo Mar 10 '15 at 20:37
  • @rayryeng We desperately need that badge created! – Luis Mendo Mar 10 '15 at 20:38
  • @eigenchris Yes, that's a very handy feature of regexps; as are are (numbered) backreferences – Luis Mendo Mar 10 '15 at 20:40
3

I'm not sure if this can be accomplished with regular expressions. Here is a script that will do what you need in the case of a repeated word called pattern.

It loops through the characters of a string called str, trying to match against another string called pattern. If matching fails, the pattern string is extended as needed.

EDIT: I made the code more compact.

str = 'lullabylullabylullaby';

pattern = str(1);
matchingState = false;
sPtr = 1;
pPtr = 1;

while sPtr <= length(str)
     if str(sPtr) == pattern(pPtr) %// if match succeeds, keep looping through pattern string
            matchingState = true;
            pPtr = pPtr + 1;
            pPtr = mod(pPtr-1,length(pattern)) + 1;
     else                          %// if match fails, extend pattern string and start again
            if matchingState
                sPtr = sPtr - 1;   %// don't change str index when transitioning out of matching state
            end  
            matchingState = false;
            pattern = str(1:sPtr);
            pPtr = 1;
     end

     sPtr = sPtr + 1;

end

display(pattern);

The output is:

pattern =

lullaby

Note:

This doesn't allow arbitrary delimiters between occurrences of the pattern string. For example, if str = 'lullaby1lullaby2lullaby1lullaby2';, then

pattern =

lullaby1lullaby2

This also allows the pattern to end mid-way through a cycle without changing the result. For example, str = 'lullaby1lullaby2lullaby1'; would still result in

pattern =

lullaby1lullaby2

To fix this you could add the lines

if pPtr ~= length(pattern)
    pattern = str;
end
eigenchris
  • 5,791
  • 2
  • 21
  • 30
  • What programming language are you using? – Jan Oct 25 '16 at 21:15
  • @Jan This is MATLAB, a language primarily used by engineers and scientists for numerical computations. The accepted solution is much better an uses regular expressions, which most languages support. – eigenchris Oct 25 '16 at 21:24
  • Thanks. Hard to find good documentation about this language. I am trying to convert it to C#. What does the lines "pattern = str(1);" and "pattern = str(1:sPtr);" and "if str(sPtr) == pattern(pPtr)" exactly do? I converted it to resp. "string pattern = s.Substring(0,1);", "pattern = s.Substring(0, sPtr);" and "if (s.Substring(sPtr) == pattern.Substring(pPtr))". I also noticed that Matlab strings are 1-based instead of zero-based in C#. – Jan Oct 25 '16 at 21:48
  • @Jan It looks like you've converted the code more or less correctly. If you are struggling with this, I would recommend you open a new question. Again, for searching for patterns in text, I highly recommend you learn regular expressions, (C# supports them) and using one of the regex strings in the accepted answer. – eigenchris Oct 25 '16 at 22:01
  • The accepted answer using regular expression doesn't work in all cases. My first solution was using regular expressions but it failed on very big strings. But I have your code running now. I am now trying to iterate upon it so it supports input strings like "1666666" so that it will return 6. – Jan Oct 25 '16 at 22:59
2

Another approach is as follows:

  1. determine length of string, and find all possible factors of the string length value
  2. for each possible factor length, reshape the string and check for a repeated substring

To find all possible factors, see this solution on SO. The next step can be performed in many ways, but I implement it in a simple loop, starting with the smallest factor length.

function repeat = repeats_in_string(str);
ns = numel(str);
nf = find(rem(ns, 1:ns) == 0);
for ii=1:numel(nf)
    repeat = str(1:nf(ii));
    if all(ismember(reshape(str,nf(ii),[])',repeat)); 
        break;
    end
end 
Community
  • 1
  • 1
Buck Thorn
  • 5,024
  • 2
  • 17
  • 27
0

This problem is a great Rorschach test for your approach to problem solving. I'll add a signal engineering solution, which should be simple since the signal is expected to be perfectly repetitive, assuming this holds: find the shortest pattern that upon repetition generates the whole string.

In the following str fed to the function is actually a column vector of floats, not a string, the original string having been converted with str2num(str2mat(str)'):

function res=findshortestrepel(str);
[~,ii] = max(fft(str-mean(str)));
res = str(1:round(numel(str)/(ii-1)));

I performed a small test, comparing this to the regexp solution and found it to be faster overall (blue squares), although somewhat inconsistently, and only if you don't consider the time required to convert the string into a vector of floats (green squares). However I did not pursue this further (not breaking records with this):

enter image description here

Times in sec.

Buck Thorn
  • 5,024
  • 2
  • 17
  • 27