0

I am doing the following programming exercise: String subpattern recognition I. The statement is:

In this kata you need to build a function to return either true/True or false/False if a string can be seen as the repetition of a simpler/shorter subpattern or not.

For example:

hasSubpattern("a") === false; //no repeated pattern
hasSubpattern("aaaa") === true; //created repeating "a"
hasSubpattern("abcd") === false; //no repeated pattern
hasSubpattern("abababab") === true; //created repeating "ab"
hasSubpattern("ababababa") === false; //cannot be entirely reproduced
repeating a pattern

Strings will never be empty and can be composed of any character (just consider upper- and lowercase letters as different entities) and can be pretty long (keep an eye on performances!).

I have written the following answer:

public class Kata {
  public static boolean hasSubpattern(String str) {
    if(str.length() <= 1) return false;
    String pattern = "";
    for(int i = 0; i <= str.length()/2; i++){
      String letter = String.valueOf(str.charAt(i));
      pattern += letter;
      String repeated = pattern.repeat(str.length()/pattern.length());
      if(repeated.equals(str)){
        return true;
      }
    }
    return false;
  }
}

Being 4 traces the followings:

str: a

str: aaaa
letter: a
pattern: a
repeated: aaaa

str: abcd
letter: a
pattern: a
repeated: aaaa
letter: b
pattern: ab
repeated: abab
letter: c
pattern: abc
repeated: abc

str: abababab
letter: a
pattern: a
repeated: aaaaaaaa
letter: b
pattern: ab
repeated: abababab

So then, the code passes the normal tests (with small length strings). However it does not pass with bigger strings, because it times out (the execution time is more than 1600ms).

I tried other approach, instead of repeating each pattern, we are now replacing it all from the original string.

public class Kata {
  public static boolean hasSubpattern(String str) {
    if(str.length() <= 1) return false;
    String pattern = "";
    for(int i = 0; i <= str.length()/2; i++){
      String letter = String.valueOf(str.charAt(i));
      pattern += letter;
      String repeated = str.replaceAll(pattern,"");
      if(repeated.equals("")){
        return true;
      }
    }
    return false;
  }
}

However it stills timing out with bigger strings.

After that I tried again using this code:

public class Kata {
  public static boolean hasSubpattern(String str) {
    if(str.length() <= 1) return false;
    String pattern = "";
    for(int i = 0; i <= str.length()/2; i++){
      String letter = String.valueOf(str.charAt(i));
      pattern += letter;   
      String regex = "("+pattern+"){"+str.length()/pattern.length()+"}";
      if(str.matches(regex)){
        return true;
      }
    }
    return false;
  }
}

However it keeps timing out. I suppose we need to remove the foor loop.

Besides I have read:

In addition while reading similar topics on Stack Overflow, I found one which describes this kata's Python solution. Here is the link:

So then we could just use the following code:

public class Kata {
  public static boolean hasSubpattern(String str) {
    return str.matches("^(\\w+)\\1+$");
  }
}

Which made me learnt that we can mention to a previous matched group inside the regex itselft, by writting an escaped number related to what group we are looking for. In this case \1 refers to (\w+)

However I would like to discuss how could we achieve one iterative solution which works for bigger inputs.

How could we detect strings which are made of subpattern‽

Yone
  • 2,064
  • 5
  • 25
  • 56
  • If your problem is performance, your first step should be to benchmark it. You are trying to optimize based on assumptions. Be empirical. Write a proper test, identify the bottlenecks and work out how you can overcome them. – Michael Jan 06 '20 at 12:12
  • FWIW, *my* assumption would be that regular expressions are too heavyweight for this problem and that you will be able to achieve much better performance with more primitive constructs (e.g. `char[]`). But don't take my word for it. Profile it and see. – Michael Jan 06 '20 at 12:13

0 Answers0