5

I want to accept only strings that do not have any substring repeated three times consecutively in them. The substring is not known in advance. For example, "a4a4a4123" contains "a4"; "abcdwwwabcd" - "w"; "abcde" - valid, no triple repeats.

I tried to implement it myself, but this only works for substrings with one letter:

public bool IsValid(string password)
{
    var validate = true;
    char lastLetter = ' ';
    var count = 1;

    for (int pos = 0; pos < password.Length; pos++)
    {
        if (password[pos] == lastLetter)
        {
            count++;

            if (count > 2)
            {
                validate = false;
                break;
            }
        }
        else
        {
            lastLetter = password[pos];
            count = 1;
        }
    }

    return validate;
}
Patashu
  • 21,443
  • 3
  • 45
  • 53
Ilya Shpakovsky
  • 281
  • 1
  • 3
  • 16
  • 4
    Did you try anything? – Soner Gönül May 31 '13 at 13:33
  • 1
    Do you want to write this: bool containsSubstringThreeTimes(string, substring) or this: string[] substringsContainedThreeOrMoreTimes(string)? Because they're very different algorithms. – Patashu May 31 '13 at 13:35
  • Soner Gönül, yes, i've updated my question. – Ilya Shpakovsky May 31 '13 at 13:37
  • 1
    Patashu, i need second substringsContainedThreeOrMoreTimes(string) – Ilya Shpakovsky May 31 '13 at 13:40
  • There is a question on SO for [finding substrings in a string](http://stackoverflow.com/questions/10055035/to-find-all-the-repeating-substring-in-a-given-string). The answer is to build [suffix tree](http://en.wikipedia.org/wiki/Suffix_tree). – Spoike May 31 '13 at 13:42
  • 1
    This sounds pretty hard. I would try to go greedy : generate every sub string of length less or equal than a third of the big one, and match those within the big one. Depending on the performances wanted, this could or not be practical : for a 15 letters length, this add up to 4943 substring, which you then all have to search for. – C4stor May 31 '13 at 13:43
  • Not actually that hard, because Regex. – It'sNotALie. May 31 '13 at 13:45
  • By the way... should "Mississippi" be a non-valid password? – Spoike May 31 '13 at 13:47

3 Answers3

10

Try this:

bool result = Regex.IsMatch(input, @".*(.+).*\1.*\1.*");

Basically, it checks if a pattern of one or more characters appears 3 or more times in the same string.

Full explanation:

First, it matches 0 or more characters at the beginning of the string. Then it captures a group of one or more. Then it matches 0 or more, and then the group again. Then 0 or more again, and then the capture again. Then 0 or more again.

If you require the string to be consecutive, try this:

bool result = Regex.IsMatch(input, @".*(.+)\1\1.*");

Also, some performance test results:

Non-consecutive: 312ms
Consecutive: 246ms

Tests were done with this program:

using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

class Program
{
    public static void Main(string[] args)
    {
        string input = "brbrbr";
        Regex one = new Regex(@".*(.+).*\1.*\1.*");
        for (int i = 0; i < 5; i++)
        {
            bool x = one.IsMatch(input); //warm regex up
        }
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < 100000; i++)
        {
            bool x = one.IsMatch(input);
        }
        sw.Stop();
        Console.WriteLine("Non-consecutive: {0}ms", sw.ElapsedMilliseconds);
        Regex two = new Regex(@".*(.+)\1\1.*");
        for (int i = 0; i < 5; i++)
        {
            bool x = two.IsMatch(input); //warm regex up
        }
        Stopwatch sw2 = Stopwatch.StartNew();
        for (int i = 0; i < 100000; i++)
        {
            bool x = two.IsMatch(input);
        }
        sw.Stop();
        Console.WriteLine("Consecutive: {0}ms", sw2.ElapsedMilliseconds);
        Console.ReadKey(true);
    }
}
It'sNotALie.
  • 22,289
  • 12
  • 68
  • 103
  • Is the regex simpler/faster if you require the triple match to be consecutive? – Patashu May 31 '13 at 13:47
  • @Patashu Not sure if faster, going to do some performance tests. Be back in a sec :). But definitely simpler, mainly cos you get rid of a lot of `.*`'s. – It'sNotALie. May 31 '13 at 13:47
  • 1
    There is a consecutive requirement for the string. I think this would work '@".*(.+)\1\1.*"'. I'm unsure, though. – Stickman May 31 '13 at 13:51
  • @MatthewWatson My first one would do that, as it has 3 a's. The new one doesn't match it. :) – It'sNotALie. May 31 '13 at 14:06
  • 2
    This is an interesting example of how so-called "regular" expressions are nothing of the sort. A regular language is by definition one that can be matched by a program that has a *fixed* amount of storage available, but this pattern matcher can "remember" arbitrarily long substrings. – Eric Lippert May 31 '13 at 15:21
  • Is there a Java equivalent of this? – Marvin Jul 24 '19 at 02:32
  • @Marvin, if I read the Java docs correctly this *should* work with Java's `Pattern` class, although I wouldn't recommend using it. Since I wrote this answer, I've learnt about [Catastrophic Backtracking](https://www.regular-expressions.info/catastrophic.html), which I'm fairly sure this is an example of. – It'sNotALie. Jul 25 '19 at 00:24
0

Regex is how I would have attacked this:

static void Main(string[] args)
        {
            string text = "C# is the best language there is in the world.";
            string search = "the";
            Match match = Regex.Match(text, search);
            Console.WriteLine("there was {0} matches for '{1}'", match.Groups.Count, match.Value);
            Console.ReadLine();
        }

How to find multiple occurrences with regex groups?

Community
  • 1
  • 1
Rikon
  • 2,688
  • 3
  • 22
  • 32
-1
Regex.Matches( "a4a4a4123",  "a4" ).Count >= 3
Niels V
  • 1,005
  • 8
  • 11
  • 1
    The OP has updated his question. This is no longer a valid answer. – Patashu May 31 '13 at 13:39
  • I don't think the substring is given as input. The problem is to identify all substrings that appear 3 or more times in a string. – mbeckish May 31 '13 at 13:40