Possible Duplicate:
How to find all substrings of a string in PHP
Find all subsets of a list

How can I compute all possible substrings of a string? For example, given the string ABCDE, its possible substrings are:

A, B, C, D, E, AB, BC, CD, DE, ABC, BCD, CDE, ABCD, BCDE, ABCDE

Thanks! Pseudocode would be highly appreciated. :D

neilmarion
  • 1
    This type of question has been asked many times before: http://stackoverflow.com/questions/728972 , http://stackoverflow.com/questions/1592039 , http://stackoverflow.com/questions/5023081/ , http://stackoverflow.com/questions/6780935/ , etc. etc. Just search on "powerset" or "subset". – bobbymcr Nov 29 '11 at 03:17
  • 2
    I disagree with both Ken and bobbymcr. While it is true that the linked question is "how to find substrings of a string?" it seems to involve some heavy PHP with minimal explanation, as do the answers. All of the links bobbymcr linked are completely separate problems, though related (substrings are not subsets). Upon cursory inspection, I cannot find a similar language-agnostic question with clear answers given in pseudocode. – ninjagecko Nov 29 '11 at 03:28
  • 1
    @ninjagecko: On second reading, I see that you're right -- the OP seems to want something more specialized than pure subsets, but something like substrings with fully connected elements (e.g. from ABC -> AB and BC, but not AC). – bobbymcr Nov 29 '11 at 03:42

2 Answers


Just use two for-loops:

generate substrings(string):
    for start in [0,1,...,string.length-1]:
        for end in [start,...,string.length-1]:
            yield string[start...end]
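
As a runnable illustration, the pseudocode above might be sketched in Python roughly as follows. The function name `substrings` is my own choice, not part of the answer. Because Python slices exclude the end index, `end` here runs one position further than in the pseudocode, which uses an inclusive range.

```python
from typing import Iterator


def substrings(s: str) -> Iterator[str]:
    """Yield every non-empty contiguous substring of s, grouped by start index."""
    for start in range(len(s)):
        # Python slices exclude the end index, so end runs up to len(s).
        for end in range(start + 1, len(s) + 1):
            yield s[start:end]


print(list(substrings("ABC")))
# ['A', 'AB', 'ABC', 'B', 'BC', 'C']
```

Writing it as a generator keeps memory use small when the caller only needs to iterate over the results once.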

You can also do it this way with two for-loops:

generate substrings(string):
    for substringLength in [1,2,...,string.length]:
        for start in [0,1,...,string.length-substringLength]:
            yield string[start...(start+substringLength-1)]
    yield ""

You may also want to include the empty string "" in the sequence you return (as the second version does), since it is a substring of every string.
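
The length-ordered variant could look something like this in Python. The name `substrings_by_length` and the `include_empty` flag are hypothetical, added only for illustration. Unlike the pseudocode, this sketch yields the empty string first rather than last, so the output stays sorted by length.

```python
from typing import Iterator


def substrings_by_length(s: str, include_empty: bool = False) -> Iterator[str]:
    """Yield contiguous substrings of s, shortest first."""
    if include_empty:
        # Assumption: the caller decides whether "" counts as a substring.
        yield ""
    for length in range(1, len(s) + 1):
        # The last valid start index for this length is len(s) - length.
        for start in range(len(s) - length + 1):
            yield s[start:start + length]


print(", ".join(substrings_by_length("ABCDE")))
# A, B, C, D, E, AB, BC, CD, DE, ABC, BCD, CDE, ABCD, BCDE, ABCDE
```

This ordering matches the list given in the question, which may make it the more convenient of the two loops if the output order matters.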

You also need to decide whether it is acceptable to yield the same string more than once (e.g. do you return "ABA" twice as a substring of "ABABA"?). If not, keep a hash set called seen: before each yield, skip the substring if it is already in the set; otherwise add it to the set and yield it. For example:

seen = new HashTable()
...
        substring = string[...]
        if substring not in seen:
            seen.add(substring)
            yield substring
...
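
Filled in, the deduplication idea might look like this in Python, using the built-in `set` as the hash table. The function name `unique_substrings` is an assumption of this sketch.

```python
from typing import Iterator, Set


def unique_substrings(s: str) -> Iterator[str]:
    """Yield each distinct non-empty substring of s once, in first-seen order."""
    seen: Set[str] = set()
    for start in range(len(s)):
        for end in range(start + 1, len(s) + 1):
            substring = s[start:end]
            if substring not in seen:
                # First occurrence: remember it so later copies are skipped.
                seen.add(substring)
                yield substring


result = list(unique_substrings("ABABA"))
print(len(result), result.count("ABA"))
# 9 1
```

For "ABABA" there are 15 substrings by position but only 9 distinct ones, so "ABA" is produced once instead of twice.
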
ninjagecko

Here's a 2-cent answer:

for (indexOfFirstLetterOfString = 0; indexOfFirstLetterOfString < string.length; indexOfFirstLetterOfString++) {

    // indexOfLastLetterOfString is one past the last letter, so it runs up to string.length inclusive
    for (indexOfLastLetterOfString = indexOfFirstLetterOfString + 1; indexOfLastLetterOfString <= string.length; indexOfLastLetterOfString++) {

        // substring(start, length): the second argument is a length, not an end index
        addToArrayOfStrings(string.substring(indexOfFirstLetterOfString, indexOfLastLetterOfString - indexOfFirstLetterOfString));
        incrementCounter();

    }
}

To get the number of substrings, simply add a counter to the inner loop.
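
The counter can also be checked against a closed form. A string of length n has n possible start positions, and start position i leaves n - i possible end positions, so the total is n + (n - 1) + ... + 1 = n(n + 1)/2 non-empty substrings, with repeated substrings counted once per position. A small Python sketch of that check (the helper name `count_substrings` is hypothetical):

```python
def count_substrings(n: int) -> int:
    """Number of non-empty substrings, counted by position, of a string of length n."""
    return n * (n + 1) // 2


s = "ABCDE"
# Count the (start, end) pairs the nested loops would visit.
counted = sum(1 for start in range(len(s)) for end in range(start + 1, len(s) + 1))

print(counted, count_substrings(len(s)))
# 15 15
```

For ABCDE this gives 15, which agrees with the list in the question.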

For instance, in Perl, this might look like:

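use strict;
use warnings;

my $string = "ABCDE";
my $numberOfSubstrings = 0;

# $indexOfFirstLetter is the start index; $indexOfLastLetter is one past the
# last character, so their difference is the substring length.
for (my $indexOfFirstLetter = 0; $indexOfFirstLetter < length($string); $indexOfFirstLetter++) {

    for (my $indexOfLastLetter = $indexOfFirstLetter + 1; $indexOfLastLetter <= length($string); $indexOfLastLetter++) {
        print substr($string, $indexOfFirstLetter, $indexOfLastLetter - $indexOfFirstLetter) . "\n";

        $numberOfSubstrings++;
    }
}

print "Number of substrings: " . $numberOfSubstrings . "\n";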
Mike Fahy