22

I'm getting System.OutOfMemoryException when trying to generate 6 letter permutations. 5 letter permutations still work.

Here is the code I'm using to generate ALL permutations:

private static List<string> getPermutations(int n,string source)
        {
            IEnumerable<string> q = source.Select(x => x.ToString());
            for (int i = 0; i < n - 1; i++)
            {
                q = q.SelectMany(x => source, (x, y) => x + y);
            }
            return q.ToList(); // THIS IS WHERE THE ERROR HAPPENS
        }

after which I am using this piece of code to filter them based on regex:

private static List<string> filterListByRegex(List<string> list, string regex)
        {
            List<string> newList = list.ToList();
            for (int i = newList.Count - 1; i >= 0; i--)
            {
                Match match = Regex.Match(""+newList[i], regex, RegexOptions.IgnoreCase);
                if (!match.Success)
                {
                    newList.RemoveAt(i);
                }
            }
            return newList;
        }

so since I don't really need ALL those permutations, is there a way to regex filter while generating permutations, or should I use a more efficient piece of code to generate permutations?

Here is a picture to better demonstrate what I'm trying to achieve: enter image description here

The vertical alphabet string is the one I'm telling the code to use.

Community
  • 1
  • 1
Shard
  • 571
  • 1
  • 5
  • 17
  • Wait. You're trying to make a permutation of all letters in the alphabet? – Zizouz212 Feb 13 '16 at 21:26
  • Assuming each permutation has a length of 6... This is 165,765,600 different objects. (Type `26P6` in your calculator). For permutations of length 5, this is 7,893,600 (Type `26P5` in your calculator). That's *way* less. – Zizouz212 Feb 13 '16 at 21:28
  • Yes that is exactly what I am doing currently. I know 6 letter permutation of all letters in the alphabet gets huge, so I'm trying to figure out what is the smartest way to go about it. – Shard Feb 13 '16 at 21:31
  • 1
    well if you change the code to use IEnumerable instead of a list it would not need all the permutations in memory at the same time – juharr Feb 13 '16 at 21:55
  • You might want to check out Eric Lippert's [Producing permutations](http://ericlippert.com/2013/04/15/producing-permutations-part-one/) blog posts. – juharr Feb 13 '16 at 22:07
  • @Zizouz212 Wondering why you opened a bounty. There is an accepted answer and also some other good answers, so I wouldn't say the question didn't receive enough attention. You are not satisfied by the answers, or what really do you expect? – Ivan Stoev Feb 17 '16 at 22:01
  • @IvanStoev I probably chose the wrong reason. That and I was in a good mood to realize. :) – Zizouz212 Feb 18 '16 at 23:08

4 Answers4

16

First, I would like to mention that what we are discussing here is not real permutations, but so called n-tuples or permutations with repetition - Wikipedia.

Second, regarding the System.OutOfMemoryException when generating permutations, I think we all agree that the result should not be stored in a list, but provided as enumerable which will allow applying filtering and consuming (eventually storing) only the ones in interest.

In that regard the LINQ solution provided by @juharr is performing very well. So my goals are to minimize the intermediary memory allocations, including string concatenations and also to end up with a more general and faster solution.

In order to do that, I need to take some hard design decision. The signature of the general function I'm talking about will look like this

public static IEnumerable<T[]> RepeatingPermutations<T>(this T[] set, int N)

and the question is what should be the array yielded. If we follow the recomendations, they should be a separate array instances. However, remember I want to minimize allocations, I've decided to break that rules and yield one and the same array instance, moving the responsibility of not modifying it and cloning it if necessary to the caller. For instance, this allows the caller to perform no cost filtering. Or implement the OP function on top on it like this

public static IEnumerable<string> RepeatingPermutations(this string set, int N)
{
    return set.ToCharArray().RepeatingPermutations(N).Select(p => new string(p));
}

A few words about the algorithm. Instead of looking at the problem recursively as some other answerers, I want to effectively implement the equivalent of something like this

from e1 in set
from e2 in set
...
from eN in set
select new [] { e1, e2, .., eN }

Interestingly, I recently answered a combinations related question and realized that the algorithms are pretty much the same.

With all that being said, here is the function:

public static IEnumerable<T[]> RepeatingPermutations<T>(this T[] set, int N)
{
    var result = new T[N];
    var indices = new int[N];
    for (int pos = 0, index = 0; ;)
    {
        for (; pos < N; pos++, index = 0)
        {
            indices[pos] = index;
            result[pos] = set[index];
        }
        yield return result;
        do
        {
            if (pos == 0) yield break;
            index = indices[--pos] + 1;
        }
        while (index >= set.Length);
    }
}

I've did some tests by simply calling the different functions with N=2,3,..6 and simply iterating and counting. Here are the results on my machine:

A : N=2 Count=         676 Time=00:00:00.0000467 Memory=     29K
B1: N=2 Count=         676 Time=00:00:00.0000263 Memory=     16K
B2: N=2 Count=         676 Time=00:00:00.0000189 Memory=      8K

A : N=3 Count=      17,576 Time=00:00:00.0010107 Memory=    657K
B1: N=3 Count=      17,576 Time=00:00:00.0003673 Memory=    344K
B2: N=3 Count=      17,576 Time=00:00:00.0001415 Memory=      8K

A : N=4 Count=     456,976 Time=00:00:00.0184445 Memory=  2,472K
B1: N=4 Count=     456,976 Time=00:00:00.0096189 Memory=  2,520K
B2: N=4 Count=     456,976 Time=00:00:00.0033624 Memory=      8K

A : N=5 Count=  11,881,376 Time=00:00:00.4281349 Memory=    397K
B1: N=5 Count=  11,881,376 Time=00:00:00.2482835 Memory=  4,042K
B2: N=5 Count=  11,881,376 Time=00:00:00.0887759 Memory=      8K

A : N=6 Count= 308,915,776 Time=00:00:11.2697326 Memory=  1,688K
B1: N=6 Count= 308,915,776 Time=00:00:06.5638404 Memory=  1,024K
B2: N=6 Count= 308,915,776 Time=00:00:02.2674431 Memory=      8K

where

A - LINQ function from @juharr
B1 - my function with string
B2 - my function with char[]

As we can see, memory wise both string functions are comparable. Performance wise the LINQ function is only ~2 times slower, which is pretty good result.

As expected in such scenario, the non allocating function significantly outperforms them both.

UPDATE: As requested in the comments, here is the sample usage of the above functions (note that they are extension methods and must be placed in a static class of your choice):

var charSet = Enumerable.Range('A', 'Z' - 'A' + 1).Select(c => (char)c).ToArray();
var charPermutations = charSet.RepeatingPermutations(3);
var stringSet = new string(charset);
var stringPermutations = stringSet.RepeatingPermutations(3);

However, remember the design choice I've made, so if you expand the charPermutations inside the debugger, you'll see one and the same values (the last permutation). Consuming the whole result of the above call for char[] should be like this

var charPermutationList = charSet.RepeatingPermutations(3)
    .Select(p => (char[])p.Clone()).ToList();

Actually a good addition to the two methods presented would be the following extension method:

public static IEnumerable<T[]> Clone<T>(this IEnumerable<T[]> source)
{
    return source.Select(item => (T[])item.Clone());
}

so the consuming call would be simple

var charPermutationList = charSet.RepeatingPermutations(3).Clone().ToList();
Community
  • 1
  • 1
Ivan Stoev
  • 195,425
  • 15
  • 312
  • 343
  • 3
    Nice answer. Especially on the algorithm and performance analysis part. ;) +1 – Ian Feb 20 '16 at 20:12
  • Can you also add an example of how you would call your char[] function? – Shard Feb 20 '16 at 23:05
  • 1
    @Ian Glad to see you around:) And thank you! Data structures, algorithms and performance have always been my favorite areas starting from old good times :) – Ivan Stoev Feb 22 '16 at 15:10
  • @IvanStoev haha... same here.. :) And congratz for obtaining the bounty! ;) – Ian Feb 22 '16 at 15:11
  • @Ian What? You don't know that! :P – Zizouz212 Feb 23 '16 at 02:36
  • @Zizouz212 "that"? About the algorithm you mean? – Ian Feb 23 '16 at 02:45
  • @Ian Nah, I was joking a little about the bounty - he hasn't got it yet, but he *will* soon :) – Zizouz212 Feb 23 '16 at 02:46
  • @Zizouz212 ah, I see... Yeah, since his answer is marked as *accepted* I assume he shall be the one who gets the bounty - unless in the last minute the OP gives it to someone like - to your answer for example. ;) – Ian Feb 23 '16 at 02:47
  • @Ian You do realize that I started the bounty, right? – Zizouz212 Feb 23 '16 at 02:48
  • @Zizouz212 oopps, sorry. :s you are right! The OP is normally the one that gives bounty, but this case is not. It is you who started the bounty. And yes, I missed that.. :s my bad... – Ian Feb 23 '16 at 02:51
  • 2
    @Ian lol. Yeah I started it cause I was interested in learning more. I'm in grade 11 after all :) – Zizouz212 Feb 23 '16 at 02:52
  • @Zizouz212 I see... haha :D this is indeed a rather *unique* way to learn more, I think. But who knows? – Ian Feb 23 '16 at 02:53
  • @Ian lol, Zizouz212 is right, here we say something like this in such cases *"Don't put the pan while the fish is still at sea"* :) – Ivan Stoev Feb 23 '16 at 13:01
5

The best thing to do here is to use the lazy initialization to avoid having all the permutations in memory at the same time.

private static IEnumerable<string> getPermutations(int n,string source)
{
    IEnumerable<string> q = source.Select(x => x.ToString());
    for (int i = 0; i < n - 1; i++)
    {
        q = q.SelectMany(x => source, (x, y) => x + y);
    }

    return q; 
}

private static List<string> filterListByRegex(IEnumerable<string> list, string regex)
{
    List<string> newList = new List();
    foreach(var item in list)
    {
        Match match = Regex.Match(item, regex, RegexOptions.IgnoreCase);
        if (match.Success)
        {
            newList.Add(item);
        }
    }

    return newList;
}

This may not be the most efficient way to do it, but at least it should get you past the memory issues.

juharr
  • 31,741
  • 4
  • 58
  • 93
3

Here's a simple solution that is both computationally and memory efficient.

  • Instead of generating the entire list of permutations and then find matches, using an iterator lets us process potential permutation matches as they get generated.
  • With a little bit of backtracking, only permutations that have a chance to match your regex get generated.

All you need is an extra regular expression which accepts partial candidates. It should accept strings that could become a match if characters get added. (It would have be nice to have something like hitEnd() in Java which does exactly this. This would eliminate the need for that regular expression. Unfortunately, I do not think there is an equivalent in .Net)

In my example, I want to find permutations of the string "123456789" that match regex "32145.67". I use the (suboptimal) regular expression "^3$|^32$|^321" to discard permutations that do not start with 321. (Of course, here it would have been possible to generate the permutations for "456789" and prepend "123" to the results, but this is just to illustrate the concept.)

The efficiency of this solution will rely mostly on how many invalid matches you can discard early in the generation of permutations.

Short explanation of how the permutation generation works. Let's try to generate all the permutations of the string "abc". It can easily be seen that:

permutations("abc") = {"a" + permutations("bc"),
                       "b" + permutations("ac"),
                       "c" + permutations("ab")}

In other words, we take each character of the input string, append it to an accumulator and compute all the permutations for the input string with the character removed. Once we reach a leaf - a permutation of the input string -, the accumulator will have the size of the input string.

This can be written succinctly in recursive pseudo-code as:

permutation(input, acc)
  if input empty
     return acc

  foreach(character in input)
      left <- remove char from input
      permutation(left, acc+char)

Now this is not the most efficient way to generate permutations. (see Heap's algorithm) But at least it allows us not to explore the entire tree structure and discard permutations just by looking at their prefix.

Since "yield return" does not work so well in recursive functions, I have simply rewritten the solution in a iterative manner (Note: space complexity is worse than with the above recursive DFS).

public IEnumerable<string> getPermutation(string input, string regexp)
{
        Stack<string> left = new Stack<string>();
        Stack<string> acc = new Stack<string>();

        left.Push(input);
        acc.Push("");

        // generate all permutations that match regexp
        while (left.Count > 0)
        {
            string c = left.Pop();
            string r = acc.Pop();

            if(r.Length==input.Length)
            {
                yield return r;
            }
            else
            {
                for(int i=0;i<c.Length;i++)
                {
                    string p = r + c[i];
                    if (Regex.IsMatch(p,regexp)) // continue if we have a potential match
                    {
                        left.Push(c.Substring(0, i) + c.Substring(i + 1));
                        acc.Push(p);
                    }
                }
            }

        }            
}



foreach(var a in getPermutation("123456789", "^3$|^32$|^321"))
{
    if(Regex.IsMatch(a, "32145.67"))
    {
         // found match
    }

}
Simon
  • 151
  • 3
  • Does this also work with letters? Also could you please explain your code? Especially the getPermutation method. – Shard Feb 14 '16 at 00:17
  • Yes, this will generate permutations for any input string. I simply used a string of numbers here as an example. I am updating the code with more explanations. – Simon Feb 14 '16 at 00:58
  • Thank you! Because I modified the code to work with my example where I use alphabet, but I can't seem to make it work. My string input is an alphabet and then I tell my code to generate for example 3 letter word and it creates for example aaa,aab,aac...etc. So in each step whole alphabet is used. – Shard Feb 14 '16 at 01:00
  • I added a picture to my post to demonstrate what I'm after better. – Shard Feb 14 '16 at 01:31
  • With your edited code, wouldn't it mean that for example `aaab` is not possible? – Shard Feb 14 '16 at 02:12
  • What do you mean by it is not possible? It depends on the string for which you want to compute the permutations. If it is "baaa", yes it will be generated. – Simon Feb 14 '16 at 02:23
  • So in order to get 6 letter combos like in my image demonstration I would have to pass 6 times the alphabet as an input string, right? – Shard Feb 14 '16 at 02:29
  • A permutation is just reordering of a given string. Are you instead trying to generate all the strings of a given size? – Simon Feb 14 '16 at 02:34
  • Yes, I'm sorry if I wasn't clear. I'd appreciate if you could help me out with this, been stuck on it for a while now. – Shard Feb 14 '16 at 02:37
  • Oh ok, in this case checkout https://github.com/moodmosaic/Fare . This is a wrapper library in c# around xeger. You can use it to generate strings that match a regular expression. This will be more efficient than generating all the strings of size n and trying to find a match. – Simon Feb 14 '16 at 02:59
2

You're running out of memory from storing all of those permutations at one point.

Assuming a length of 5 chars, there are 7,893,600 different permutations.
Assuming a length of 6 chars, there are 165,765,600 different permutations.

Considering that each character in a string is worth 2 bytes of memory, you would need 1,989,187,200 bytes (Just around 2 Gigabytes) to store all permutations. That's not exactly desirable.

So how do we fix this?

I've never coded in c#, but here's a practical design decision: perform individual processing when the permutation is created itself. That way, you only need to store the permutations you need. Here's some pseudo-code:

List<string> storedPermutations;
string s = createPermutation();
bool shouldAdd = shouldAddPermutation(s);
if (bool) 
{
    storedPermutations.add(s);
}

That's arguably not the best (nor is it probably pseudo-code), but the logic here is to decide whether to add the permutation to the list the moment it is created, instead of adding everything to a list, and then trying to process that entire list. If you still run out of memory, then there's still a lot of permutations.

Zizouz212
  • 4,908
  • 5
  • 42
  • 66
  • Yeah, I could filter them by regex at that point, but using `q.SelectMany(x => source, (x, y) => x + y);` is the only way I know how to generate permutations. Could you help me with figuring out how to achieve that? – Shard Feb 13 '16 at 21:41
  • @Shard I'm only in Grade 11, and I'm definitely not some super cool math person, so I'm not exactly sure how I would be able to help with that. Sorry about that :( – Zizouz212 Feb 13 '16 at 22:04