0

I am trying to generate all permutations of the sequence 'AABBBCCCCCDDDDDEEEEE' without repetitions. Is there an efficient way to do this in python?

Code I use:


for s in permutations('AABBBCCCCCDDDDDEEEEE'):
    df.loc[len(df)] = s
    if len(df) % 1000 == 0:
        df = df.drop_duplicates(ignore_index=True)

But this is very slow unfortunately.

  • 1
    What are you trying to achieve? Do you really need to fit all the permutations into memory at the same time for the use case..? The number of elements is `N!`, so in this case 20!, which is 2432902008176640000. That is a large number. – Niko Föhr Jun 01 '23 at 09:21

5 Answers5

1
from more_itertools import distinct_permutations
sequence = 'AABBCCDDEE'
def updated_code():
     unique_permutations = list(distinct_permutations(sequence))
     return unique_permutations

you can use the distinct_permutations method from more_itertools

0

To de-duplicate the permutations you could use a set:

for s in set(permutations('AABBBCCCCCDDDDDEEEEE')):
    df.loc[len(df)] = s
    if len(df) % 1000 == 0:
        df = df.drop_duplicates(ignore_index=True)

Iterating over permutations is very costly because of the combinatorics at play, I don't think you can speed that step up. What are you trying to do inside your loop? There may be a way of optimizing there.

Learning is a mess
  • 7,479
  • 7
  • 35
  • 71
0

My approach, which does not use language specific libraries and does not require a potentially huge additional memory (like the set or hash map from the other solutions):

  1. Replace every position with another symbol that is unique. Have them induce an order for each original letter (which in turn induces a strict partial order). Basically AABB -> A_1 A_2 B_1 B_2
  2. Use any algorithm to create permutations.
  3. For each created permutation, check whether the partial order is broken. For instance A_1 B_1 A_2 B_2 is fine, but A_2 B_1 A_1 B_2 is not, because A_2 must not come before A_1. If the order is broken, throw the item away.
  4. Retransform the symbols into the original ones.

[Obviously you wouldn't have a symbol "A_1", so instead go for any unicode symbol and have something like an array for each original letter that does the mapping and also induces an order.]

Now if you'd have two equivalent permutations in the end, given the underlying unique symbols, they must have one position in which two same letters were created from a different order of underlying unique symbols, which is a contradiction to the order not being broken -> can't happen.

Again, this solution is meant in case that the set solution is problematic for your memory, and you being able to directly do something with the permutation and then forget it before the next one is used.
If this is not an issue for you, use the set solution.

Aziuth
  • 3,652
  • 3
  • 18
  • 36
0

well first off all the line:

df = df.drop_duplicates(ignore_index=True)

does nothing and running the code below will prove that:

from itertools import permutations
import pandas as pd

df = pd.DataFrame(columns=['1', '2', '3'])
for s in permutations('AAA'):
    df.loc[len(df.index)] = s
    if len(df) % 1000 == 0:
        df = df.drop_duplicates(ignore_index=True)

print(df)

now describing your scenario you are trying to calculate permutations of 20 characters which is 20 * 20! bytes of data that is 44,254,229 TBs of data yes its gonna take a while centuries maybe and nor will fit on your ram/hard dist

this is the way you could get the answer you are looking for:

from more_itertools import distinct_permutations
import pandas as pd

df = pd.DataFrame(columns=['1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20'])
for s in distinct_permutations('AABBBCCCCCDDDDDEEEEE'):
    print(s)
    df.loc[len(df.index)] = s

print(df)

which will take so much time

-2

Generating all permutations of a sequence without repetitions can be a computationally expensive task, especially when the sequence is long. In your current code, you are using the permutations function from the itertools module, which generates all possible permutations. However, this can result in a large number of permutations, leading to slow performance.

One possible approach (so my approach that I suggest to you) to improve the efficiency is to use a backtracking algorithm to generate permutations without repetitions. Here's an example implementation in Python:

def generate_permutations(sequence):
    used = set()

    def backtrack(curr_perm, remaining):
        if not remaining:
            yield curr_perm
        else:
            for i, char in enumerate(remaining):
                if char not in used:
                    used.add(char)
                    yield from backtrack(curr_perm + char, remaining[:i] + remaining[i+1:])
                    used.remove(char)

    yield from backtrack('', sequence)

sequence = 'AABBBCCCCCDDDDDEEEEE'
df = pd.DataFrame(columns=['Permutation'])
for perm in generate_permutations(sequence):
    df.loc[len(df)] = perm
    if len(df) % 1000 == 0:
        df = df.drop_duplicates(ignore_index=True)

In this code, the generate_permutations function uses a recursive backtracking approach to generate all permutations. The function maintains a set of used characters to avoid repetitions. It yields each valid permutation, which can be collected in the DataFrame df as you were doing before.

By using yield and yield from, we can generate permutations on the fly and iterate over them efficiently without having to store all permutations in memory simultaneously. This is particularly useful when dealing with large sequences or when memory usage is a concern.

By using backtracking and avoiding unnecessary permutations, this implementation should be more efficient compared to generating all permutations upfront. However, note that generating all permutations without repetitions for a sequence of length n still has a time complexity of O(n!), which can become impractical for large values of n.

The backtracking approach helps avoid generating duplicate permutations by using a set to keep track of used characters and skipping already used characters during the recursion. However, it does not reduce the overall complexity of generating all permutations without repetitions.

If you are dealing with a sequence of significant length, finding all permutations without repetitions may not be feasible due to the exponential nature of the problem. In such cases, you may need to consider alternative approaches or reevaluate the problem to find a more efficient solution or approximate the result in some way.

Aleff
  • 61
  • 8
  • 3
    Please note that the use of GPT/AI tools continues to be banned here (including Bing Chat). This answer, along with multiple others you've posted, appear to have a high likelihood of being generated by AI (with possible edits that you've made). If you used an AI tool to assist in this answer, I would encourage you to delete it. For readers of this answer, please validate the answer carefully and comment with any errors. Users who post low-quality information from AI may be suspended, and the moderation team could use your help identifying issues in the post (if any). – NotTheDr01ds Jun 01 '23 at 12:15
  • @NotTheDr01ds I did not use any of the things you quoted, I responded as I saw fit to do. However, if my help can in any way harm users I will not continue. – Aleff Jun 01 '23 at 13:01