Generate unique combinations of length k from string

Question

Lets say we have the string aabc. If you take two chars from this string without replacement, your result will be in the set {aa, ab, ac, bc}. I'm trying to develop an algorithm to construct this set. It's important that there be no duplicates. The algorithm if all chars are unique is simple, but it gets more complicated if there are duplicates. I came up with this, but I'm curious if there are simpler solutions:

from collections import Counter

class Solution(object):
    def combinations(self, s, k):
        s = Counter(s)
        self.res = []
        self.combo_helper('', s, k)
        return self.res
    
    def combo_helper(self, used, remaining, k):
        if k == 0:
            self.res.append(used)
        else:
            new = remaining.copy()
            for n in remaining:
                if remaining[n]:
                    new[n] -= 1
                    self.combo_helper(
                        used+n, new, k-1
                    )
                    new = new.copy()
                    new[n] = 0

If you say it's simple for unique chars, why dont you make a set of chars from the string first? Set meaning no duplicates. — The Fool, Feb 20 '22 at 20:34
@TheFool That would not word. Take the string `aabc`. If you convert to a set, it'll be `abc`. Then, the combination `aa` could not be generated — pwasoutside, Feb 20 '22 at 20:37
How is abc different from aabc in the way you generate your result? What if I give you only abc as input. Your code should work with all strings as input, no? — The Fool, Feb 20 '22 at 20:39
Here's a simple one: `more_itertools.distinct_combinations('aabc', 2)` :-) — Pychopath, Feb 20 '22 at 20:42
@Pychopath I appreciate this answer, but I'm more so looking for an algorithm/implementation — pwasoutside, Feb 20 '22 at 21:00
Well, you can look at the algorithm/implementation of that function, the source code link is right there at [its documentation](https://more-itertools.readthedocs.io/en/stable/api.html#more_itertools.distinct_combinations). (Granted, it might not be simpler, as they probably value efficiency more than simplicity.) — Pychopath, Feb 20 '22 at 21:03
Is that a LeetCode problem? If so, which one? If not, where else is it from? — Kelly Bundy, Feb 20 '22 at 21:19
In Ruby: `"aabc".chars.combination(2).map(&:join).uniq #=> ["aa", "ab", "ac", "bc"]`. I would think Python would have something similar. — Cary Swoveland, Feb 20 '22 at 21:59
@CarySwoveland It does, but that method is inefficient. That's why that more-itertools function exists. — Pychopath, Feb 20 '22 at 22:14
@Pychopath, that's good to know. Also, the readability of `more_itertools.distinct_combinations('aabc', 2)` could hardly be improved upon. — Cary Swoveland, Feb 20 '22 at 23:10
@CarySwoveland For sake of completeness: the Python version of your Ruby is `set(map(''.join, itertools.combinations('aabc', 2)))`. But both have the efficiency issue, for example for `s = "aaaaaaaaaabbbbbbbbbb"` and `k = 10` we produce 184756 combinations, only to then throw all but 11 of them away. — Pychopath, Feb 20 '22 at 23:59
@Pychopath I think there's an issue with the itertools solution. `more_itertools.distinct_combinations("aabc", 2)` returns `[('a', 'a'), ('a', 'b'), ('a', 'c'), ('b', 'c')]` whereas `more_itertools.distinct_combinations("abac", 2)` returns `[('a', 'b'), ('a', 'a'), ('a', 'c'), ('b', 'a'), ('b', 'c')]`. These should be identical, but the second includes `(b, a)` and `(a, b)` which are duplicates ` — pwasoutside, Feb 21 '22 at 02:17
You'll need to sort the string characters before using `more_itertools.distinct_combinations` to get what you want. The more_itertools documentation says it's equivalent to `set(combinations(iterable))`, and `itertools.combinations` would produce both 'ab' and 'ba' in your second case, since items are treated as distinct based on position, not value. — kcsquared, Feb 21 '22 at 02:25
@kcsquared `Counter(s).elements()` would also work, but I guess you can consider it sorting by an unusual order :-) — Pychopath, Feb 21 '22 at 02:32

pwasoutside · Answer 1 · 2022-02-21T02:39:25.400

I took the recommendation of @Pychopath and checked out the more-itertools approach to the problem. They take a bottom-up DP approach, and I figured I'd translate it back into the top-down approach I originally took. Unfortunately, it requires you to sort the iterable first

class Solution:
    def first_char_idx(self, s, start=0):
        ''''
        Returns first idx and char of string
        aabbc -> [(0, a), (2, b), (4, c)]
        '''
        seen = set()
        return [(idx, c) for idx, c in enumerate(s, start) if c not in seen and not seen.add(c)]
        
    def unique_combos_helper(self, used, idx, k):
        if not k:
            self.res.append(used)
        else:
            for i, c in self.first_char_idx(self.pool[idx:], idx):
                self.unique_combos_helper(used+c, i+1, k-1)

    def unique_combos(self, s, k):
        s = sorted(list(s))
        self.res = []
        self.pool = s
        self.unique_combos_helper("", 0, k)
        return self.res

If it doesn't bother you to regenerate `first_char_idx` on every recursion, why does sorting the array once seem like a defect? — rici, Feb 21 '22 at 22:30

score -1 · Answer 2 · answered Feb 20 '22 at 20:57

-1

A set doesn't allow duplicates, so, you can do it as a list and just convert to a set.

In a list, with duplicates, your result taking all the possibilities will jave duplicates ['aa', 'ab', 'ac', 'ab', 'ac', 'bc'], but if you convert do a set, it will not:

x='aabc'
b=list()
for i in range(len(x)-1):
  for j in range(i+1,len(x)):
    b.append(x[i]+x[j])
b=set(b)

answered Feb 20 '22 at 20:57

Ivo Tebexreni

155
1
2
15

1

This is an ok answer, but what if we want to generalize and instead of strings use lists of numbers: `[1, 1, 2, 3]`. Ie, the solution would be `[[1, 1], [1, 2], [1, 3], [2, 3]]`. Sets cannot contain lists, so this solution wouldn't work. – pwasoutside Feb 20 '22 at 20:58
1

@pwasoutside They could use tuples instead, sets can contain those. The tuple elements and thus the input elements must be hashable for that, but your own solution also already assumes they are (otherwise you can't put them into your `Counter`). I'd say the bigger issues are that this is only for k=2 and that it's inefficient (might produce many duplicates and then throw most of them away). – Pychopath Feb 21 '22 at 00:05

score -1 · Answer 3 · answered Feb 20 '22 at 21:04

-1

Change the length according to your needs:

import itertools

list='aabc'
lenght=2

result=[''.join(p) for p in itertools.product(list, repeat=lenght)][::-1]

answered Feb 20 '22 at 21:04

razimbres

4,715
5
23
50

Generate unique combinations of length k from string

3 Answers3