0

I can't seem to find a working answer on the following:

Lets say I have the following returned LONG int (or any string for that matter):

longnr = 26731623516244147357200698985770699644500911065919594945175038371437093557337774208653

I can easily split this the following way:

splitnr = [26731,62351,624,41,4735,720,0698,9857,7069,964,450,091,10659,195,94,94517,5038,3714,3709,35,573,37,7,74208653]

Every combination is unique and thus no repeating numbers. I do this in simple code by just iterating over each item and add them to a string until it would find a repeating number. Then just write this string in a list, empty string, add the number just checked and continue until done for all.

What I want is that it will take the least combinations as possible. So first try and find all 10digit unique combinations, then 9 for the remaining, ,8,7 etc

Do I need regex? I can't make this work and some suggested I would need huge patterns.

Next option:

len(set(str(longnr)[0:10])) == len(str(longnr)[0:10])

This works for the first 10 to check if it's unique.

How do I go from here in an optimal way?
The order must be kept like in splitnr.

Alex Hall
  • 34,833
  • 5
  • 57
  • 89
supersoaker2000
  • 333
  • 2
  • 9
  • 3
    This is definitely not the place to use regexes. – Alex Hall May 16 '16 at 23:00
  • That's what I thought, example: http://stackoverflow.com/questions/12870489/regex-to-match-a-word-with-unique-non-repeating-characters/12870549 I can't make this work. – supersoaker2000 May 16 '16 at 23:04
  • 3
    Is this just for fun or is there some practical problem you want to solve? What happens if your string is just `0000000.....000`? Is 10 the maximum length for each substring? Does this have to be perfectly optimal? – Alex Hall May 16 '16 at 23:04
  • 2
    Is it better to have two substrings of length 9 or one of 10 and one of 8? – Alex Hall May 16 '16 at 23:10
  • 1
    Fun and a problem both :) If my string would be '000000' the result would be [0,0,0,0,0,0] If my string would be 'abcfeagd' the result must be [abcfe,agd] The bigger it gets, things change: 'abcfeagdqp' : the result must be [a,bcfeagdqp] It can now make a bigger unique combination. 2*length 9 or 8 + 10 doesn't matter. – supersoaker2000 May 16 '16 at 23:15
  • "the result must be [a,bcfeagdqp] It can now make a bigger unique combination." contradicts "2*length 9 or 8 + 10 doesn't matter." – Alex Hall May 16 '16 at 23:33
  • Maybe I phrased it wrong (no english). What I meant is no repeating chars. The first two chars of '000000' would become 0,0 because the 2nd is a repeat of the first. Would the string be '005000', then it becomes [0,0,50,0,0] because the 0 after 5 is not a repeating of 5. The 4th zero is in '50', thus it is repeated and treated as a new element. – supersoaker2000 May 16 '16 at 23:36
  • @user3386109 You are very right! That combination is also allowed and actually would be the another way and likely the first way to split. – supersoaker2000 May 16 '16 at 23:50
  • Your description is pretty unclear: In the question, you say that every "combination" (I presume you mean "number") must be unique, but then in a comment you say that the solution for "000000" should be [0,0,0,0,0,0], where of course the numbers are not unique. Separately, I'm confused why you say the solution to "abcfeagdqp" *must be* [a,bcfeagdqp] -- aren't [ab,cfeagdqp], [abc,feagdqp], [abcf,eagdqp] and [abcfe,agdqp] all just as good? – j_random_hacker May 17 '16 at 12:01

2 Answers2

1

I was convinced that Edward Peters had the answer. But empirically it seems like all three solutions are equally good:

from random import choice

def edward_peters(string):
    sequences = [[]]
    for end in range(1, len(string) + 1):
        def candidate_sequences():
            for previous_end in range(max(0, end - 10), end):
                substring = string[previous_end:end]
                if len(substring) == len(set(substring)):
                    yield sequences[previous_end] + [substring]

        sequences.append(min(candidate_sequences(), key=len))
    return sequences[-1]

def brendan_abel(long_string):
    if not long_string:
        return []
    cur_i = None
    cur_s = None
    max_i = None
    max_s = None
    for i, s in enumerate(long_string):
        if cur_s is None or s in cur_s:
            if cur_s and (max_s is None or len(cur_s) > len(max_s)):
                max_i = cur_i
                max_s = cur_s
            cur_i = i
            cur_s = [s]
        else:
            cur_s.append(s)
    else:
        if cur_s and (max_s is None or len(cur_s) > len(max_s)):
            max_i = cur_i
            max_s = cur_s
    before = long_string[:max_i]
    after = long_string[max_i + len(max_s):]
    return brendan_abel(before) + [''.join(max_s)] + brendan_abel(after)

def ruud(string):
    result = []
    current = ''
    for c in string:
        if c in current:
            result.append(current)
            current = c
        else:
            current += c
    result.append(current)
    return result

def main():
    while True:
        string = ''.join(choice('1234567890') for _ in range(10000))
        results = [func(string) for func in [edward_peters, brendan_abel, ruud]]
        assert all(''.join(result) == string for result in results)
        assert len(set(map(len, results))) == 1

main()

I can't grasp this intuitively at all. It also seems that Brendan Abel was right that Edward Peters' solution is OP's working backwards, e.g.

print edward_peters(string)
['49', '9', '3849', '3089', '91', '1', '15', '58', '42876', '81926', '6720', '90', '0', '27103', '3064', '436', '6', '862', '2', '201', '7091', '912', '23', '6345', '582', '382', '2', '82457', '64937', '0574', '2743', '983', '4382']
Alex Hall
  • 34,833
  • 5
  • 57
  • 89
0

You could do this using a recursive function, that will work kind of like a divide-and-conquer sorting algorithm. Worst case is going to be O(n^2) (well, it's basically always O(n^2))

  1. Go through each element in original list and try to make the longest possible unique chain, storing the starting index of the max chain (we can cheat a bit here because we know 10 is the absolute maximum using just digits and stop as soon as we find a 10-length chain, but I didn't do this in the example below).

  2. Split the list into 2 separate lists, one before the element list we just found, and one after, and feed them both into the recursive function.

  3. Recombine the results

The function below assumes you're passing it a string, so stringify your long number first, then convert the strings back to numbers afterward.

def func(long_string):
    if not long_string:
        return []
    cur_i = None
    cur_s = None
    max_i = None
    max_s = None
    for i, s in enumerate(long_string):
        if cur_s is None or s in cur_s:
            if cur_s and (max_s is None or len(cur_s) > len(max_s)):
                max_i = cur_i
                max_s = cur_s
            cur_i = i
            cur_s = [s]
        else:
            cur_s.append(s)
    else:
        if cur_s and (max_s is None or len(cur_s) > len(max_s)):
            max_i = cur_i
            max_s = cur_s
    before = long_string[:max_i]
    after = long_string[max_i + len(max_s):]
    return func(before) + [''.join(max_s)] + func(after)

long_number = 1233423732174096361032409234987352
list_of_strings = func(str(long_number).strip('L'))  # strip L for older python versions.
list_of_numbers = map(int, list_of_strings)
Brendan Abel
  • 35,343
  • 14
  • 88
  • 118
  • I had high hopes for this algorithm (was about to code the same thing) but it seems it returned the same number of combinations as OP's. – Alex Hall May 16 '16 at 23:59
  • I'm pretty sure that the simple algorithm proposed in the question always gives an optimal answer. And it runs in *O(n)* time. – user3386109 May 17 '16 at 00:09
  • Except this will choose the *longest* possible chain of unique elements (ie. `[a,bcfeagdqp]` instead of `[abcfe, agdqp]`) – Brendan Abel May 17 '16 at 00:10
  • Ahh, I see what you're doing. But the OP only seems concerned about minimizing the number of chains. – user3386109 May 17 '16 at 00:24
  • @BrendanAbel I added an answer comparing the three solutions. Also note that your `else` block after the `for` always executes since there's no `break`. – Alex Hall May 17 '16 at 09:59