4

I'm looking for the most efficient way to reduce a given list based off of substrings already in the list.

For example

mylist = ['abcd','abcde','abcdef','qrs','qrst','qrstu']

would be reduced to:

mylist = ['abcd','qrs']

because both 'abcd' and 'qrs' are the smallest substring of other elements in that list. I was able to do this with about 30 lines of code, but I suspect there is a crafty one-liner out there..

Chris Hall
  • 871
  • 6
  • 13
  • 21
  • 2
    At a high level, it's simple: build a [radix tree](https://en.wikipedia.org/wiki/Radix_tree), then take the direct children of the root (that represent actual elements; a node is just a maximal common prefix of its desendents). In practice, you'll need to track down a decent implementation of a radix tree. [This question](https://stackoverflow.com/questions/4707296/are-there-any-radix-patricia-critbit-trees-for-python) might help you start. – chepner Jun 13 '17 at 16:59
  • 1
    can you provide more complex example for tests? – Azat Ibrakov Jun 13 '17 at 17:05
  • Are substrings always supposed to be prefixes? – DYZ Jun 13 '17 at 17:05
  • 1
    yes, they are always meant to be prefixes – Chris Hall Jun 13 '17 at 17:08

4 Answers4

3

this seems to be working (but not so efficient i suppose)

def reduce_prefixes(strings):
    sorted_strings = sorted(strings)
    return [element
            for index, element in enumerate(sorted_strings)
            if all(not previous.startswith(element) and
                   not element.startswith(previous)
                   for previous in sorted_strings[:index])]

tests:

>>>reduce_prefixes(['abcd', 'abcde', 'abcdef',
                    'qrs', 'qrst', 'qrstu'])
['abcd', 'qrs']
>>>reduce_prefixes(['abcd', 'abcde', 'abcdef',
                    'qrs', 'qrst', 'qrstu',
                    'gabcd', 'gab', 'ab'])
['ab', 'gab', 'qrs']
Azat Ibrakov
  • 9,998
  • 9
  • 38
  • 50
0

One solution is to iterate over all the strings and split them based on if they had different characters, and recursively apply that function.

def reduce_substrings(strings):
    return list(_reduce_substrings(map(iter, strings)))

def _reduce_substrings(strings):
    # A dictionary of characters to a list of strings that begin with that character
    nexts = {}
    for string in strings:
        try:
            nexts.setdefault(next(string), []).append(string)
        except StopIteration:
            # Reached the end of this string. It is the only shortest substring.
            yield ''
            return
    for next_char, next_strings in nexts.items():
        for next_substrings in _reduce_substrings(next_strings):
            yield next_char + next_substrings

This splits it into a dictionary based on the character, and tries to find the shortest substring out of those that it split into a different list in the dictionary.

Of course, because of the recursive nature of this function, a one-liner wouldn't be possible as efficiently.

Artyer
  • 31,034
  • 3
  • 47
  • 75
0

Probably not the most efficient, but at least short:

mylist = ['abcd','abcde','abcdef','qrs','qrst','qrstu']

outlist = []
for l in mylist:
    if any(o.startswith(l) for o in outlist):
        # l is a prefix of some elements in outlist, so it replaces them
        outlist = [ o for o in outlist if not o.startswith(l) ] + [ l ]
    if not any(l.startswith(o) for o in outlist):
        # l has no prefix in outlist yet, so it becomes a prefix candidate
        outlist.append(l)

print(outlist)
Błotosmętek
  • 12,717
  • 19
  • 29
-1

Try this one:

import re
mylist = ['abcd','abcde','abcdef','qrs','qrst','qrstu']
new_list=[]
for i in mylist:
    if re.match("^abcd$",i):
        new_list.append(i)
    elif re.match("^qrs$",i):
        new_list.append(i)
print(new_list)
#['abcd', 'qrs']
dildeolupbiten
  • 1,314
  • 1
  • 15
  • 27