-1

I'm new to Python. My code for reducing a list of strings takes a long time to execute. It functions to find: only those strings in a list which aren't partial matches of other strings in the same list. Is there a more efficient form of this code?

The code below seemed to work better than the following: any(item1 for item in my_list1 if item1.startswith(item1) or item1.endswith(item1)) from a related question (Python list lookup with partial match). Am I using any wrong?

Right now, I can only find partial matches in my_list1 which begin or end other entries in my_list1. I'd like to find ALL partial matches, even center matches.

#My_list1 could be:
my_list=['abcd', 'abcde', 'abcdef', 'bcd', 'bcde', 'bcdef']

for item1 in my_list1:
    icount=0    
    for item2 in my_list1:
        if item2.startswith(item1): 
            icount+=1
        if icount>1:
            break
    if icount==1:
       my_list2.append(item1)
       print item1

Desired my_list2 would be: ['abcdef']

When I change the line

if item2.startswith(item1):

to

if item2 in item1:

I go from having thousands of results in my_list2 with few redundancies to zero results in my_list2

Community
  • 1
  • 1

1 Answers1

0

You could sort the list by the length of the entries prior to searching it. That way, you don't need to search the entire list for partial matches as you iterate over each entry, since you know any entries prior to the current entry won't be a partial match, because they're too short. Like this:

l = ['abcd', 'abcde', 'abcdef', 'bcd', 'bcde', 'bcdef']
s_l = sorted(l, key=len)
print("Sorted list is {}".format(s_l)
out = [val for i,val in enumerate(s_l)
         if not any(val in ent for ent in s_l[i+1:])]
print out

Output:

Sorted list is ['bcd', 'abcd', 'bcde', 'abcde', 'bcdef', 'abcdef']
['abcdef']

This piece might be confusing:

if not any(val in ent for ent in s_l[i+1:])

It's iterating over all indices after the current index (represented by s_l[i+1:]), and checking to see if the val substring is contained in any of the strings at each index (represented by val in ent). If any of those indices return True for the val in ent test, the any call will return True. So we're saying, add val to our out list if val is not a sub-string of any of the strings contained in s_l, starting after the current s_l index.

dano
  • 91,354
  • 19
  • 222
  • 219