1

The task is to obtain a unique list of substrings in python.

I am currently using the breakup of the problem into 2 parts: obtain a list of all the substrings, followed by obtaining unique substrings.

I am using the below code:

substrings=[]
for i in range(0,len(inputstring)+1):
    for j in range(i+1,len(inputstring)+1):
        substr=inputstring[i:j]
        substrings.append(substr)
uniq=[]
for ss in substrings:
    if ss not in uniq:
        uniq.append(ss)

Is there a faster way to solve this problem or a so-called python way of doing it in a more flexible way?

a simple example string being: "aabaa", possible substrings are [a,a,b,a,a,aa,ab,ba,aa,aab,aba,baa,aaba,abaa,aabaa], unique substring which is desired at the end [a,b,aa,ab,ba,aab,aba,baa,aaba,abaa,aabaa]

Ashwin Geet D'Sa
  • 6,346
  • 2
  • 31
  • 59

2 Answers2

1

Use Itertools and Set. Similar to the answer of Edwin but with Itertools, and in one line.

import itertools

uniq=list(set([inputstring[x:y] for x, y in itertools.combinations(
            range(len(inputstring) + 1), r = 2)]))

Basically you use itertools to first find all combinations, then set to find unique elements, then cast to list.

Code for combinations taken from https://www.geeksforgeeks.org/python-get-all-substrings-of-given-string/

Edit for a clearer explanation: First, use combinations to get all pair of indexes corresponding to substrings. The trick here is that itertools.combinations starts with all (0,X) pairs, and then (1,X) pairs, etc. Since we are using combinations and not permutations, we eliminate thus reverse substrings such as (1,0) since they will have been seen in the (0,X) enumerations.

Then simply use these with a list comprehension to get all substrings, use a set to find unique elements, and cast to a list.

Hope that helps

Pixel
  • 101
  • 7
  • Yes, but if you use itertools with permutations it will also return subsets like (1,0), which is undesirable in this case. Since in combinations order doesn't matter, and due to the way itertools.combinations iterates (starts with all (0,X), then (1,X), etc), combinations here will correctly return all substring – Pixel Jul 24 '19 at 14:39
  • combination order matters as the permuted combination need not be the substring – Ashwin Geet D'Sa Jul 24 '19 at 14:45
  • I don't understand that sentence. I'll edit to explain the code more maybe that will help – Pixel Jul 24 '19 at 14:47
0

Use a set instead of a list for the second part. Finding something in a list costs O(n) while in a set it costs O(1) and you don't have to check if its new. Sets won't add something if it is already in the list.

substrings=[]
for i in range(0,len(inputstring)+1):
    for j in range(i+1,len(inputstring)+1):
        substr=inputstring[i:j]
        substrings.append(substr)
uniq=set()
for ss in substrings:
    uniq.add(ss)