
What is the fastest way to find all the substrings of a string, without using any modules and without producing duplicates?


def lols(s):
    if not s:
        return 0
    lst = []
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            # keep only the substrings with no repeated characters
            if len(s[i:j]) == len(set(s[i:j])):
                lst.append(s[i:j])
    res = max(lst, key=len)
    return res


s = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"
s = s*100

lols(s)

This function works fine with strings shorter than about 1000 characters, but it freezes when the example string is used, and the time limit is exceeded for large strings.


2 Answers


I recommend that you don't try this with super long strings like your s!

If you're able to install nltk so that it works (I recently had a problem with that, but managed to solve it by installing it in the Windows Sandbox; see Python3: Could not find "vcomp140.dll (or one of its dependencies)" while trying to import nltk), then this is one way to do it:

from nltk import ngrams

def lols(s):
    lst = []
    for i in range(1, len(s)):
        lst.extend([''.join(j) for j in ngrams(s, i)])
    lst.append(s)
    return lst

If not, you can use this in place of "from nltk import ngrams":

import collections, itertools

def ngrams(words, n):
    """Edited from https://stackoverflow.com/questions/17531684/n-grams-in-python-four-five-six-grams"""
    # A deque with maxlen=n acts as a sliding window over the characters.
    d = collections.deque(maxlen=n)
    d.extend(words[:n])
    words = words[n:]
    answer = []
    # itertools.cycle((d,)) simply pairs the same deque with every remaining character.
    for window, word in zip(itertools.cycle((d,)), words):
        answer.append(''.join(window))
        d.append(word)
    answer.append(''.join(d))  # the final window
    return answer
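
For reference, calling this drop-in replacement directly gives already-joined windows (unlike nltk's tuples), so lols works with it unchanged:

>>> ngrams('username', 3)
['use', 'ser', 'ern', 'rna', 'nam', 'ame']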

Demo:

>>> lols('username')
['u', 's', 'e', 'r', 'n', 'a', 'm', 'e', 'us', 'se', 'er', 'rn', 'na', 'am', 'me', 'use', 'ser', 'ern', 'rna', 'nam', 'ame', 'user', 'sern', 'erna', 'rnam', 'name', 'usern', 'serna', 'ernam', 'rname', 'userna', 'sernam', 'ername', 'usernam', 'sername', 'username']
hilssu
  • This is a problem on leetcode and you are not allowed to use libraries to solve it – packetsniffer Apr 30 '21 at 09:19
  • Whoops, I noticed that only after posting my first answer. As far as I know, "collections" and "itertools" are part of the standard library and thus there should be no python environments that don't contain them, but in any case, if you can implement "ngrams" in pure python (that is, substrings of a certain length), then you could use that instead. – hilssu Apr 30 '21 at 09:23
  • There is a pure Python way to generate ngrams presented here: https://albertauyeung.github.io/2018/06/03/generating-ngrams.html though you'll have to remove some unnecessary lines and replace "token" with "s" in the line before the return statement (and rename the function to "ngrams", or call "generate_ngrams" in the function I wrote instead). – hilssu Apr 30 '21 at 09:34
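
Following up on that comment, here is a minimal pure-Python sketch of such an ngrams helper (my own condensation of the linked approach, not the article's code verbatim); the lols above can call it unchanged:

def ngrams(s, n):
    # Slide a window of width n across s and collect every slice.
    return [s[i:i + n] for i in range(len(s) - n + 1)]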

Maybe your function's performance slows down like O(n²) or O(n!), or memory is tight. As for the maximum size of string you can print to stdout with the print function: since you have to pass your text to print as a Python object, and the maximum size of a variable depends on your platform, it can be 2**31 - 1 on a 32-bit platform and 2**63 - 1 on a 64-bit platform.

For more information, see sys.maxsize.
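
If the underlying task is the LeetCode problem mentioned in the comments above (longest substring without repeating characters), a single sliding-window pass avoids the quadratic blow-up entirely. A minimal sketch, not taken from either answer:

def longest_unique_substring(s):
    # last_seen maps each character to the index where it most recently appeared.
    last_seen = {}
    start = 0                      # left edge of the current duplicate-free window
    best_start, best_len = 0, 0
    for i, ch in enumerate(s):
        if ch in last_seen and last_seen[ch] >= start:
            # ch is already inside the window, so move the left edge past it
            start = last_seen[ch] + 1
        last_seen[ch] = i
        if i - start + 1 > best_len:
            best_start, best_len = start, i - start + 1
    return s[best_start:best_start + best_len]

On the long example string from the question this returns one full copy of the unique-character block almost instantly, instead of hitting the time limit.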

Salio