I want to remove all the words that contain a specific substring.
Sentence = 'walking my dog https://github.com/'
substring = 'http'
# Remove all words that start with the substring
#...
result = 'walking my dog'
This respects the original spacing in the string without having to fiddle around too much.
import re
string = "a suspect http://string.com with spaces before and after"
starts = "http"
re.sub(f"\\b{starts}[^ ]*[ ]+", "", string)
'a suspect with spaces before and after'
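One caveat with the pattern above: it requires at least one space after the matched word, so it would leave a URL at the very end of the string (as in the question's example) untouched. A minimal sketch of a variant that handles that case follows. The function name `remove_words_starting_with` is made up for illustration, `re.escape` is an added precaution for prefixes that contain regex metacharacters, and the final `.strip()` assumes you do not need to keep leading or trailing whitespace.

```python
import re

def remove_words_starting_with(text, prefix):
    # Hypothetical helper, not part of the original answer.
    # \s*  consumes the whitespace in front of the word (if any),
    # \b   keeps the match anchored at a word boundary, as in the original pattern,
    # \S*  consumes the rest of the word up to the next whitespace or end of string.
    pattern = rf"\s*\b{re.escape(prefix)}\S*"
    # strip() is an assumption: it drops the space left over when the
    # removed word was the first word of the string.
    return re.sub(pattern, "", text).strip()

print(remove_words_starting_with("walking my dog https://github.com/", "http"))
```

Like the original, this assumes the prefix begins with a word character such as a letter, because `\b` behaves differently in front of punctuation.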
There is a simple approach that we can use for this: split the sentence into words, check whether each word contains the substring, and remove the words that do.
>>> sentence = 'walking my dog https://github.com/'
>>> substring = 'http'
>>> f = lambda v, w: ' '.join(filter(lambda x: w not in x, v.split(' ')))
>>> f(sentence, substring)
'walking my dog'
Explanation:
1. ' '.join(
2.   filter(
3.     lambda x: w not in x,
4.     v.split(' ')
5.   )
6. )
Line 1 starts with a join. Line 2 filters the elements produced by line 4, which splits the string into words. The filter condition on line 3 is substring not in word. In the worst case, the not in check is an O(len(substring) * len(word)) comparison.
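For readers who find the nested lambdas hard to follow, the same steps can be sketched as a named function. The name `remove_words_containing` is only illustrative; the behavior is meant to match the one-liner above, including splitting on single spaces.

```python
def remove_words_containing(sentence, substring):
    # Hypothetical name; same logic as the lambda-based one-liner.
    # Split on single spaces, exactly as v.split(' ') does.
    words = sentence.split(' ')
    # Keep only the words that do not contain the substring.
    kept = [word for word in words if substring not in word]
    # Glue the surviving words back together.
    return ' '.join(kept)

print(remove_words_containing('walking my dog https://github.com/', 'http'))
# prints: walking my dog
```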
Note: The only step that can be sped up is line 3. Because every word is compared against the same constant substring, you can use Rabin-Karp string matching to find it in expected O(len(word)) time, or the Z-function to find it in O(len(word) + len(substring)).
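As a rough illustration of the Z-function option, here is a minimal sketch. The helper names (`z_function`, `contains`, `remove_words_z`) are made up for this example, and the separator character is an assumption: it must not occur in either the substring or the words. In CPython the built-in `in` is implemented in C and is likely to be faster in practice, so treat this as a demonstration of the technique only.

```python
def z_function(s):
    # z[i] = length of the longest common prefix of s and s[i:].
    n = len(s)
    z = [0] * n
    left, right = 0, 0  # [left, right) is the rightmost prefix match found so far
    for i in range(1, n):
        if i < right:
            # Reuse what is already known from inside the current match window.
            z[i] = min(right - i, z[i - left])
        # Extend the match character by character.
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > right:
            left, right = i, i + z[i]
    return z

def contains(word, substring, sep="\x00"):
    # Assumption: sep appears in neither word nor substring.
    if not substring:
        return True
    m = len(substring)
    z = z_function(substring + sep + word)
    # A full-length prefix match inside the word part means the substring occurs there.
    return any(value == m for value in z[m + 1:])

def remove_words_z(sentence, substring):
    return ' '.join(w for w in sentence.split(' ') if not contains(w, substring))

print(remove_words_z('walking my dog https://github.com/', 'http'))
```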