Removing a string that starts with a substring

Question

I want to remove all the words that start with a specific substring.

sentence = 'walking my dog https://github.com/'
substring = 'http'

# Remove all words that start with the substring
#...

result = 'walking my dog'

Possible duplicate of [Do Python regular expressions from the re module support word boundaries (\b)?](https://stackoverflow.com/questions/3995034/do-python-regular-expressions-from-the-re-module-support-word-boundaries-b) — Mike, Jun 01 '19 at 20:41
What code have you written so far? Why doesn't your solution work? Which part of the task is a problem for you? — boreq, Jun 01 '19 at 21:09

Mike · Answer 1 · 2019-06-01T20:50:32.620

1

This respects the original spacing in the string without having to fiddle around too much.

import re
string = "a suspect http://string.com   with spaces before and after"
starts = "http"
re.sub(f"\\b{starts}[^ ]*[ ]+", "", string)
'a suspect with spaces before and after'
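
One caveat: the pattern above needs at least one space after the matched word, so a URL at the very end of the string (as in the question's example) is left in place. Below is a minimal sketch of a variant that also covers that case. It assumes that stripping trailing whitespace from the result is acceptable, and the function name remove_words_starting_with is made up for illustration. It also uses a whitespace lookbehind instead of \b, so only whitespace-delimited words are considered.

```python
import re

def remove_words_starting_with(text, prefix):
    # Hypothetical helper, not part of the original answer.
    # re.escape guards against regex metacharacters in the prefix.
    # (?<!\S) only matches where the previous character is whitespace
    # or the start of the string, i.e. at the start of a word.
    # \S* consumes the rest of the word, \s* any whitespace after it.
    pattern = rf"(?<!\S){re.escape(prefix)}\S*\s*"
    # rstrip() removes the space left behind when the last word is dropped.
    return re.sub(pattern, "", text).rstrip()

print(remove_words_starting_with("walking my dog https://github.com/", "http"))
```

Spacing elsewhere in the string is left as it was, which is the property the original pattern was aiming for.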

edited Jun 01 '19 at 20:50

answered Jun 01 '19 at 20:21


score 0 · Answer 2 · answered Jun 01 '19 at 21:07

There is a simple approach that we can use for this.

Split the sentence into words
Check whether each word contains the substring, and drop it if it does
Join the remaining words back together

>>> sentence = 'walking my dog https://github.com/'
>>> substring = 'http'
>>> f = lambda v, w: ' '.join(filter(lambda x: w not in x, v.split(' ')))
>>> f(sentence, substring)
'walking my dog'

Explanation:

1. ' '.join(
2.   filter(
3.     lambda x: w not in x,
4.     v.split(' ')
5.   )
6. )

Line 1 starts with a join. Line 2 filters the elements produced by line 4, which splits the string into words. The filter condition on line 3 keeps a word only if the substring is not in it. The not in check takes O(len(substring) * len(word)) time in the worst case.
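
Note that the filter condition removes any word that contains the substring anywhere, not only at its start. If only words that begin with the substring should go, as the question's title asks, the same split/filter/join idea could be sketched with str.startswith. The function name remove_words_with_prefix is hypothetical, and split() with no argument is an assumption that collapsing repeated whitespace is acceptable.

```python
def remove_words_with_prefix(sentence, prefix):
    # Hypothetical helper, not part of the original answer.
    # split() with no argument splits on any run of whitespace,
    # so repeated spaces are collapsed in the result.
    kept = [word for word in sentence.split() if not word.startswith(prefix)]
    return " ".join(kept)

print(remove_words_with_prefix("walking my dog https://github.com/", "http"))
```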

Note: The only step that can be sped up is line 3. Because every word is compared against the same constant substring, you can use Rabin-Karp string matching, which finds the substring in O(len(word)) expected time (the hash of the substring only has to be computed once), or the Z-function, which finds it in O(len(word) + len(substring)) worst-case time.
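
To make the Z-function idea concrete, here is a rough sketch of a linear-time containment check. The names z_function and contains are illustrative, and the separator character is assumed to occur in neither the word nor the substring. For short words the built-in in operator is likely to be faster in practice, so this is mainly meant to show the technique.

```python
def z_function(s):
    # z[i] is the length of the longest common prefix of s and s[i:].
    # z[0] is left at 0 by convention here.
    n = len(s)
    z = [0] * n
    left = right = 0  # [left, right) is the rightmost matched window so far
    for i in range(1, n):
        if i < right:
            # Reuse what is already known from inside the window.
            z[i] = min(right - i, z[i - left])
        # Extend the match character by character.
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > right:
            left, right = i, i + z[i]
    return z

def contains(word, substring, sep="\0"):
    # Assumption: sep appears in neither word nor substring.
    if not substring:
        return True
    m = len(substring)
    z = z_function(substring + sep + word)
    # A z-value equal to m inside the word part means a full match there.
    return any(v == m for v in z[m + 1:])

sentence = "walking my dog https://github.com/"
print(" ".join(w for w in sentence.split(" ") if not contains(w, "http")))
```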
