
I want to remove all the words that contain a specific substring.

sentence = 'walking my dog https://github.com/'
substring = 'http'

# Remove all words that start with the substring
#...

# Expected result:
result = 'walking my dog'
ASSILI Taher
Benjamin Kolber
  • `re.sub` with a word boundary is the way to go. – Mike Jun 01 '19 at 20:41
  • Possible duplicate of [Do Python regular expressions from the re module support word boundaries (\b)?](https://stackoverflow.com/questions/3995034/do-python-regular-expressions-from-the-re-module-support-word-boundaries-b) – Mike Jun 01 '19 at 20:41
  • What code have you written so far? Why doesn't your solution work? Which part of the task is a problem for you? – boreq Jun 01 '19 at 21:09

2 Answers


This removes each word that starts with the prefix, together with the spaces that follow it, so the rest of the string keeps its original spacing without much extra fiddling.

>>> import re
>>> string = "a suspect http://string.com   with spaces before and after"
>>> starts = "http"
>>> re.sub(rf"\b{re.escape(starts)}[^ ]*[ ]+", "", string)
'a suspect with spaces before and after'
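Because `[ ]+` needs at least one space after the word, a match at the very end of the string, as in the question's `'walking my dog https://github.com/'`, is left in place. Below is a minimal variant, sketched under two assumptions: the prefix begins with a word character (so `\b` applies), and trimming trailing whitespace from the result is acceptable. It swaps `[^ ]`/`[ ]` for `\S`/`\s`, so tabs and newlines also count as separators. The name `remove_words_starting_with` is hypothetical.

```python
import re


def remove_words_starting_with(text: str, prefix: str) -> str:
    # \b anchors the prefix at the start of a word (assumes the prefix begins
    # with a word character); \S* consumes the rest of that word and \s*
    # consumes the whitespace after it, which may be empty at the end of text.
    pattern = rf"\b{re.escape(prefix)}\S*\s*"
    # A match at the end of the string leaves the space before it behind,
    # so trailing whitespace is trimmed (this also trims any the input had).
    return re.sub(pattern, "", text).rstrip()


print(remove_words_starting_with('walking my dog https://github.com/', 'http'))
# walking my dog
```

The `rstrip()` is the trade-off here. It keeps the regex simple, but it also removes any trailing whitespace that was in the original string.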
Mike

There is a simple approach that we can use for this.

  1. Split the sentence into words.
  2. Check whether each word contains the substring, and drop it if it does.
  3. Join the remaining words back together.
>>> sentence = 'walking my dog https://github.com/'
>>> substring = 'http'
>>> f = lambda v, w: ' '.join(filter(lambda x: w not in x, v.split(' ')))
>>> f(sentence, substring)
'walking my dog'

Explanation:

1. ' '.join(
2.   filter(
3.     lambda x: w not in x,
4.     v.split(' ')
5.   )
6. )

Line 1 starts with a join. Line 2 filters the elements produced by line 4, which splits the string into words. The filter condition (line 3) keeps a word only if the substring is not in it. In the worst case, each `not in` check costs O(len(substring) * len(word)).
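The same filter can also be written with a generator expression instead of `filter` plus a `lambda`, which is the more common idiom in Python. This is a sketch only; the name `remove_words_containing` is hypothetical. Splitting on `' '` rather than calling `split()` with no argument keeps runs of spaces intact, because the empty strings between consecutive spaces are never dropped (assuming a non-empty substring).

```python
def remove_words_containing(sentence: str, substring: str) -> str:
    # Split on single spaces so repeated spaces survive as empty strings.
    words = sentence.split(' ')
    # Keep only the words that do not contain the substring.
    return ' '.join(word for word in words if substring not in word)


print(remove_words_containing('walking my dog https://github.com/', 'http'))
# walking my dog
```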

Note: The only step worth speeding up is line 3. Because every word is compared against the same constant substring, you could use Rabin-Karp string matching to find it in expected O(len(word)) time (after hashing the substring once), or the Z-function to find it in O(len(word) + len(substring)) time.
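Here is a sketch of the Z-function route for line 3, assuming plain Python strings. `z_function` and `z_contains` are hypothetical helper names. The check puts the substring in front of each word, so any Z-value of at least `len(substring)` past that prefix marks an occurrence lying entirely inside the word.

```python
def z_function(s: str) -> list:
    """z[i] = length of the longest common prefix of s and s[i:]."""
    n = len(s)
    z = [0] * n
    left, right = 0, 0  # [left, right) is the rightmost match window so far
    for i in range(1, n):
        if i < right:
            # Reuse the value mirrored inside the current window.
            z[i] = min(right - i, z[i - left])
        # Extend the match character by character.
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > right:
            left, right = i, i + z[i]
    return z


def z_contains(word: str, substring: str) -> bool:
    """Return True if substring occurs in word, in O(len(word) + len(substring))."""
    m = len(substring)
    if m == 0:
        return True
    # Any position i >= m in substring + word whose Z-value reaches m
    # marks an occurrence of substring that starts inside word.
    z = z_function(substring + word)
    return any(z[i] >= m for i in range(m, len(z)))


sentence = 'walking my dog https://github.com/'
substring = 'http'
print(' '.join(w for w in sentence.split(' ') if not z_contains(w, substring)))
# walking my dog
```

In CPython the built-in `in` runs in C and will likely still be faster than this pure-Python loop on short words. The sketch only illustrates the linear-time bound.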

prateeknischal