
I have a list:

forbidden_patterns = ['Word1', 'Word2', 'Word3', r'\d{4}']

and a string:

string1="This is Word1 a list thatWord2 I'd like to 2016 be readableWord3"

How can I remove every word and pattern listed in forbidden_patterns from string1, so that it ends up as:

clean_string="This is a list that I'd like to be readable"

The \d{4} is there to remove the year, which in this case is 2016.

List comprehensions are very welcome.

dlewin

2 Answers

import re

# Remove each forbidden pattern in turn, starting from the original string.
new_string = string1
for pattern in forbidden_patterns:
    new_string = re.sub(pattern, '', new_string)

new_string is then the string you want. The loop is a bit verbose, though, and removing a word that stands on its own leaves a double space behind, as in "This is  a list that I'd like to  be readable".
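If the leftover double spaces matter, one possible variation (a sketch, not part of the original answer) is to join the patterns into a single alternation and then normalise the whitespace. It assumes the patterns contain no characters that would change meaning inside an alternation, and that collapsing every run of whitespace to a single space is acceptable.

```python
import re

forbidden_patterns = ['Word1', 'Word2', 'Word3', r'\d{4}']
string1 = "This is Word1 a list thatWord2 I'd like to 2016 be readableWord3"

# Join the patterns into one alternation so the string is scanned only once.
combined = re.compile('|'.join(forbidden_patterns))
stripped = combined.sub('', string1)

# Collapse the runs of whitespace left behind by the removals.
clean_string = ' '.join(stripped.split())
print(clean_string)  # This is a list that I'd like to be readable
```

Note that the whitespace step also trims leading and trailing spaces and turns tabs or newlines into single spaces, which may or may not be what you want.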

WiNloSt

Here you are:

import re

# Use a raw string for the regex so the backslash is passed through to re unchanged.
forbidden_patterns = ['Word1', 'Word2', 'Word3', r'\d{4}']

string = "This is Word1 a list thatWord2 I'd like to 2016 be readableWord3"

for pattern in forbidden_patterns:
    string = ''.join(re.split(pattern, string))

print(string)

Essentially, this code goes through each of the patterns in forbidden_patterns, splits string using that particular pattern as a delimiter (which removes the delimiter, in this case the pattern, from the string), and joins it back together into a string for the next pattern.
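To make the split-and-join step concrete, here is a small illustration on a shortened string (the variable names sample and parts are made up for this example):

```python
import re

sample = "This is Word1 a list"

# Splitting on the pattern drops the matched text and keeps what surrounds it.
parts = re.split('Word1', sample)
print(parts)           # ['This is ', ' a list']

# Joining the pieces with an empty string gives the text with the pattern removed.
print(''.join(parts))  # This is  a list
```

The result is the same as re.sub('Word1', '', sample), including the double space where the word used to be.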

EDIT

To get rid of the extra spaces, put the following line as the first line in the for-loop:

string = ''.join(re.split(r'\b{} '.format(pattern), string))

This line matches the pattern only where it starts at a word boundary and is followed by a space, and removes the pattern together with that trailing space. Make sure that this line goes above string = ''.join(re.split(pattern, string)), which is "less specific" than this line.
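Put together, the loop with both lines might look like the sketch below. The function name remove_patterns is a hypothetical wrapper added here for illustration, and the code assumes the patterns contain no capturing groups (re.split would otherwise include the captured text in its result).

```python
import re

def remove_patterns(text, patterns):
    """Remove every pattern from text, taking one trailing space with standalone matches."""
    for pattern in patterns:
        # More specific first: pattern at a word boundary, followed by a space.
        text = ''.join(re.split(r'\b{} '.format(pattern), text))
        # Less specific second: any remaining occurrence of the pattern.
        text = ''.join(re.split(pattern, text))
    return text

forbidden_patterns = ['Word1', 'Word2', 'Word3', r'\d{4}']
string1 = "This is Word1 a list thatWord2 I'd like to 2016 be readableWord3"

print(remove_patterns(string1, forbidden_patterns))
# This is a list that I'd like to be readable
```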

pzp