Removing any pattern (word or regex) defined in a list from a string

Question

I have a list

forbidden_patterns = ['Word1', 'Word2', 'Word3', r'\d{4}']

and a string:

string1="This is Word1 a list thatWord2 I'd like to 2016 be readableWord3"

How can I remove every pattern and word defined in forbidden_patterns from string1, so that I end up with:

clean_string="This is a list that I'd like to be readable"

The \d{4} pattern is there to remove the year, which in this case is 2016.

List comprehensions are very welcome.

Possible duplicate of [How to remove symbols from a string with Python?](http://stackoverflow.com/questions/875968/how-to-remove-symbols-from-a-string-with-python) — , Feb 12 '16 at 16:05
@JETM: the potential duplicate is only about re.sub, whereas my question is about a list of patterns — dlewin, Feb 12 '16 at 16:18

score 2 · Answer 1 · answered Feb 12 '16 at 16:09

import re

new_string = string1
for word in forbidden_patterns:
    new_string = re.sub(word, '', new_string)

new_string is then the string you want. It is a bit long-winded, though, and removing a standalone word leaves two consecutive spaces behind, as in "This is  a list that I'd like to  be readable".
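One possible way to deal with the leftover double spaces is to normalize whitespace after the loop. The sketch below assumes it is acceptable to collapse every run of whitespace into a single space (this would also flatten tabs and newlines, which may not suit every input):

```python
import re

forbidden_patterns = ['Word1', 'Word2', 'Word3', r'\d{4}']
string1 = "This is Word1 a list thatWord2 I'd like to 2016 be readableWord3"

new_string = string1
for word in forbidden_patterns:
    # Remove every match of the current pattern.
    new_string = re.sub(word, '', new_string)

# Assumption: collapsing all whitespace runs to one space is acceptable.
# str.split() with no argument splits on any whitespace and drops empties.
clean_string = ' '.join(new_string.split())
print(clean_string)
```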

pzp · Accepted Answer · 2016-02-12T16:36:42.653

Here you are:

import re

forbidden_patterns = ['Word1', 'Word2', 'Word3', r'\d{4}']

string = "This is Word1 a list thatWord2 I'd like to 2016 be readableWord3"

for pattern in forbidden_patterns:
    string = ''.join(re.split(pattern, string))

print(string)

Essentially, this code goes through each of the patterns in forbidden_patterns, splits string using that particular pattern as a delimiter (which removes the delimiter, in this case the pattern, from the string), and joins it back together into a string for the next pattern.
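To make the split-and-join step concrete, here is a small illustration on a shortened string (the shortened string is just for demonstration). It also shows that, for a pattern without capturing groups, the result matches what re.sub with an empty replacement produces:

```python
import re

text = "This is Word1 a list"

# Splitting on the pattern drops the matched text and keeps the pieces around it.
parts = re.split('Word1', text)
print(parts)

# Joining the pieces with an empty separator glues the string back together.
joined = ''.join(parts)
print(joined)

# For patterns without capturing groups this equals a plain substitution.
same_as_sub = joined == re.sub('Word1', '', text)
print(same_as_sub)
```

Note that if a pattern contained a capturing group, re.split would also return the captured text in the list, so the two approaches would no longer agree.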

EDIT

To get rid of the extra spaces, put the following line as the first line in the for-loop:

string = ''.join(re.split(r'\b{} '.format(pattern), string))

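This line matches the pattern only when it starts at a word boundary and is followed by a space, and in that case removes both the match and that one space. Make sure that this line goes above string = ''.join(re.split(pattern, string)), which is "less specific" than this line and cleans up the remaining occurrences, such as those glued to the end of another word.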
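Put together, the loop with both lines might look like the sketch below. The function name remove_patterns is only illustrative and not part of the original answer:

```python
import re


def remove_patterns(text, patterns):
    """Remove every pattern from text, trying not to leave double spaces."""
    for pattern in patterns:
        # More specific first: pattern at a word boundary plus one trailing space.
        text = ''.join(re.split(r'\b{} '.format(pattern), text))
        # Then any remaining occurrence, e.g. one glued to the end of a word.
        text = ''.join(re.split(pattern, text))
    return text


forbidden_patterns = ['Word1', 'Word2', 'Word3', r'\d{4}']
string = "This is Word1 a list thatWord2 I'd like to 2016 be readableWord3"

result = remove_patterns(string, forbidden_patterns)
print(result)
```

With the example string from the question this should give the requested clean_string. A match that sits at the very end of the string after a space would still leave a trailing space, since the first pattern requires a space after the match.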
