I am reading a CSV file for an NLP task and trying to pre-process the data. The data comes from an online forum, so the messages contain quoted posts. How can I remove them? For example:
a = '''[b]Re:[/b]
[quote="xxx"] How can I do that blah blah xxx [/quote]
Hello xxx, I will tell you how you can do it blah blah blah.'''
I want the result to look like this:
a='Hello xxx, I will tell you how you can do it blah blah blah.'
I want a regex that detects [quote=" and deletes everything from there up to and including [/quote]. Is this possible?
I have tried this, but it did not work.
import re

def quotes(text):
    return re.sub(r'\[([^\]=]+)(?:=[^\]]+)?\].*?\[\/\1\]', '', text)

data['message'] = data['message'].apply(quotes)
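
One thing I have not ruled out: by default `.` does not match a newline, so a quote that spans several lines would never match the pattern above. Below is a minimal sketch of that variation, assuming multi-line quotes are the cause (I have not confirmed this on the real data). The names `QUOTE_RE`, `TAG_RE`, `strip_quotes` and `strip_tagged` are made up for illustration, and the sample string is my example with a line break added inside the quote.

```python
import re

# Assumption: a quote may span several lines, so DOTALL lets `.` match "\n".
# Matches [quote] or [quote="name"] up to the first following [/quote].
QUOTE_RE = re.compile(r'\[quote(?:=[^\]]*)?\].*?\[/quote\]',
                      flags=re.DOTALL | re.IGNORECASE)

# Generic paired tags such as [b]...[/b]; \1 must repeat the opening tag name.
# "/" is excluded from the name so a closing tag is never taken as an opener.
TAG_RE = re.compile(r'\[([^\]=/]+)(?:=[^\]]*)?\].*?\[/\1\]', flags=re.DOTALL)


def strip_quotes(text):
    """Remove only [quote]...[/quote] blocks, then trim outer whitespace."""
    return QUOTE_RE.sub('', text).strip()


def strip_tagged(text):
    """Remove every paired tag together with its content, then trim."""
    return TAG_RE.sub('', text).strip()


# Hypothetical sample: the example message with a newline inside the quote.
sample = ('[b]Re:[/b]\n'
          '[quote="xxx"] How can I do that\nblah blah xxx [/quote]\n'
          'Hello xxx, I will tell you how you can do it blah blah blah.')

print(strip_tagged(sample))
```

On the DataFrame this would be applied the same way as before, e.g. `data['message'].apply(strip_tagged)`. Because the match is lazy, a quote nested inside another quote would stop at the first [/quote] and leave the rest of the outer quote behind, so nested quotes would need separate handling.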