I have a column that I am trying to clean by removing all words before a specific word.
import pandas as pd
data = ['The text is interesting but short', 'The text is interesting but short', 'The text is interesting but short', 'The text is interesting but short', 'The text is interesting but short', 'The text is interesting but short']
df = pd.DataFrame(data, columns=['Text'])
I would like to remove all the words before "interesting" in each row of the column "Text".
I found that this can be done with a regular expression, and it does exactly what I want when applied to a single row (as a string), but I can't figure out how to apply it to each row of the column.
Below is the code that I found to clean a row:
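import re
date_div = "The text is interesting but short"
up_to_word = "is"
# Lazily match everything from the start of the string up to and including the first occurrence of up_to_word
rx_to_first = r'^.*?{}'.format(re.escape(up_to_word))
# Remove the matched prefix, then trim the leftover whitespace
print(re.sub(rx_to_first, '', date_div, flags=re.DOTALL).strip())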
How can I apply it to each row of the column?
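For reference, here is a minimal sketch of one direction that might work, assuming pandas' `Series.str.replace` with `regex=True` is acceptable here. It reuses the same pattern as the single-row code; the `Cleaned` column name is only an illustrative choice and not part of my real data.

```python
import re
import pandas as pd

# Same sample data as in the question
data = ['The text is interesting but short'] * 6
df = pd.DataFrame(data, columns=['Text'])

up_to_word = "is"
# Lazily match from the start of the string through the first occurrence of up_to_word
rx_to_first = r'^.*?{}'.format(re.escape(up_to_word))

# str.replace applies the regex to every row; str.strip trims leftover whitespace.
# 'Cleaned' is a hypothetical column name used only for this sketch.
df['Cleaned'] = (
    df['Text']
    .str.replace(rx_to_first, '', regex=True, flags=re.DOTALL)
    .str.strip()
)

print(df['Cleaned'].tolist())
```

An equivalent row-by-row form would be `df['Text'].apply(lambda s: re.sub(rx_to_first, '', s, flags=re.DOTALL).strip())`, which keeps the original `re.sub` call unchanged but is likely slower on large columns.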