Hi, I have a pandas DataFrame and a text file that look a little like this:
df:
+----------------------------------+
| Description |
+----------------------------------+
| hello this is a great test $5435 |
| this is an432 entry |
| ... |
| entry number 43535 |
+----------------------------------+
txt:
word1
word2
word3
...
wordn
The actual contents of the descriptions are not important.
I want to go through each row in the df, split the description on ' ', and keep each word only if it is in the text file; otherwise the word should be deleted.
Example:
Suppose my text file looks like this:
hello
this
is
a
test
and a description looks like this:
"hello this is a great test $5435"
Then the output would be "hello this is a test", because "great" and "$5435" are not in the text file.
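To make the example reproducible, the two inputs could be set up roughly like this. It is only a sketch: an in-memory `io.StringIO` stands in for the real text file, and the name `text` for the set of words is chosen to match the code further down.

```python
import io

import pandas as pd

# Stand-in for the text file (one word per line). A real file object
# returned by open(...) could be iterated in the same way.
txt = io.StringIO("hello\nthis\nis\na\ntest\n")

# Load the words into a set so that `word in text` is a fast lookup.
text = {line.strip() for line in txt if line.strip()}

# The DataFrame with the single "Description" column from the example.
df = pd.DataFrame({"Description": ["hello this is a great test $5435"]})
```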
I can write something like this:
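    # `text` is the collection of words read from the text file.
    def clean_string(rows):
        cleaned_rows = []
        for row in rows:
            string = row.split()
            cleansed_string = []
            for word in string:
                # keep only the words that appear in the text file
                if word in text:
                    cleansed_string.append(word)
            cleaned_rows.append(' '.join(cleansed_string))
        return cleaned_rows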
But is there a better way to achieve this?
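For comparison, a more compact version of the same idea might look like the sketch below. It assumes the words have already been loaded into a set (called `allowed` here, a hypothetical name), that the column is named `Description`, and that the result goes into a new `Cleaned` column, which is also just an illustrative choice.

```python
import pandas as pd

# Hypothetical inputs mirroring the example in the question.
allowed = {"hello", "this", "is", "a", "test"}
df = pd.DataFrame(
    {"Description": ["hello this is a great test $5435", "this is an432 entry"]}
)


def keep_allowed(description, allowed_words):
    """Return the description with only the words found in allowed_words."""
    return " ".join(w for w in description.split() if w in allowed_words)


# Apply the filter to every row; set membership keeps each lookup cheap.
df["Cleaned"] = df["Description"].apply(lambda s: keep_allowed(s, allowed))
```

With these inputs the `Cleaned` column holds "hello this is a test" and "this is". Whether this is actually faster than the explicit loop on a large DataFrame would need to be measured.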