replace multiple words in a dataframe

Question

I would like to replace words as described here but for a column in a dataframe. I also want to keep the original column and other columns in the dataframe.

a = ["isn't", "can't"]
b = ["is not", "cannot"]

for line in df['text']:
    for a1, b1 in zip(a, b):
        line = line.replace(a1, b1)
    df['text1'].write(line)

TypeError: expected str, bytes or os.PathLike object, not Series

Input dataframe

ID    text      
1     isn't bad
2     can't play

Output

ID    text          text1
1     isn't bad     is not bad
2     can't play    cannot play

Please help. Thank you.

Likely you can do this with `str.replace` or just `.replace`, but we really would need a minimum reproducible example with copy/pastable sample input data as well as expected output: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — David Erickson, Sep 13 '20 at 22:55

David Erickson · Accepted Answer · score 4 · 2020-09-13T23:26:28.220

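If you have two lists a and b, you can pass them to .replace together with regex=True. Without regex=True, .replace only matches entire cell values; with it, each item in a is treated as a regular expression and replaced wherever it occurs inside the string: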

a = ["isn't", "can't"]
b = ["is not", "cannot"]
# df = pd.read_clipboard(r'\s\s+')  # raw string avoids an invalid-escape warning
df['text1'] = df['text'].replace(a,b,regex=True)
df
Out[68]: 
   ID        text        text1
0   1   isn't bad   is not bad
1   2  can't play  cannot play

Please note that a and b should be the same length. If it is just a small list, this technique is fine, but if it is a larger list, you would probably want to build a dictionary.
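A minimal sketch of the dictionary variant mentioned above, assuming the same two replacements as in the question. The names `replacements` and `patterns` and the sample dataframe are made up for illustration. Because regex=True treats each key as a regular expression, the keys are passed through re.escape so they match literally.

```python
import re

import pandas as pd

# Hypothetical sample data mirroring the question.
df = pd.DataFrame({"ID": [1, 2], "text": ["isn't bad", "can't play"]})

# Lookup table: text to find -> replacement text.
replacements = {"isn't": "is not", "can't": "cannot"}

# regex=True interprets the keys as regular expressions,
# so escape them to make sure they are matched literally.
patterns = {re.escape(old): new for old, new in replacements.items()}

# Write the result to a new column; the original column is left untouched.
df["text1"] = df["text"].replace(patterns, regex=True)
print(df)
```

With a dictionary, each search term stays next to its replacement, so the two cannot drift out of step the way two separate lists can.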

edited Sep 13 '20 at 23:26

answered Sep 13 '20 at 23:21

David Erickson


Some words are not replaced. I am not sure what happened – Jason Sep 14 '20 at 00:10
I found out. It was due to a font difference. – Jason Sep 14 '20 at 00:16

score 2 · Answer 2 · answered Sep 13 '20 at 23:31

You can achieve this by storing the search and replacement words as dataframe columns and applying a lambda function row by row, like this:

import pandas as pd
a = ["isn't", "can't"]
b = ['is not', 'cannot']

df = pd.DataFrame({'id': [1,2], 'text': ["isn't bad", "can't play"]})
df['a'], df['b'] = a,b
print(df.head())

The dataframe looks like this:

   id        text      a       b
0   1   isn't bad  isn't  is not
1   2  can't play  can't  cannot

You can now apply the replacement to each row of this dataframe like this:

df['vals'] = pd.Series(map(lambda x,y,z: x.replace(y, z), list(df.text), list(df.a), list(df.b)))
print(df.head())

Final output:

   id        text      a       b         vals
0   1   isn't bad  isn't  is not   is not bad
1   2  can't play  can't  cannot  cannot play

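You can use the vals column for your analysis or select only the columns you need. Note that each row only gets the pair stored on that row, so a and b must have exactly as many entries as the dataframe has rows.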
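If every pair should be applied to every row, regardless of how many rows the dataframe has, a sketch along these lines may help. The helper name `replace_all` and the three-row sample data are assumptions for illustration, not part of the answer above.

```python
import pandas as pd

a = ["isn't", "can't"]
b = ["is not", "cannot"]

# Hypothetical data: three rows, so the lists do not match the row count.
df = pd.DataFrame({
    "id": [1, 2, 3],
    "text": ["isn't bad", "can't play", "isn't late, can't wait"],
})


def replace_all(text):
    """Apply every (old, new) pair to a single string."""
    for old, new in zip(a, b):
        text = text.replace(old, new)
    return text


# apply calls replace_all once per cell and keeps the original column intact.
df["vals"] = df["text"].apply(replace_all)
print(df)
```

This version does not need the helper columns a and b in the dataframe.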

score 0 · Answer 3 · answered Sep 13 '20 at 23:45

You can use a lookup table to replace the words:

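import pandas as pd

# Named data rather than dict, to avoid shadowing the built-in dict type.
data = {
    'text': ["isn't bad", "can't play"]
}
table = {
    "isn't": "is not",
    "can't": "cannot"
}

df = pd.DataFrame(data)
revised_text = []
for text in df['text']:
    # Swap each word that has a table entry; keep every other word as-is.
    words = [table.get(word, word) for word in text.split()]
    # Append exactly one result per row so the new column lines up with df.
    revised_text.append(' '.join(words))

df['text1'] = revised_text
print(df)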

rhug123 · Answer 4 · 2020-09-14T00:08:47.730

Here is an option.

df['text1'] = df['text']
# Apply each pair in turn; regex=False matches the words literally.
for old, new in zip(a, b):
    df['text1'] = df['text1'].str.replace(old, new, regex=False)

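Here is another way that avoids an explicit loop. It assumes an id column with unique values, and because the groupby result contains only id and text, any other columns are dropped.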

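replacedict = {"isn't": "is not",
               "can't": "cannot"}
text = df['text']  # keep the original column so it can be restored afterwards
# Split into words, replace whole words via the dict, then rejoin the words per id.
df = (df.assign(text=df['text'].str.split(' '))
        .explode('text')
        .replace(replacedict)
        .groupby('id')
        .agg({'text': lambda x: ' '.join(x)})
        .reset_index())
df['text1'] = df['text']
df['text'] = text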
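As a possible alternative that also avoids a Python-level loop over the pairs while keeping every existing column, the dictionary can be turned into a single compiled pattern and passed to .str.replace with a callable. This is a sketch under assumptions: the sample dataframe and the `other` column are made up, and unlike the split/explode approach it matches substrings rather than whole space-separated words.

```python
import re

import pandas as pd

# Hypothetical data with an extra column to show that nothing is dropped.
df = pd.DataFrame({
    "id": [1, 2],
    "other": ["x", "y"],
    "text": ["isn't bad", "can't play"],
})

replacedict = {"isn't": "is not", "can't": "cannot"}

# One alternation of all keys, escaped so they match literally.
# Longer keys come first so they win over any shorter overlapping key.
keys = sorted(replacedict, key=len, reverse=True)
pattern = re.compile("|".join(re.escape(key) for key in keys))

# The callable receives each match object and looks up its replacement.
df["text1"] = df["text"].str.replace(
    pattern, lambda match: replacedict[match.group(0)], regex=True
)
print(df)
```

Each string is scanned once, so replaced text is never matched again by a later pair.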
