
I would like to replace words as described here but for a column in a dataframe. I also want to keep the original column and other columns in the dataframe.

a = ["isn't", "can't"]
b = ["is not", "cannot"]

for line in df['text']:
    for a1, b1 in zip(a, b):
        line = line.replace(a1, b1)
    df['text1'].write(line)

TypeError: expected str, bytes or os.PathLike object, not Series

Input dataframe

ID    text      
1     isn't bad
2     can't play

Output

ID    text          text1
1     isn't bad     is not bad
2     can't play    cannot play

Please help. Thank you.

David Erickson
Jason
Likely you can do this with `str.replace` or just `.replace`, but we really would need a minimum reproducible example with copy/pastable sample input data as well as expected output: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – David Erickson Sep 13 '20 at 22:55

@DavidErickson: I have modified the question. – Jason Sep 13 '20 at 23:04

4 Answers

If you have two lists a and b, you can pass them to .replace with regex=True, which makes it replace matching substrings rather than only whole cell values:

a = ["isn't", "can't"]
b = ["is not", "cannot"]
# df = pd.read_clipboard(r'\s\s+')
df['text1'] = df['text'].replace(a, b, regex=True)
df
Out[68]: 
   ID        text        text1
0   1   isn't bad   is not bad
1   2  can't play  cannot play

Note that a and b must be the same length. This technique is fine for a small list; for a larger one, you would probably want to build a dictionary instead.
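
For the larger-list case, a dictionary version might look like the sketch below. The name `contractions` is made up for illustration, and the sample frame is rebuilt from the question. Because regex=True treats the keys as patterns, the sketch escapes them first so that characters such as "." stay literal.

```python
import re
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({"ID": [1, 2], "text": ["isn't bad", "can't play"]})

# Hypothetical mapping name: keys are the strings to find, values the replacements
contractions = {"isn't": "is not", "can't": "cannot"}

# With regex=True the keys are treated as patterns, so escape them
# to keep any special characters literal
patterns = {re.escape(old): new for old, new in contractions.items()}

# The original column is left untouched; the result goes into a new column
df["text1"] = df["text"].replace(patterns, regex=True)
print(df)
```

A dictionary keeps each word next to its replacement, so the two sides cannot drift out of step the way two separate lists can.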

David Erickson

You can achieve this by mapping a lambda function over the column together with the two lists, like this:

import pandas as pd
a = ["isn't", "can't"]
b = ['is not', 'cannot']

df = pd.DataFrame({'id': [1,2], 'text': ["isn't bad", "can't play"]})
df['a'], df['b'] = a,b
print(df.head())

The dataframe looks like this:

   id        text      a       b
0   1   isn't bad  isn't  is not
1   2  can't play  can't  cannot

You can now map the replacement over these three columns, row by row, like this:

df['vals'] = pd.Series(map(lambda x,y,z: x.replace(y, z), list(df.text), list(df.a), list(df.b)))
print(df.head())

Final output:

   id        text      a       b         vals
0   1   isn't bad  isn't  is not   is not bad
1   2  can't play  can't  cannot  cannot play

You can use the vals column for your analysis, or select only the columns you need.
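
Note that the map call above pairs row i with a[i] and b[i], so it only works when each row needs exactly the replacement stored in the same position. A minimal sketch that instead tries every pair on every row with Series.apply is shown here; the helper name `expand_contractions` and the third sample row are assumptions added for illustration.

```python
import pandas as pd

a = ["isn't", "can't"]
b = ["is not", "cannot"]

# The third row is an invented example containing both words
df = pd.DataFrame({
    "id": [1, 2, 3],
    "text": ["isn't bad", "can't play", "can't say it isn't"],
})

def expand_contractions(text):
    # Hypothetical helper: apply every (old, new) pair to one string
    for old, new in zip(a, b):
        text = text.replace(old, new)
    return text

# apply calls the helper once per row and returns a new Series
df["vals"] = df["text"].apply(expand_contractions)
print(df)
```

This also removes the need for the helper columns a and b, so the number of rows no longer has to match the number of word pairs.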

Anup Tiwari

You can use a lookup table to replace the words:

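import pandas as pd

# Named data rather than dict to avoid shadowing the built-in
data = {
    'text': ["isn't bad", "can't play"]
}
table = {
    "isn't": "is not",
    "can't": "cannot"
}

df = pd.DataFrame(data)
revised_text = []
for text in df['text']:
    # Swap each word found in the lookup table; leave other words unchanged
    words = [table.get(word, word) for word in text.split()]
    # Append exactly one result per row so the lengths always match
    revised_text.append(' '.join(words))

df['text1'] = revised_text
print(df)
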
sadbro

Here is an option that loops over the word pairs.

df['text1'] = df['text']
for a1, b1 in zip(a, b):
    # regex=False treats each search string as literal text
    df['text1'] = df['text1'].str.replace(a1, b1, regex=False)

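Here is another way that avoids an explicit Python loop. It assumes the ID column is named 'id' and that each id is unique. Note that the groupby keeps only id and text, so any other columns are dropped.

replacedict = {"isn't": "is not",
               "can't": "cannot"}
text = df['text']  # keep the original column so it can be restored
# Split into one word per row, replace whole words, then rejoin the words per id
df = (df.assign(text=df['text'].str.split(' '))
        .explode('text')
        .replace(replacedict)
        .groupby('id')
        .agg({'text': lambda x: ' '.join(x)})
        .reset_index())
df['text1'] = df['text']
df['text'] = text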
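
A further variant, sketched here under the assumption that a single regex pass is acceptable, joins the dictionary keys into one alternation pattern and lets str.replace look up each match through a callable. This is a swapped-in technique rather than the approach above, and the sample frame is rebuilt from the question.

```python
import re
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({"id": [1, 2], "text": ["isn't bad", "can't play"]})
replacedict = {"isn't": "is not", "can't": "cannot"}

# One alternation pattern that matches any key; keys are escaped
# so they are matched as literal text
pattern = "|".join(re.escape(key) for key in replacedict)

# For each match, look up its replacement in the dictionary
df["text1"] = df["text"].str.replace(
    pattern, lambda m: replacedict[m.group(0)], regex=True
)
print(df)
```

Because nothing is grouped or reassembled, the original text column and any other columns stay in place.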
rhug123