0

I have a pandas dataframe df, where one column has a string in it:

columnA
'PSX - Judge A::PSK-Ama'
'VSC - Jep::VSC-Da'
'VSO - Jep::VSO-Da'
...

And I have another dataframe, where I have the new strings:

old new
PSX PCC
VSO VVV

My desired outcome would be:

columnA
'PCC - Judge A::PCC-Ama'
'VSC - Jep::VSC-Da'
'VVV - Jep::VVV-Da'
...

My idea would be to write:

import re
df['columnA'] = df.replace('PSX', 'PCC', regex=True)
df['columnA'] = df.replace('VSO', 'VVV', regex=True)

for two replacements it is ok, but how to do it for severel replacements? Is there a smarter way to do it?

The dataframe you get here (thx to Daniel):

df = pd.DataFrame(data=['PSX - Judge A::PSK-Ama',
                        'VSC - Jep::VSC-Da',
                        'VSO - Jep::VSO-Da'], columns=['columnA'])
replace = pd.DataFrame(data=[['PSX', 'PCC'],
                             ['VSO', 'VVV']], columns=['old', 'new'])
PV8
  • 5,799
  • 7
  • 43
  • 87

3 Answers3

1
for row in df_map.iterrows():
    df['columnA'] = df.replace(row[0], row[1], regex=True)

Where df_map is your mapping DataFrame.

rbcvl
  • 406
  • 3
  • 13
1

You could use the fact that the replacement parameter can be a function:

import pandas as pd

df = pd.DataFrame(data=['PSX - Judge A::PSK-Ama',
                        'VSC - Jep::VSC-Da',
                        'VSO - Jep::VSO-Da'], columns=['columnA'])

replace = pd.DataFrame(data=[['PSX', 'PCC'],
                             ['VSO', 'VVV']], columns=['old', 'new'])

lookup = dict(zip(replace.old, replace.new))


def repl(w, lookup=lookup):
    return lookup.get(w.group(), w.group())


df['columnA'] = df['columnA'].str.replace('\w+', repl)

print(df)

Output

                  columnA
0  PCC - Judge A::PSK-Ama
1       VSC - Jep::VSC-Da
2       VVV - Jep::VVV-Da

The idea is to extract the words in columnA and if it matches one in lookup replace it. This is inspired by this answer, in which bench-marking shows this to be the more competitive approach.

Dani Mesejo
  • 61,499
  • 6
  • 49
  • 76
1

You can make a "replacement dictionary" out of your second dataframe and then iterate over the keys and values and meanwhile use str.replace. This solution should be quite fast:

replacements = dict(zip(df2['old'], df2['new']))

for k, v in replacements.items():
    df['columnA'] = df['columnA'].str.replace(k, v)
                  columnA
0  PCC - Judge A::PSK-Ama
1       VSC - Jep::VSC-Da
2       VVV - Jep::VVV-Da
Erfan
  • 40,971
  • 8
  • 66
  • 78