10

Currently I am using the following code to make replacements which is a little cumbersome:

df1['CompanyA'] = df1['CompanyA'].str.replace('.','')
df1['CompanyA'] = df1['CompanyA'].str.replace('-','')
df1['CompanyA'] = df1['CompanyA'].str.replace(',','')
df1['CompanyA'] = df1['CompanyA'].str.replace('ltd','limited')
df1['CompanyA'] = df1['CompanyA'].str.replace('&','and')
df1['Address1A'] = df1['Address1A'].str.replace('.','')
df1['Address1A'] = df1['Address1A'].str.replace('-','')
df1['Address1A'] = df1['Address1A'].str.replace('&','and')
df1['Address1A'].str.replace(r'\brd\b', 'road')
df1['Address2A'] = df1['Address2A'].str.replace('.','')
df1['Address2A'] = df1['Address2A'].str.replace('-','')
df1['Address2A'] = df1['Address2A'].str.replace('&','and')
df1['Address2A'].str.replace(r'\brd\b', 'road')

In order to make changing on the fly easier my ideal scenario would be something like:

df1['CompanyA'] = df1['CompanyA'].str.replace(('&','and'), ('.', ''), ('-','')....)
df1['Address1A'] = df1['Address1A'].str.replace(('&','and'), ('.', ''), ('-','')....)
df1['Address2A'] = df1['Address2A'].str.replace(('&','and'), ('.', ''), ('-','')....)

This is so I could just input/change what I wanted to replace for a particular column without having to adjust multiple lines of code.

Is this possible at all?

Georgy
  • 12,464
  • 7
  • 65
  • 73
Manesh Halai
  • 241
  • 2
  • 3
  • 10

5 Answers5

25

You can create a dictionary and pass it to the function replace() without needing to chain or name the function so many times.

replacers = {',':'','.':'','-':'','ltd':'limited'} #etc....
df1['CompanyA'] = df1['CompanyA'].replace(replacers)
Celius Stingher
  • 17,835
  • 6
  • 23
  • 53
  • 2
    I like this neat answer, also there is nothing wrong with a for loop, more readable and friendly for python rookies – E.Serra Jun 17 '20 at 13:11
  • 5
    Yes it might be more readable and friendly for rookies, however I don't think think passing a simple dictionary is too complex. I'd say dictionaries should be learnt before for loops and pandas together, but that's an opinion of course, thanks for your comment. – Celius Stingher Jun 17 '20 at 13:14
  • 1
    yes, but knowing that you can pass a dictionary to replace has nothing to do with pandas or dictionaries themselves, it is just some internal magic, that was my point, it looks really clean though – E.Serra Jun 17 '20 at 13:16
  • 1
    Amazing! This is exactly the sort of funtionality I was after. – Manesh Halai Jun 17 '20 at 13:27
  • 18
    Can you pass a dictionary to `replace()`? I'm confused here. With Python 3.8.5, evaluating `"abcdefghi".replace({'b':'B', 'g':'G'})` gives `TypeError: replace expected at least 2 arguments, got 1`. Am I missing something? – bitinerant Apr 06 '21 at 18:26
  • 4
    This is pandas replace we are using, not python's built-in replace :) – Celius Stingher Apr 06 '21 at 20:25
4

you could chain the replacings:

df1['CompanyA'] = df1['CompanyA'].str.replace('.','').replace('-','').replace(',','').replace('ltd','limited').replace('&','and')
...
bigbear3001
  • 522
  • 8
  • 19
  • Thanks for sharing. Code ran through without an error but didn't do the changes. ```re.sub()``` worked, but wrote single commands / code lines. I am sure there is a fancier solution (maybe with [translate](https://note.nkmk.me/en/python-str-replace-translate-re-sub/#replace-multiple-different-substrings) ?) But it's a time management question... – Simone Jul 05 '23 at 12:39
2

You can use a dictionary to map the characters for each column:

to_replace = {'.': '',
              ',': '',
              'foo': 'bar'
             }

for k, v in to_replace.items():
    df1['CompanyA'] = df1['CompanyA'].str.replace(k, v)
PApostol
  • 2,152
  • 2
  • 11
  • 21
1

Replace function accepts values as dictionaries as well. You can do something like this:

df1.replace({'CompanyA' : { '&' : 'and', '.': '' , '-': ''}},regex=True)
Raghul Raj
  • 1,428
  • 9
  • 24
  • This only works when & is the only character in the cell. Python will give a value error ```Columns must be same length as key```. – Simone Jul 05 '23 at 10:41
1

most likely you use pd.Dataframe so i suggest to make universal remover

def remover(row, replaces):
    for k,v in replacers.items():
        if k in row:
            row = row.replace(k, v)
    return row      


replacers = {',' : "",
         '.':'',
         '-':'',
         'ltd':'limited'
        }

for column in df.columns:
    df[column] = df[column].apply(lambda row: remover(row, replacers))

or you can specify specific column names to modify

Igor Tischenko
  • 620
  • 9
  • 20