
I'm stripping punctuation from strings contained within a Pandas dataframe. For example:

import pandas as pd
df = pd.DataFrame(data = [['a.b', 'c_d', 'e^f'],['g*h', 'i@j', 'k&l']], 
                  columns = ['column 1', 'column 2', 'column 3'])

I've succeeded in stripping punctuation from a single column using a list comprehension:

import string
df_nopunct = [line.translate(str.maketrans('', '', string.punctuation)) 
    for line in list(df['column 1'])]

# ['ab', 'gh']

But what I'd really like to do is strip punctuation across the entire dataframe, saving this as a new dataframe.

If I try the same approach on the entire dataframe, it seems to just return a list of my column names:

df_nopunct = [line.translate(str.maketrans('', '', string.punctuation)) 
    for line in list(df)]

# ['column 1', 'column 2', 'column 3']

Should I iterate line.translate(str.maketrans('', '', string.punctuation)) across columns, or is there a simpler way to accomplish this?

I've looked at the detailed answer about how to strip punctuation, but it deals with stripping punctuation from a single string rather than across an entire dataframe.

Michael Boles

1 Answer


You can call df.replace directly on the whole dataframe with a regex character class, as follows:

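import re
import string
# re.escape keeps characters such as \ and ] literal inside the character class
df_trans = df.replace('[' + re.escape(string.punctuation) + ']', '', regex=True)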

Out[766]:
  column 1 column 2 column 3
0       ab       cd       ef
1       gh       ij       kl
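
A caveat when building a character class from `string.punctuation`: the string contains both `\` and `]`, and when it is left unescaped the sequence `\]` is read as an escaped bracket, so backslashes are not matched. The sketch below compares an unescaped pattern with one passed through `re.escape`. The sample frame and the names `naive`, `escaped`, `df_naive` and `df_escaped` are made up for illustration.

```python
import re
import string

import pandas as pd

# Hypothetical sample data containing a backslash and square brackets.
df = pd.DataFrame({'column 1': ['a\\b', 'c[d]e'],
                   'column 2': ['f-g', 'h^i']})

# Unescaped: the '\]' inside string.punctuation is parsed as an escaped ']',
# so the backslash itself never becomes part of the character class.
naive = '[' + string.punctuation + ']'

# Escaped: every special character is matched literally.
escaped = '[' + re.escape(string.punctuation) + ']'

df_naive = df.replace(naive, '', regex=True)
df_escaped = df.replace(escaped, '', regex=True)

print(df_naive['column 1'].tolist())    # the backslash survives here
print(df_escaped['column 1'].tolist())  # backslash and brackets are removed
```

For data without backslashes the two patterns should behave the same, which is why the example frame in the question is not affected either way.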

If you prefer using translate, use a dict comprehension that applies str.translate to each column, and construct a new dataframe from the result:

import string
trans = str.maketrans('', '', string.punctuation)
df_trans = pd.DataFrame({col: df[col].str.translate(trans) for col in df})

Out[746]:
  column 1 column 2 column 3
0       ab       cd       ef
1       gh       ij       kl
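
The `.str` accessor only works on columns that hold strings, so the dict comprehension can fail on a frame that also has numeric columns. A minimal sketch of one way around that, assuming a hypothetical mixed frame with columns `name` and `n`, is to translate only the string columns and pass the others through unchanged:

```python
import string

import pandas as pd
from pandas.api.types import is_string_dtype

# Hypothetical frame mixing a string column with a numeric one.
df = pd.DataFrame({'name': ['a.b', 'g*h'], 'n': [1, 2]})

trans = str.maketrans('', '', string.punctuation)

# Apply str.translate column by column, skipping non-string columns.
df_trans = df.apply(
    lambda col: col.str.translate(trans) if is_string_dtype(col) else col
)

print(df_trans)
```

`is_string_dtype` is used here as a simple guard; columns of mixed Python objects may need a stricter check.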
Andy L.
  • Thanks -- `df.replace` works great. But for some reason, when attempting the second approach `pd.DataFrame({col: df[col].str.translate(trans) for col in df})` on my real data, I'm seeing `AttributeError: 'DataFrame' object has no attribute 'str'`. It seems like this might have to do with how I've [named my columns](https://stackoverflow.com/questions/51502263/pandas-dataframe-object-has-no-attribute-str)? Not sure I fully understand. I think I'll stick with `df.replace`. – Michael Boles Nov 27 '19 at 23:44
  • @MichaelBoles: you are welcome. Just use `df.replace`; it is the recommended approach. `str.translate` is an alternative in case you want to explore the `str` accessor methods further. In your case, something in your dataframe is tripping it up. Since it is not the recommended approach, it is not worth pulling your hair out over it :D – Andy L. Nov 27 '19 at 23:57
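
One plausible cause of the `AttributeError` mentioned in the comments, going by the linked question, is duplicate column names: selecting a duplicated label returns a DataFrame rather than a Series, and a DataFrame has no `.str` accessor. Whether that applies to the asker's real data is an assumption. The sketch below reproduces the situation with a made-up frame `df_dup` and works around it by selecting columns by position instead of by label.

```python
import string

import pandas as pd

trans = str.maketrans('', '', string.punctuation)

# Hypothetical frame in which two columns share the label 'x'.
df_dup = pd.DataFrame([['a.b', 'c_d'], ['g*h', 'i@j']], columns=['x', 'x'])

# Selecting a duplicated label returns a DataFrame, which has no .str accessor.
print(type(df_dup['x']))

# Selecting by position always yields a Series, so .str.translate works.
df_trans = pd.concat(
    [df_dup.iloc[:, i].str.translate(trans) for i in range(df_dup.shape[1])],
    axis=1,
)

print(df_trans)
```

Calling `df.replace` on the whole frame does not select columns by label, so it sidesteps this problem.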