58

Tring to remove the commas and dollars signs from the columns. But when I do, the table prints them out and still has them in there. Is there a different way to remove the commans and dollars signs using a pandas function. I was unuable to find anything in the API Docs or maybe i was looking in the wrong place

 import pandas as pd
    import pandas_datareader.data as web

players = pd.read_html('http://www.usatoday.com/sports/mlb/salaries/2013/player/p/')


df1 = pd.DataFrame(players[0])


df1.drop(df1.columns[[0,3,4, 5, 6]], axis=1, inplace=True)
df1.columns = ['Player', 'Team', 'Avg_Annual']
df1['Avg_Annual'] = df1['Avg_Annual'].replace(',', '')

print (df1.head(10))
Mark
  • 1,051
  • 3
  • 13
  • 17

6 Answers6

120

You have to access the str attribute per http://pandas.pydata.org/pandas-docs/stable/text.html

df1['Avg_Annual'] = df1['Avg_Annual'].str.replace(',', '')
df1['Avg_Annual'] = df1['Avg_Annual'].str.replace('$', '')
df1['Avg_Annual'] = df1['Avg_Annual'].astype(int)

alternately;

df1['Avg_Annual'] = df1['Avg_Annual'].str.replace(',', '').str.replace('$', '').astype(int)

if you want to prioritize time spent typing over readability.

mechanical_meat
  • 163,903
  • 24
  • 228
  • 223
  • You might get a "ValueError: invalid literal for int() with base 10", to solve that error Just use float instead of int (astype(float)) for the strings that have floating point values. – majidshakeelshawl Aug 28 '23 at 15:21
28

Shamelessly stolen from this answer... but, that answer is only about changing one character and doesn't complete the coolness: since it takes a dictionary, you can replace any number of characters at once, as well as in any number of columns.

# if you want to operate on multiple columns, put them in a list like so:
cols = ['col1', 'col2', ..., 'colN']

# pass them to df.replace(), specifying each char and it's replacement:
df[cols] = df[cols].replace({'\$': '', ',': ''}, regex=True)

@shivsn caught that you need to use regex=True; you already knew about replace (but also didn't show trying to use it on multiple columns or both the dollar sign and comma simultaneously).

This answer is simply spelling out the details I found from others in one place for those like me (e.g. noobs to python an pandas). Hope it's helpful.

Hendy
  • 10,182
  • 15
  • 65
  • 71
5

@bernie's answer is spot on for your problem. Here's my take on the general problem of loading numerical data in pandas.

Often the source of the data is reports generated for direct consumption. Hence the presence of extra formatting like %, thousand's separator, currency symbols etc. All of these are useful for reading but causes problems for the default parser. My solution is to typecast the column to string, replace these symbols one by one then cast it back to appropriate numerical formats. Having a boilerplate function which retains only [0-9.] is tempting but causes problems where the thousand's separator and decimal gets swapped, also in case of scientific notation. Here's my code which I wrap into a function and apply as needed.

df[col] = df[col].astype(str)  # cast to string

# all the string surgery goes in here
df[col] = df[col].replace('$', '')
df[col] = df[col].replace(',', '')  # assuming ',' is the thousand's separator in your locale
df[col] = df[col].replace('%', '')

df[col] = df[col].astype(float)  # cast back to appropriate type
BiGYaN
  • 6,974
  • 5
  • 30
  • 43
  • @FaheemMitha, you'll have to be a bit more specific with your use case if you expect some help – BiGYaN Mar 21 '19 at 23:24
2

This worked for me. Adding "|" means or :

df['Salary'].str.replace('\$|,','', regex=True)
Mnl
  • 787
  • 8
  • 9
0

I used this logic

df.col = df.col.apply(lambda x:x.replace('$','').replace(',',''))
demokritos
  • 1,416
  • 2
  • 12
  • 34
  • 1
    Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community May 23 '22 at 18:09
0

When I got to this problem, this was how I got out of it.

df['Salary'] = df['Salary'].str.replace("$",'').astype(float)
cottontail
  • 10,268
  • 18
  • 50
  • 51
kojo justine
  • 103
  • 1
  • 12