
I am new to Python and working on string manipulation.

I have a DataFrame column:

df['Installs']
Out[22]: 
0           10,000+
1          500,000+
2        5,000,000+
3       50,000,000+
4          100,000+
5           50,000+

How do I remove the "+" and convert the strings in this column to float?

What I tried:

df['Installs'] = df['Installs'].str.replace('+','',regex=True).astype(float)

However, I get an error:

ValueError: could not convert string to float: '10,000'
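The error comes from Python's built-in float() conversion, which does not accept a comma as a thousands separator; the "+" is already gone at that point. A minimal check in plain Python, independent of pandas (the variable names are only for illustration):

```python
raw = "10,000"

# float() rejects a comma used as a thousands separator
try:
    float(raw)
except ValueError as exc:
    message = str(exc)
    print(message)  # could not convert string to float: '10,000'

# Removing the comma first makes the conversion succeed
value = float(raw.replace(",", ""))
print(value)  # 10000.0
```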

How can I change my code so that I get 10000.0 instead of 10,000+ as the output, and likewise for the other values?

  • Not an exact duplicate, but you should find https://stackoverflow.com/questions/1779288/how-to-convert-a-string-to-a-number-if-it-has-commas-in-it-as-thousands-separato helpful. – Karl Knechtel Apr 18 '19 at 12:32
  • Also, try to remove comma before calling `astype(float)` – hacker315 Apr 18 '19 at 12:53
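Following the second comment's suggestion, a minimal sketch that removes both characters with two literal (non-regex) replacements before astype(float). It rebuilds the sample column from the question; passing regex=False explicitly avoids relying on the pandas default for str.replace, which changed from True to False in pandas 2.0.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"Installs": ["10,000+", "500,000+", "5,000,000+",
                                "50,000,000+", "100,000+", "50,000+"]})

# Strip the '+' and the thousands commas as plain substrings, then convert
df["Installs"] = (
    df["Installs"]
    .str.replace("+", "", regex=False)
    .str.replace(",", "", regex=False)
    .astype(float)
)
print(df["Installs"].tolist())
```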

1 Answer

Use Series.str.replace with the regex character class [,+] to replace both , and + with an empty string:

df['Installs'] = df['Installs'].str.replace('[,+]', '', regex=True).astype(float)
# alternative
# df['Installs'] = df['Installs'].replace('[,+]', '', regex=True).astype(float)
print(df)
     Installs
0     10000.0
1    500000.0
2   5000000.0
3  50000000.0
4    100000.0
5     50000.0
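A self-contained sketch of the answer's approach, assuming pandas 2.x, where str.replace defaults to regex=False; regex=True must therefore be passed so that [,+] is treated as a pattern rather than a literal string. It also checks that the Series.replace alternative gives the same result (the variable names are illustrative).

```python
import pandas as pd

installs = pd.Series(["10,000+", "500,000+", "5,000,000+",
                      "50,000,000+", "100,000+", "50,000+"], name="Installs")

# The character class [,+] matches either a comma or a plus sign
via_str = installs.str.replace("[,+]", "", regex=True).astype(float)

# Series.replace with regex=True substitutes matches inside each string too
via_replace = installs.replace("[,+]", "", regex=True).astype(float)

print(via_str.equals(via_replace))  # True
print(via_str.tolist())
```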
jezrael