53

Is there a way to convert values like '34%' directly to int or float when using read_csv in pandas? I want '34%' to be directly read as 0.34

  1. Using this in read_csv did not work:

    read_csv(..., dtype={'col':np.float})

  2. After loading the csv as 'df' this also did not work with the error "invalid literal for float(): 34%"

    df['col'] = df['col'].astype(float)

  3. I ended up using this which works but is long winded:

    df['col'] = df['col'].apply(lambda x: np.nan if x in ['-'] else x[:-1]).astype(float)/100

smci
  • 32,567
  • 20
  • 113
  • 146
KieranPC
  • 8,525
  • 7
  • 22
  • 25
  • 1) and 2) fail because '34%' has to be a string, trying to coerce to `float('34%')` throws ValueError. First you must do .rstrip('%') on the offending percent sign. – smci Feb 13 '22 at 00:11

2 Answers2

89

You were very close with your df attempt. Try changing:

df['col'] = df['col'].astype(float)

to:

df['col'] = df['col'].str.rstrip('%').astype('float') / 100.0
#                     ^ use str funcs to elim '%'     ^ divide by 100
# could also be:     .str[:-1].astype(...

Pandas supports Python's string processing functions on string columns. Just precede the string function you want with .str and see if it does what you need. (This includes string slicing, too, of course.)

Above we utilize .str.rstrip() to get rid of the trailing percent sign, then we divide the array in its entirety by 100.0 to convert from percentage to actual value. For example, 45% is equivalent to 0.45.

Although .str.rstrip('%') could also just be .str[:-1], I prefer to explicitly remove the '%' rather than blindly removing the last char, just in case...

smci
  • 32,567
  • 20
  • 113
  • 146
GaryMBloom
  • 5,350
  • 1
  • 24
  • 32
  • I don't believe you need "100.0"; just "100" should do fine. It's already float64 by then. – Asclepius Sep 23 '18 at 02:30
  • This seems like the better answer? @EdChum do you agree? – Umar.H Oct 30 '18 at 10:42
  • 2
    if the column has a mixture of strings with % and floats converted to pandas object, the above will need to be changed to: pct = df['col'].str.contains('%') df.loc[pct, 'col'] = df.loc[pct, 'col'].str.rstrip('%').astype('float') / 100.0 df['col'] = df['col'].astype(float) to prevent floats divided by 100 – gregV Jan 22 '19 at 15:52
  • nice and simple. THnx. I prefer this to the accepted answer that uses a function. – GenDemo May 29 '23 at 02:40
59

You can define a custom function to convert your percents to floats at read_csv() time:

# dummy data
temp1 = """index col 
113 34%
122 50%
123 32%
301 12%"""

# Custom function taken from https://stackoverflow.com/questions/12432663/what-is-a-clean-way-to-convert-a-string-percent-to-a-float
def p2f(x):
    return float(x.strip('%'))/100

# Pass to `converters` param as a dict...
df = pd.read_csv(io.StringIO(temp1), sep='\s+',index_col=[0], converters={'col':p2f})
df

        col
index      
113    0.34
122    0.50
123    0.32
301    0.12

# Check that dtypes really are floats
df.dtypes

col    float64
dtype: object

My percent to float code is courtesy of ashwini's answer: What is a clean way to convert a string percent to a float?

smci
  • 32,567
  • 20
  • 113
  • 146
EdChum
  • 376,765
  • 198
  • 813
  • 562
  • 1
    This is the best approach, doing it at `read_csv()` time. But a note on `p2f()`, if you do the conversion later, doing `df['col'].apply(p2f, axis=1)` is less efficient than vectorized using the `.str.rstrip` accessor. – smci Feb 13 '22 at 00:16