
I have a really awkward pandas DataFrame that looks kind of like this:

identifier    per_1       per_2       per_3       per_4       per_5
'something'   124/127     100/100     24/39       14/20       10/10
'camel'       121/122     150/206     300/307     11/12       0/2
 ...          ...         ...         ...         ...         ...

So, everything but the first column is a 'fraction' that's actually a string. I'd prefer them in decimal form. To access everything but the first column, I grab:

df.loc[:,df.columns != ('identifier')]

Which works fine. If I wanted to turn a single column into decimals, I could do:

df['per_1'] = df['per_1'].apply(lambda x: [float(n) for n in x.split('/')[0:2]])
df['per_1'] = df['per_1'].apply(lambda x: x[0] / x[1] if x[1] != 0 else np.nan)

I'd then have to iterate over every column that I want to do this for. This doesn't feel very pythonic to me, considering that I can actually grab every column that I want to do this for using df.loc[:,df.columns != ('identifier')]. Is there a better way to go about this?

John Rouhana
  • You can use `df.filter(like='per')`. [pd.DataFrame.filter](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.filter.html#pandas-dataframe-filter) – Scott Boston Mar 21 '19 at 19:50
  • I need to clarify; the IDs aren't actually `('per_1', 'per_2', 'per_3', etc)`. I made a mistake of illustrating them here as though they had something in common in their names. The column names are actually more or less random. – John Rouhana Mar 22 '19 at 15:37

1 Answer


Try the code below, which keeps `identifier` as it is and evaluates every column whose name contains 'per':

df[['identifier']].join(df.filter(like='per').apply(pd.eval))

    identifier     per_1     per_2     per_3     per_4 per_5
0  'something'  0.976378         1  0.615385       0.7     1
1      'camel'  0.991803  0.728155  0.977199  0.916667     0
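Since the real column names reportedly share no common substring, here is a minimal sketch that selects columns by exclusion instead of by name pattern and converts them with pandas string methods rather than `pd.eval`. The column names `alpha`, `beta`, `gamma` and the helper `fraction_strings_to_float` are made up for illustration, and the zero-denominator rule (return NaN) is carried over from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the value columns have unrelated names.
df = pd.DataFrame({
    "identifier": ["something", "camel"],
    "alpha": ["124/127", "121/122"],
    "beta": ["10/10", "0/2"],
    "gamma": ["3/0", "1/4"],
})

def fraction_strings_to_float(col: pd.Series) -> pd.Series:
    """Turn 'a/b' strings into floats; a zero denominator becomes NaN."""
    parts = col.str.split("/", n=1, expand=True).astype(float)
    denominator = parts[1].replace(0.0, np.nan)
    return parts[0] / denominator

# Every column except 'identifier', whatever it happens to be called.
value_cols = df.columns.difference(["identifier"], sort=False)
df[value_cols] = df[value_cols].apply(fraction_strings_to_float)

print(df)
```

This approach does not need the quotes stripped from the first column, because `identifier` is never passed to the conversion.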
anky
  • Maybe just `df.apply(pd.eval)`? (Not sure about the quotes in the first column though) – rafaelc Mar 21 '19 at 19:51
  • @RafaelC thanks :) It throws `'unknown type str288'`. Any idea? OK, I think it's the quotes – anky Mar 21 '19 at 19:52
  • Hm, weird.. can't reproduce. `df.apply(pd.eval)` works fine here! – rafaelc Mar 21 '19 at 19:53
  • @RafaelC I think the quotes are the culprit – anky Mar 21 '19 at 19:53
  • How exactly does this work? My column names, in practice, don't have the string 'per' in them, so I'm trying to brainstorm how to modify this to work for my use. – John Rouhana Mar 22 '19 at 15:38
  • Trying `df.loc[:,df.columns != ('identifier')].apply(pd.eval)` doesn't appear to work for me either. I get hit with `"'PandasExprVisitor' object has no attribute 'visit_Ellipsis'"`. No idea what that means. – John Rouhana Mar 22 '19 at 15:42
  • @JohnRouhana your code works for me. Maybe [this](https://stackoverflow.com/questions/48008191/attributeerror-pandasexprvisitor-object-has-no-attribute-visit-ellipsis-us) will help you – anky Mar 23 '19 at 05:12
  • So this works, but apparently pandas has a bug in it; it only works for up to 100 rows. I can break up what I have easily enough. Accepting answer. – John Rouhana Mar 25 '19 at 13:00
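If splitting the frame into 100-row pieces is inconvenient, one option is to skip `pd.eval` and parse each cell directly, which is essentially the lambda pair from the question wrapped in a function and looped over the columns. The sketch below uses a made-up 250-row frame with hypothetical column names `score` and `ratio`; `parse_fraction` is an illustrative helper, not a pandas function.

```python
import numpy as np
import pandas as pd

def parse_fraction(text: str) -> float:
    """Parse an 'a/b' string; return NaN when the denominator is zero."""
    numerator, denominator = (float(part) for part in text.split("/", 1))
    return numerator / denominator if denominator != 0 else np.nan

# Hypothetical frame with more than 100 rows.
big = pd.DataFrame({
    "identifier": [f"row_{i}" for i in range(250)],
    "score": [f"{i}/{i + 1}" for i in range(250)],
    "ratio": ["1/0"] * 250,
})

# Convert every column except 'identifier', one cell at a time.
value_cols = [c for c in big.columns if c != "identifier"]
converted = big.copy()
for col in value_cols:
    converted[col] = big[col].map(parse_fraction)

print(converted.tail())
```

Parsing cell by cell is slower than a vectorized approach on very large frames, but it is easy to follow and does not depend on how `pd.eval` handles long inputs.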