I'm modifying a CSV file using Python Pandas. I am fairly new to this and am experimenting with Pandas as an alternative to Excel for data handling and manipulation.

I have run into a problem trying to conditionally change the value of a cell in column df.duration based on the value of the cell on the same row in column df.paymenttype.

So I've tried modifying the value in df.duration using the .loc method.

df.loc[df.paymenttype == 'cash', 'duration'] = df.duration % 1

It gives the expected outcome and works fine. However, df.duration % 1 returns 0.0 for certain rows (those with a whole-number duration). That is mathematically correct, but whenever df.duration % 1 returns 0.0 I want to set the value of df.duration to 1 instead.

So I thought I might be able to do something like this:

df.loc[df.paymenttype == 'cash', 'duration'] = 1 if df.duration % 1 == 0 else (df.duration % 1)

This however returns: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Now I am wondering two things:

  1. Why is this ValueError raised and how could I fix this?

I could and should be doing more research on this myself before dropping this question here and I will. But more importantly and for future projects (since I am fairly new to Python and Pandas):

  2. Is the .loc method the right way to conditionally change the values of column cells in general, and in this particular case, where I want to add a conditional statement when setting the value?
Freeman84
  • To your first question, `df.duration` is a Series. How would you, for example, interpret `if [0, 1, 2] == 1`? `loc` is a reasonable way forward, but the right-hand side of the expression isn't necessarily keeping up with the row-wise operation on the left. – roganjosh Dec 29 '18 at 19:56
  • Possible duplicate of [Pandas conditional creation of a series/dataframe column](https://stackoverflow.com/questions/19913659/pandas-conditional-creation-of-a-series-dataframe-column) – gosuto Dec 29 '18 at 20:01
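
To illustrate the first comment: the sketch below reproduces the ValueError on a small, made-up DataFrame (the sample values are assumptions; only the column names come from the question) and then shows one possible element-wise alternative using `Series.mask`. It is a minimal sketch, not necessarily the only or best fix.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "paymenttype": ["cash", "card", "cash"],
    "duration": [2.0, 3.0, 1.5],
})

# The comparison is element-wise and yields a boolean Series, not one bool.
is_whole = df["duration"] % 1 == 0

try:
    # A conditional expression needs a single truth value, so pandas raises.
    result = 1 if is_whole else df["duration"] % 1
except ValueError as exc:
    error_message = str(exc)

# Element-wise alternative: take the remainder, then put 1 wherever it is 0.
remainder = df["duration"] % 1
fixed = remainder.mask(remainder == 0, 1)

# Write the result back for the cash rows only; .loc aligns on the index.
cash = df["paymenttype"] == "cash"
df.loc[cash, "duration"] = fixed
```

Here the decision "is this remainder zero?" is made per row by `mask`, so no single truth value for the whole Series is ever needed.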

2 Answers

There is nothing wrong with your initial broadcast using .loc; it worked perfectly. However, if the conditions start getting more complex, you might want to take a look at Series.where() (or np.where()) or np.select().
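
As a rough sketch of what the np.select() route could look like for this question (the sample rows and the variable names `cash`, `remainder`, `conditions` and `choices` are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "paymenttype": ["cash", "card", "cash", "card"],
    "duration": [2.0, 3.0, 1.5, 0.25],
})

cash = df["paymenttype"] == "cash"
remainder = df["duration"] % 1

# np.select takes, per row, the choice belonging to the first condition
# that is True; rows matching no condition get the `default` value.
conditions = [cash & (remainder == 0), cash]
choices = [1.0, remainder]
df["duration"] = np.select(conditions, choices, default=df["duration"])
```

The order of the conditions matters: the whole-number case is listed first so that it wins over the general cash case.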

Also see Pandas conditional creation of a series/dataframe column.

As for your problem at hand: why not use df['duration'].replace(0.0, 1)? Note that replace() returns a new Series, so the result has to be assigned back to the column.
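
A minimal sketch of that idea, assuming you only want to touch the cash rows (the sample data is invented; a non-cash row with duration 0.0 is included to show that it is left alone):

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    "paymenttype": ["cash", "card", "cash"],
    "duration": [2.0, 0.0, 1.5],
})

cash = df["paymenttype"] == "cash"

# Take the remainder for the cash rows, swap 0.0 for 1.0, and assign the
# new Series back, because replace() does not modify df in place here.
df.loc[cash, "duration"] = (df.loc[cash, "duration"] % 1).replace(0.0, 1.0)
```

Calling replace() on the whole column instead would also change zeros in non-cash rows, which may or may not be what you want.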

gosuto
  • Thank you. This was exactly what I was looking for: not just an answer to the problem at hand but also some guidance on how to go forward manipulating data using Pandas. I am not sure, though, whether it is a duplicate of _Pandas conditional creation of a series/dataframe column_ – Freeman84 Dec 29 '18 at 20:29

I would suggest using the Series .apply method. In your case:

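def my_func(x):
    # Fractional part of x; whole numbers map to 1.0 instead of 0.0.
    remainder = x % 1
    return 1.0 if remainder == 0 else remainder

# Build the mask once and write back through .loc; chained indexing such as
# df['duration'][mask] = ... may assign to a temporary copy instead of df.
cash = df['paymenttype'] == 'cash'
df.loc[cash, 'duration'] = df.loc[cash, 'duration'].apply(my_func)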

One more suggestion: use df['column_name'] instead of df.column_name, because a column name can contain a space, and then attribute access does not work.
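
A small sketch of why bracket access is the safer habit. The column names "payment type" and "size" are made up for illustration; the second one shows a related pitfall, a column name that collides with an existing DataFrame attribute.

```python
import pandas as pd

# Hypothetical columns: one with a space, one that shares its name with
# the built-in DataFrame.size attribute.
df = pd.DataFrame({
    "payment type": ["cash", "card"],
    "size": [10, 20],
})

# Bracket access works for any label, including ones containing spaces.
# (Writing df.payment type would be a syntax error.)
payment = df["payment type"]

# Attribute access returns DataFrame.size (the number of cells),
# not the column called "size".
n_cells = df.size
size_column = df["size"]
```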

Good luck with learning Pandas!

Dilshat