Pandas- Dividing a column by another column conditional on if values are greater than 0?

Question

I have a pandas dataframe that contains dates, items, and 2 values. All I'm looking to do is output another column that is the product of column A / column B if column B is greater than 0, and 0 if column B is equal to 0.

   date     item   A   B        C       
 1/1/2017   a      0   3             0  
 1/1/2017   b      2   0             0  
 1/1/2017   c      5   2           2.5  
 1/1/2017   d      4   1             4  
 1/1/2017   e      3   3             1  
 1/1/2017   f      0   4             0  
 1/2/2017   a      3   3             1  
 1/2/2017   b      2   2             1  
 1/2/2017   c      3   9   0.333333333  
 1/2/2017   d      4   0             0  
 1/2/2017   e      5   3   1.666666667  
 1/2/2017   f      3   0             0

this is the code I've written, but the kernel keeps dying (keep in mind this is just an example table, I have about 30,000 rows so nothing too crazy)

df['C'] = df.loc[df['B'] > 0, 'A'] / df['B'])

any idea on what's going on? Is something running infinitely that's causing it to crash? Thanks for the help.

Does it crash on a smaller dataframe? A-priori it sounds like a bug. Try to narrow down to the row / set of rows which causes the crash. — tmrlvi, Nov 21 '17 at 23:35

score 10 · Accepted Answer · edited Nov 27 '18 at 23:46

10

You get that using np.where

df['C'] = np.round(np.where(df['B'] > 0, df['A']/df['B'], 0), 1)

Or if you want to use loc

df.loc[df['B'] > 0, 'C'] = df['A']/df['B']

and then fillna(0)

edited Nov 27 '18 at 23:46

Unni

5,348
6
36
55

answered Nov 21 '17 at 23:36

Vaishali

37,545
5
58
86

Nice one. This would've been my first option. – cs95 Nov 21 '17 at 23:39
that's great- thanks for sharing! Maybe it's an error on my end, but I'm getting an issue that a value is trying to be set on a copy of a slice from a dataframe? I didn't have that issue prior so I'm wondering if this script is causing that. – Dick Thompson Nov 21 '17 at 23:55
@DickThompson, did you create this dataframe by taking a slice of a larger dataframe? – Vaishali Nov 22 '17 at 00:06
Check this link, it explains the issue really well. https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas – Vaishali Nov 22 '17 at 00:07
Yeah thanks for the help- I guess I am just confused. So what I'm also trying to do is to basically get a comparison of each item vs the entire group, so the values in one of the column is just a groupby of values of each item for a date subtracted by the value for that row. ie `df['A'] = df['B'].groupby['date'].transform('sum') - df['B']` so what would the correct way to do this? I'm a bit confused – Dick Thompson Nov 22 '17 at 00:24
Can you post the earlier code in the post? I am not able to understand the issue from the groupby – Vaishali Nov 22 '17 at 00:32
Ok, so when you say df['B'], you are just looking at the series (one single column) and hence saying df['B'].groupby('date') doesnt work as there is no date column in df['B']. You need df.groupby('date').B.transform('sum') – Vaishali Nov 22 '17 at 00:43
I tried that and it still doesn't work? still getting the same copy a slice error – Dick Thompson Nov 22 '17 at 15:21
I'm also confused because when I run the groupby I don't get that error, only when I'm running the division – Dick Thompson Nov 22 '17 at 15:21
Can you post a full traceback? It would be easier to troubleshoot – Vaishali Nov 22 '17 at 15:22
Post the original df, then grouping and division – Vaishali Nov 22 '17 at 15:22
sorry for the confusion- maybe it could also have something to do with another transformation I did? I tried to make 2 copies of a column by literally just doing `df['A rolling'] = df['A rolling'].rolling(10).mean()` I'm confused though because whenever I run any of these functions I don't get this error, only when I try dividing the columns... – Dick Thompson Nov 22 '17 at 15:31

cs95 · Answer 2 · 2017-11-21T23:45:32.967

Option 1
You use pd.Series.mask to hide zeros, and then just empty cells with fillna.

v = (df.A / df.B.mask(df.B == 0)).fillna(0)
v

0     0.000000
1     0.000000
2     2.500000
3     4.000000
4     1.000000
5     0.000000
6     1.000000
7     1.000000
8     0.333333
9     0.000000
10    1.666667
11    0.000000
dtype: float64

df['C'] = v

Alternatively, replace those zeros with np.inf, because x / inf = 0.

df['C'] = (df.A / df.B.mask(df.B == 0, np.inf))

Option 2
Direct replacement with df.replace

df.A / df.B.replace(0, np.inf)

0     0.000000
1     0.000000
2     2.500000
3     4.000000
4     1.000000
5     0.000000
6     1.000000
7     1.000000
8     0.333333
9     0.000000
10    1.666667
11    0.000000
dtype: float64

Keep in mind, you can do an astype conversion, if you want mixed integers and floats as your result:

df.A.div(df.B.replace(0, np.inf)).astype(object)

0            0
1            0
2          2.5
3            4
4            1
5            0
6            1
7            1
8     0.333333
9            0
10     1.66667
11           0
dtype: object

I am about ...post the option 2 :-) haha – BENY Nov 21 '17 at 23:43 — BENY, Nov 21 '17 at 23:43
@Wen haha, I can remove it if you wish, you can post it. – cs95 Nov 21 '17 at 23:44 — cs95, Nov 21 '17 at 23:44
No need , I mean post in your comment .:-) not a answer – BENY Nov 21 '17 at 23:45 — BENY, Nov 21 '17 at 23:45

Pandas- Dividing a column by another column conditional on if values are greater than 0?

2 Answers2

Linked

Related