0

I need to assign a value to all registers assigned as 0 (zero) in a column ('A'). This new value will be the mean value of every registers that share the same value registered on another column ('B'), i.e.: all rows that have 'A' assigned as 0 will have their value replaced by the mean of 'A' found among those who share the same value on 'B'. Apparently, the following code is not working, because, when I call print(df.A) after it, I have some rows with 'A' as 0 returned:

df = df[df.A == 0].groupby('B')['A'].mean().reset_index()

I tried a bunch of line codes, but some aren't even accepted...

What I expect is a situation that all 0 values for A are replaced for a mean of A column grouped by B column. Like this:

Before:

Output:
     A    B
1    0    7    
2    0    7
3    9    7
4   10    6
5    8    6
6    0    6
7    0    2

After:

Output:

         A    B
    1    3    7    
    2    3    7
    3    9    7
    4   10    6
    5    8    6
    6    3    6
    7    0    2
 

Thank you for your support.

jaymzleutz
  • 155
  • 2
  • 10
  • Could you provide an example of what the dataframe looks like before and after? – footfalcon Aug 31 '20 at 23:37
  • Sorry, what I meant by my comment is to provide an initial sample dataset with required data to achieve what you are after, and show the same dataset as you intend it to look once finished (ie: not after what you have tried, but what you want it to look like). For example, if repeating values are important to the solution, you should include that in the initial data sample. This helps both make your question clear and the answers that people provide, as they can use the sample dataset and clearly show steps taken to match your desired output. – footfalcon Sep 01 '20 at 11:27
  • I got it.. But do you think it's necessary something more specific than what I have provided? I just need to get rid of 0 values on A , replacing it for the mean value of all the instances of A that share the same value on B. – jaymzleutz Sep 01 '20 at 12:06
  • So I purposely gave this some time to see if someone would answer. I am certain your question is not difficult to solve, but you are making it harder than it should be to answer, therefore you have none. I suggest you give a good read of this link to ensure you get the best experience possible from this site: https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples Only trying to help :) – footfalcon Sep 02 '20 at 00:21
  • Thank you very much for your help, @footfalcon. I think I got what you mean. I edited my question. Hope it fits. – jaymzleutz Sep 02 '20 at 14:52

1 Answers1

0

I think I understand your question now, but then I don't see how you got the '3' for row 6 col A. I am following the logic for how I was able to match the 3's in rows 1 and 2 in col A, which I will try to explain in code below. If this isn't quite the correct interpretation, hopefully is still gets you pointed in the right direction.

Your initial df

df = pd.DataFrame({
    'A': [0, 0, 9, 10, 8, 0, 0], 
    'B': [7, 7, 7, 6, 6, 6, 2]
    })

    A   B
1   0   7
2   0   7
3   9   7
4   10  6
5   8   6
6   0   6
7   0   2

Restating the objective:

For each unique value in col B where col A is 0, find the rows in col A where B has that value and take the mean of those col A values. Then overwrite that mean value to those rows in A that are 0 and line up with the value selected in B. So, for example, the first 3 rows there are 7's in col B, and 0, 0, 9 in col A. The mean of the first 3 A rows is 3, so that is the value that will get overwritten on the 0s in col A, rows 1 and 2.

Steps

Get the unique values from col B where col A is also 0

bvals_when_a_zero = df[df['A'] == 0]['B'].unique()
array([7, 6, 2])

For each of those unique values, calculate the mean of the corresponding values in col A

means = [df[df['B'] == i]['A'].mean() for i in bvals_when_a_zero]
[3.0, 6.0, 0.0]

Loop over the bvals,means pairs and overwrite the 0's with the corresponding mean for the bval. The logic of the pandas where method keeps the values stated to the left (in this case, df['A'] values) that meet the condition in the first argument in the brackets, otherwise choose the second argument as the value to keep. Our condition (df['A'] == 0) & (df['B'] == bval) says get the rows where col A is 0 and col B is one of the unique bvals. But here we actually want to keep the df['A'] values that are NOT equal to the condition, so the conditional argument in the brackets is negated with the ~ symbol in front.

for bval, mean in zip(bvals_when_a_zero, means):
    df['A'] = df['A'].where( ~((df['A'] == 0) & (df['B'] == bval)), mean )

This gives the final df

    A   B
1   3   7
2   3   7
3   9   7
4   10  6
5   8   6
6   6   6
7   0   2
footfalcon
  • 581
  • 5
  • 16