Hello, I have two dataframes of calculated ratios. I take the average of each row across the columns to get a single value per student, but when I do that, the two averages for the same student do not add up to 1 across the dataframes.

df1

  name      exm1    exm2    exm3     exm4
  student1  0.43    0.36      0         1
  student2     0       0      0      0.45
  student3     0       0      0         0
  student4  0.64       0   0.75       0.6
  student5  0.12    0.23   0.57         0
  student6     0       0      0       0.9
  student7     0       1      0       0.3
  student8  0.43    0.36      0      0.25
  student9     0       0      0      0.58
 student10     0       0      0         0

df2

    name        exm1    exm2    exm3    exm4
   student1     0.57    0.64       0       0
   student2        0       0       0    0.55
   student3        0       0       0       1
   student4     0.36       0    0.25     0.4
   student5     0.88    0.77    0.43       0
   student6        0       0       0     0.9
   student7        0       0       0     0.3
   student8        0       0       0    0.25
   student9        0       0       0    0.58
   student10       0       0       0       0

To calculate the average I used the code below:

df1['average'] = df1.mean(numeric_only=True, axis=1)
df2['average'] = df2.mean(numeric_only=True, axis=1)

When I compare the averages per student across both dataframes, they do not sum to 1. The main cause is a 0 present in the same cell of both dataframes. For example, for student1, exm3 is zero in both; zeros of this type distort the average.

  name     average_df1  average_df2  sum
 student1   0.4475      0.451875    0.899375
 student2   0.1125      0.140625    0.253125
 student3        0          0          0
 student4   0.4975      0.461875    0.959375
 student5   0.23        0.2575      0.4875
 student6   0.225       0.28125     0.50625
 student7   0.325       0.40625     0.73125
 student8   0.26        0.2175      0.4775
 student9   0.145       0.18125     0.32625
 student10     0           0          0

The sum of the two averages should be equal to 1. How can I do this in pandas? Any suggestions?

2 Answers

You are using zeros to represent missing data, while they can also represent a valid value.

First determine where missing values are, with e.g.:

missing = (df1==0) & (df2==0)

then compute your averages:

df1['average'] = df1[~missing].iloc[:, 1:].mean(axis=1)
df2['average'] = df2[~missing].iloc[:, 1:].mean(axis=1)

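You should use the first column as an index:

df1 = df1.set_index('name')
df2 = df2.set_index('name')

That would be cleaner and you would avoid patchy clauses such as iloc[:, 1:] or numeric_only=True to skip the column containing string values. Note that once name is the index, the iloc[:, 1:] in the averages above must be dropped, otherwise it would skip exm1 instead of name.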
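
Putting the steps above together, here is a minimal sketch. It assumes the first five students from the question and `name` already set as the index; the `result` frame and its column names are only illustrative, not part of the original code.

```python
import pandas as pd

cols = ["exm1", "exm2", "exm3", "exm4"]
idx = pd.Index(["student1", "student2", "student3", "student4", "student5"],
               name="name")

# First five rows of df1 and df2 from the question, with 'name' as the index.
df1 = pd.DataFrame([[0.43, 0.36, 0.0, 1.0],
                    [0.0, 0.0, 0.0, 0.45],
                    [0.0, 0.0, 0.0, 0.0],
                    [0.64, 0.0, 0.75, 0.6],
                    [0.12, 0.23, 0.57, 0.0]], columns=cols, index=idx)
df2 = pd.DataFrame([[0.57, 0.64, 0.0, 0.0],
                    [0.0, 0.0, 0.0, 0.55],
                    [0.0, 0.0, 0.0, 1.0],
                    [0.36, 0.0, 0.25, 0.4],
                    [0.88, 0.77, 0.43, 0.0]], columns=cols, index=idx)

# A cell counts as missing only when it is 0 in BOTH dataframes.
missing = (df1 == 0) & (df2 == 0)

# df[~missing] turns the missing cells into NaN, which mean() skips.
result = pd.DataFrame({
    "average_df1": df1[~missing].mean(axis=1),
    "average_df2": df2[~missing].mean(axis=1),
})
result["sum"] = result["average_df1"] + result["average_df2"]
print(result.round(6))
```

For these five rows the two averages add up to 1 for every student, because a 0 that is paired with a non-zero value in the other dataframe is still counted as a real value.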

Chris

The crux of the problem is to:

  1. .replace all 0's with np.nan, so they are not calculated in the average with the other dataframe.
  2. However, this means you have an undesired effect of replacing 0's that should be there if there is an accompanying 1 in the other dataframe. For these, you can loop through the columns and apply a vectorized np.where to replace NaN values with 0, when there is a 1 present in the same cell of the other dataframe.

code:

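import numpy as np
import pandas as pd

# Assumes df1 and df2 hold only the numeric exm columns (no 'name' column).
# Treat every 0 as missing so that mean() skips it.
df1 = df1.replace(0, np.nan)
df2 = df2.replace(0, np.nan)
for col in df1.columns:
    # Restore a genuine 0 wherever the other dataframe holds a 1 in the same cell.
    df2[col] = np.where(df1[col] == 1, df2[col].replace(np.nan, 0), df2[col])
    df1[col] = np.where(df2[col] == 1, df1[col].replace(np.nan, 0), df1[col])
# assign() appends the three summary columns; iloc keeps only those.
df = (df1.assign(average_df1=df1.mean(axis=1),
                 average_df2=df2.mean(axis=1),
                 sum_df1_df2=df1.mean(axis=1) + df2.mean(axis=1))
         .iloc[:, -3:])
df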

Output (the first 5 rows add up to 1 as expected. I'm not sure if the data in the last 5 rows of your input is a mistake, but there is no way to get them to add up to one):

    average_df1 average_df2 sum_df1_df2
0   0.596667    0.403333    1.000000
1   0.450000    0.550000    1.000000
2   0.000000    1.000000    1.000000
3   0.663333    0.336667    1.000000
4   0.306667    0.693333    1.000000
5   0.900000    0.900000    1.800000
6   0.650000    0.150000    0.800000
7   0.346667    0.250000    0.596667
8   0.580000    0.580000    1.160000
9   NaN         NaN         NaN
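
To see which rows can never add up to 1, one option is to check the input cell by cell: wherever a cell is not 0 in both dataframes, the two ratios should sum to 1. The sketch below is only an illustration on three students from the question; the names `missing`, `bad_cells`, `inconsistent` and the `1e-9` tolerance are assumptions of this example.

```python
import pandas as pd

cols = ["exm1", "exm2", "exm3", "exm4"]
idx = pd.Index(["student1", "student6", "student7"], name="name")

# Three rows taken from the question's df1 and df2.
df1 = pd.DataFrame([[0.43, 0.36, 0.0, 1.0],
                    [0.0, 0.0, 0.0, 0.9],
                    [0.0, 1.0, 0.0, 0.3]], columns=cols, index=idx)
df2 = pd.DataFrame([[0.57, 0.64, 0.0, 0.0],
                    [0.0, 0.0, 0.0, 0.9],
                    [0.0, 0.0, 0.0, 0.3]], columns=cols, index=idx)

# Cells that are 0 in both dataframes are treated as missing and ignored.
missing = (df1 == 0) & (df2 == 0)

# A non-missing cell is inconsistent when the two ratios do not sum to 1.
bad_cells = ~missing & ((df1 + df2 - 1).abs() > 1e-9)

# A row with any inconsistent cell cannot have averages that sum to 1.
inconsistent = bad_cells.any(axis=1)
print(inconsistent)
```

Here student6 and student7 are flagged (for example, exm4 is 0.9 in both dataframes for student6), while student1 is not.
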
David Erickson