1

I have following dataframe in pandas

 ID      Quantity       Rate       Product
 1       10             70         MS
 2       10             70         MS
 3       100            70         MS
 4       10             100        MS
 5       700            65         HS
 6       1100           65         HS
 7       700            100        HS

I want to cap values with mean values in Quantity and Rate For MS if Quantity is greater than 100 and Rate is greater than 99 then it should be replaced by mean and For HS if Quantity is greater than 1000 and Rate is greater than 99 then it should be replaced by mean.

I am using following way

 mean_MS = df['Quantity'][(df['Product'] == 'MS') and (df['Quantity'] < 100)].mean()

But it does not work.

My desired dataframe would be

 ID      Quantity       Rate       Product
 1       10             70         MS
 2       10             70         MS
 3       10             70         MS
 4       10             70         MS
 5       700            65         HS
 6       700            65         HS
 7       700            65         HS
Neil
  • 7,937
  • 22
  • 87
  • 145

2 Answers2

1

one way to solve this,

m1=df['Product']=='MS'
m2=(df['Quantity']>=100)|(df['Rate']>99)
df.loc[m1&m2,'Quantity']=df[m1&(df['Quantity']<100)]['Quantity'].mean()
df.loc[m1&m2,'Rate']=df[m1&(df['Rate']<99)]['Rate'].mean()

m3=df['Product']=='HS'
m4=(df['Quantity']>=1000)|(df['Rate']>99)
df.loc[m3&m4,'Quantity']=df[m3&(df['Quantity']<1000)]['Quantity'].mean()
df.loc[m3&m4,'Rate']=df[m3&(df['Rate']<99)]['Rate'].mean()

O/P:

   ID  Quantity  Rate Product
0   1      10.0  70.0      MS
1   2      10.0  70.0      MS
2   3      10.0  70.0      MS
3   4      10.0  70.0      MS
4   5     700.0  65.0      HS
5   6     700.0  65.0      HS
6   7     700.0  65.0      HS

Explanation:

  1. divide your problem into two sub models one is MS and another one is HS for both contains same logic but differs in quantity value.

  2. first you have to change value only for MS so flag that in m1 then if Quantity is greater than or equal to 100 or Rate is greater than 99 replace the mean value from the df where df contains required MS row and clearing out the values where our condition exceeds.

  3. repeat the same logic for Rate.

  4. repeat step 2 and 3 for HS too where Quantity condition modified from 100 to 1000.

Mohamed Thasin ah
  • 10,754
  • 11
  • 52
  • 111
0

IIUC , you can also try the below:

val1= df.loc[df.Product.eq('MS'),['Quantity','Rate']].mode().values 
#array([[10, 70]], dtype=int64)
val2= df.loc[df.Product.eq('HS'),['Quantity','Rate']].mode().values
#array([[700,  65]], dtype=int64)

df.loc[df.Product.eq('MS')&df.Quantity.ge(100)|df.Product.eq('MS')&df.Rate.gt(99),['Quantity','Rate']] = val1

df.loc[df.Product.eq('HS')&df.Quantity.ge(1000)|df.Product.eq('HS')&df.Rate.gt(99),['Quantity','Rate']] = val2
print(df)

   ID  Quantity  Rate Product
0   1        10    70      MS
1   2        10    70      MS
2   3        10    70      MS
3   4        10    70      MS
4   5       700    65      HS
5   6       700    65      HS
6   7       700    65      HS
anky
  • 74,114
  • 11
  • 41
  • 70