import pandas as pd
df = pd.read_csv('https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0')
d= df.loc[df.isnull().sum(axis=1)>5]
d.dropna(axis=0,inplace=True)
print(round(100*(1-df.count()/len(df)),2))

I am getting this output:

Ord_id                 0.00
Prod_id                0.00
Ship_id                0.00
Cust_id                0.00
Sales                  0.24
Discount               0.65
Order_Quantity         0.65
Profit                 0.65
Shipping_Cost          0.65
Product_Base_Margin    1.30

dtype: float64

but the expected output is:

Ord_id                 0.00
Prod_id                0.00
Ship_id                0.00
Cust_id                0.00
Sales                  0.00
Discount               0.42
Order_Quantity         0.42
Profit                 0.42
Shipping_Cost          0.42
Product_Base_Margin    1.06

dtype: float64
SHR
Puneet batra
  • Can you create a small example to replicate the issue? Right now it's not very clear (IMO) what exactly you're trying to achieve. Check [this](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – anky Mar 17 '19 at 14:19
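
A small reproduction of the kind this comment asks for might look like the sketch below. The `toy` frame, its column names, and the variable names are made up for illustration and are not the real dataset. It follows the same pattern as the question's code: the rows with more than 5 missing values are copied into a separate object and cleaned there, so the frame that gets printed never changes.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 4 rows, 7 columns (not the real dataset).
toy = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "a": [1.0, np.nan, 3.0, np.nan],
        "b": [1.0, np.nan, 3.0, 4.0],
        "c": [1.0, np.nan, 3.0, 4.0],
        "d": [1.0, np.nan, 3.0, 4.0],
        "e": [1.0, np.nan, 3.0, 4.0],
        "f": [1.0, np.nan, 3.0, 4.0],
    }
)

# True only for the second row, which has 6 missing values.
mostly_empty = toy.isnull().sum(axis=1) > 5

# Same pattern as the question: only the separate subset is cleaned.
# (.copy() is added here just to avoid a SettingWithCopyWarning.)
subset = toy.loc[mostly_empty].copy()
subset.dropna(axis=0, inplace=True)

# `toy` itself was never modified, so it still has all 4 rows.
rows_after_question_code = len(toy)

# Keeping the complement of the mask is what actually removes the row.
cleaned = toy[~mostly_empty]
missing_pct = round(100 * (1 - cleaned.count() / len(cleaned)), 2)
print(missing_pct)
```

If this matches the situation in the question, the percentages there are being computed on the unfiltered `df`, which would explain why they differ from the expected output.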

7 Answers

3

Try this way:

df.drop(df[df.isnull().sum(axis=1)>5].index,axis=0,inplace=True)

print(round(100*(1-df.count()/len(df)),2))
saeed foroughi
1

I think you are trying to find the rows that have more than 5 null values. Use np.where instead of df.loc to get the positions of those rows, and then drop them from df itself.

Try:

import pandas as pd
import numpy as np
df = pd.read_csv('https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0')
d = np.where(df.isnull().sum(axis=1)>5)[0]  # positions of rows with more than 5 missing values
df= df.drop(df.index[d])
print(round(100*(1-df.count()/len(df)),2))

output:

Ord_id                 0.00
Prod_id                0.00
Ship_id                0.00
Cust_id                0.00
Sales                  0.00
Discount               0.42
Order_Quantity         0.42
Profit                 0.42
Shipping_Cost          0.42
Product_Base_Margin    1.06
dtype: float64
0

Try this, it should work

df = df[df.isnull().sum(axis=1) <= 5]
print(round(100*(1-df.count()/len(df)),2))
Juan Diego Lozano
0

Try this solution


import pandas as pd
df = pd.read_csv('https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0')
df = df[df.isnull().sum(axis=1)<=5]
print(round(100*(df.isnull().sum()/len(df.index)),2))
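
The percentage here is computed from `isnull().sum()` rather than from `1 - count()/len(df)` as in the question. As far as I can tell the two formulas give the same numbers, and `isnull().mean()` is a third spelling. A minimal sketch on a made-up frame (the `toy` data and variable names are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, used only to compare the formulas.
toy = pd.DataFrame(
    {
        "Sales": [10.0, np.nan, 30.0, 40.0],
        "Profit": [1.0, 2.0, np.nan, np.nan],
    }
)

# Formula from the question: share of rows that are NOT counted as present.
via_count = round(100 * (1 - toy.count() / len(toy)), 2)

# Formula from this answer: number of nulls divided by number of rows.
via_isnull = round(100 * (toy.isnull().sum() / len(toy.index)), 2)

# The mean of a boolean mask is the fraction of True values.
via_mean = (100 * toy.isnull().mean()).round(2)

print(via_isnull)
```

So the choice between them is mostly a matter of readability. What changes the result is whether the rows were filtered before the percentage is computed.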
0

This should work.

df = df.drop(df[df.isnull().sum(axis=1) > 5].index)

print(round(100 * (df.isnull().sum() / len(df.index)), 2))
Josef
0
marks = marks[marks.isnull().sum(axis=1) <= 5]
print(marks.isna().sum())

Please try this; it should help.

Athul Das
  • Welcome to Stack Overflow. Please test your solutions before posting and make sure they address the question and provide an explanation. Your solution does not print the percentages. – AlexK Oct 19 '22 at 20:36
0

This works:

import pandas as pd
df = pd.read_csv('https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0')
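df = df[df.isnull().sum(axis=1)<=5]  # keep rows with at most 5 missing values
print(round(100*(df.isnull().sum()/len(df.index)),2))  # percentage of missing values per column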
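
As an alternative to building the mask by hand, `DataFrame.dropna` has a `thresh` parameter: the minimum number of non-missing values a row needs in order to be kept. The sketch below is not taken from any of the answers. It uses a made-up 7-column frame (`toy`, `max_missing`, and the other names are assumptions) and shows where the boundary falls: a row with exactly 5 missing values is kept, a row with 6 is dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with 7 columns (not the real dataset).
toy = pd.DataFrame(np.ones((3, 7)), columns=list("abcdefg"))
toy.iloc[0, 1:] = np.nan  # row 0: 6 missing values, should be dropped
toy.iloc[1, 2:] = np.nan  # row 1: 5 missing values, should be kept

max_missing = 5

# A row survives if it has at least (number of columns - max_missing)
# non-missing values, i.e. at most max_missing missing values.
kept = toy.dropna(thresh=toy.shape[1] - max_missing)

# The explicit mask used in the answers, for comparison.
by_mask = toy[toy.isnull().sum(axis=1) <= max_missing]

print(round(100 * (1 - kept.count() / len(kept)), 2))
```

On the real data the threshold would be `df.shape[1] - 5`. Whether this reads better than the boolean mask is a matter of taste.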