Remove the missing values from the rows having greater than 5 missing values and then print the percentage of missing values in each column

Question

import pandas as pd
df = pd.read_csv('https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0')
d= df.loc[df.isnull().sum(axis=1)>5]
d.dropna(axis=0,inplace=True)
print(round(100*(1-df.count()/len(df)),2))

I am getting this output:

Ord_id                 0.00
Prod_id                0.00
Ship_id                0.00
Cust_id                0.00
Sales                  0.24
Discount               0.65
Order_Quantity         0.65
Profit                 0.65
Shipping_Cost          0.65
Product_Base_Margin    1.30

dtype: float64

but the expected output is:

Ord_id                 0.00
Prod_id                0.00
Ship_id                0.00
Cust_id                0.00
Sales                  0.00
Discount               0.42
Order_Quantity         0.42
Profit                 0.42
Shipping_Cost          0.42
Product_Base_Margin    1.06

dtype: float64

Can you create a small example to replicate the issue? Right now it's not very clear (IMO) what exactly you're trying to achieve. Check [this](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – anky, Mar 17 '19 at 14:19
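A small reproduction along the lines the comment asks for might look like the sketch below. The `toy` frame and its column names are made up for illustration (they are not the data.world file). It suggests why the code in the question changes nothing: the rows are selected into a separate object `d`, and `df` itself is never filtered.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 4 rows, 7 columns; the second row has 6 missing values.
toy = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "a": [1.0, np.nan, 3.0, 4.0],
        "b": [1.0, np.nan, np.nan, 4.0],
        "c": [1.0, np.nan, 3.0, 4.0],
        "d": [1.0, np.nan, 3.0, 4.0],
        "e": [1.0, np.nan, 3.0, 4.0],
        "f": [1.0, np.nan, 3.0, 4.0],
    }
)

# Same pattern as the question: pick the sparse rows into `d`, then drop from `d`.
# `toy` is untouched, so the percentages computed on it afterwards do not change.
d = toy.loc[toy.isnull().sum(axis=1) > 5]
d = d.dropna(axis=0)
rows_after_question_code = len(toy)

# Filtering `toy` itself is what actually removes the sparse row.
cleaned = toy[toy.isnull().sum(axis=1) <= 5]
pct_missing = round(100 * (1 - cleaned.count() / len(cleaned)), 2)

print(rows_after_question_code, len(cleaned))
print(pct_missing)
```

On this toy frame only column `b` still has a missing value after filtering, so it is the only column with a non-zero percentage.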

score 3 · Answer 1 · edited Jan 27 '20 at 13:49

Try this way:

df.drop(df[df.isnull().sum(axis=1)>5].index,axis=0,inplace=True)

print(round(100*(1-df.count()/len(df)),2))

edited Jan 27 '20 at 13:49 by saeed foroughi

answered Jan 27 '20 at 12:53 by Lakshminarayana

instead of `print(round(100*(1-df.count()/len(df)),2))` use `df.isnull().sum()` – Manoj Kumar Mar 04 '21 at 16:00
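For what it's worth, `df.isnull().sum()` on its own returns counts, not percentages. A minimal sketch on a made-up two-column frame (the names `toy`, `x` and `y` are hypothetical) comparing the count with two equivalent ways of getting the percentage:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: x has 1 missing value out of 4, y has 2 out of 4.
toy = pd.DataFrame(
    {"x": [1.0, np.nan, 3.0, 4.0], "y": [np.nan, np.nan, 3.0, 4.0]}
)

# Number of missing values per column.
counts = toy.isnull().sum()

# Percentage of missing values per column, as written in the question.
pct_from_count = round(100 * (1 - toy.count() / len(toy)), 2)

# Same percentage via the mean of the boolean mask.
pct_from_mean = round(100 * toy.isnull().mean(), 2)

print(counts)
print(pct_from_count)
print(pct_from_mean)
```

The two percentage expressions should agree; the `isnull().mean()` form is just shorter.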

Karthik K S · Answer 2 · 2019-03-17T14:59:34.937

I think you are trying to find the index of the rows that have more than 5 null values. Use np.where instead of df.loc to find their positions, and then drop those rows.

Try:

import pandas as pd
import numpy as np
df = pd.read_csv('https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0')
d = np.where(df.isnull().sum(axis=1)>5)
df= df.drop(df.index[d])
print(round(100*(1-df.count()/len(df)),2))

output:

Ord_id                 0.00
Prod_id                0.00
Ship_id                0.00
Cust_id                0.00
Sales                  0.00
Discount               0.42
Order_Quantity         0.42
Profit                 0.42
Shipping_Cost          0.42
Product_Base_Margin    1.06
dtype: float64
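As a side note, `np.where` returns integer positions rather than index labels, which is why the code above goes through `df.index[...]` before dropping. A small sketch on a hypothetical frame with a non-default index (the threshold is lowered to 1 only so the toy data stays small) shows that this gives the same result as a plain boolean mask:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with index labels 10, 20, 30; row 20 has 2 missing values.
toy = pd.DataFrame(
    {
        "a": [1.0, np.nan, 3.0],
        "b": [1.0, np.nan, 3.0],
        "c": [1.0, 2.0, np.nan],
    },
    index=[10, 20, 30],
)

# Threshold of 1 is an assumption for this toy example (the question uses 5).
mask = toy.isnull().sum(axis=1) > 1

# np.where gives positions (here: array([1])), so map them to labels before dropping.
positions = np.where(mask)[0]
via_where = toy.drop(toy.index[positions])

# Equivalent boolean-mask filter: keep the rows where the mask is False.
via_mask = toy[~mask]

print(via_where.equals(via_mask))
```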

score 0 · Answer 3 · edited Jan 12 '21 at 05:52

Try this; it should work:

df = df[df.isnull().sum(axis=1) <= 5]
print(round(100*(1-df.count()/len(df)),2))

edited Jan 12 '21 at 05:52 by Juan Diego Lozano

answered Jan 11 '21 at 21:17 by Rajat Rai

score 0 · Answer 4 · answered May 11 '21 at 15:47

Try this solution:

import pandas as pd
df = pd.read_csv('https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0')
df = df[df.isnull().sum(axis=1)<=5]
print(round(100*(df.isnull().sum()/len(df.index)),2))

answered May 11 '21 at 15:47 by Sourabh Kulkarni

score 0 · Answer 5 · edited Jul 18 '21 at 08:13

This should work:

df = df.drop(df[df.isnull().sum(axis=1) > 5].index)

print(round(100 * (df.isnull().sum() / len(df.index)), 2))
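An alternative that nobody in this thread uses is `DataFrame.dropna` with its `thresh` parameter, which keeps rows that have at least that many non-missing values. Assuming "more than 5 missing" is the cut-off, a row survives when it has at least `number_of_columns - 5` non-missing values. A sketch on a made-up 7-column frame (the names are hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with 7 columns:
# row 0 has 0 missing values, row 1 has 5, row 2 has 6.
toy = pd.DataFrame(
    [
        [1, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
        [2, 2.0, np.nan, np.nan, np.nan, np.nan, np.nan],
        [3, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
    ],
    columns=list("abcdefg"),
)

# Keep rows with at least (7 - 5) = 2 non-missing values,
# i.e. rows with at most 5 missing values.
by_thresh = toy.dropna(thresh=toy.shape[1] - 5)

# The boolean-mask version from the answers, for comparison.
by_mask = toy[toy.isnull().sum(axis=1) <= 5]

same = by_thresh.equals(by_mask)
print(same)
print(round(100 * by_thresh.isnull().sum() / len(by_thresh), 2))
```

Both versions should keep the first two rows and drop the last one; which to use is mostly a matter of taste.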

edited Jul 18 '21 at 08:13 by Josef

answered Jul 17 '21 at 10:01 by Sathya Gowri Chandraja Manda

Athul Das · Answer 6 · 2022-10-17T18:46:17.203

marks = marks[marks.isnull().sum(axis=1) < 5]
print(marks.isna().sum())

Please try this; it should help.

edited Oct 17 '22 at 18:46

answered Oct 17 '22 at 18:45 by Athul Das

Welcome to Stack Overflow. Please test your solutions before posting and make sure they address the question and provide an explanation. Your solution does not print the percentages. – AlexK Oct 19 '22 at 20:36

score 0 · Answer 7 · answered Nov 16 '22 at 16:21

This works:

import pandas as pd
df = pd.read_csv('https://query.data.world/s/Hfu_PsEuD1Z_yJHmGaxWTxvkz7W_b0')
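# Keep rows with at most 5 missing values (drops rows with more than 5).
df = df[df.isnull().sum(axis=1) <= 5]
# Percentage of missing values in each column.
print(round(100 * df.isnull().sum() / len(df), 2))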

answered Nov 16 '22 at 16:21 by Milon Priyaranjan
