how to find which rows have more than certain columns with values (e.g. 20 columns) in Panda?

Question

I have the e.g. [100,30] data frame, I would like to find which rows have more than 20 columns with values? Well all rows have 30 columns, but some of them have NaN values, therefore I set a limit of 20 columns and I wanna find which rows have values for the column after col.iloc = 20

For example, even though at row number 05 we have more Nan value, due to distribution I would like to find which rows have more than 3 columns either with the value of X or Nan (in the table below I want to find the index of rows number 1, 3, 4 and 7. because they have value for the columns after index.column=3, and my question is among the large data set how should I find them).

ID	Col.01	Col.02	Col.03	Col.04	Col.05
01	X	X	X	X	X
02	X	X	X	Nan	Nan
03	X	X	X	X	X
04	X	X	X	X	Nan
05	X	Nan	Nan	Nan	Nan
06	X	X	X	Nan	Nan
07	X	X	X	X	Nan

My expected result:

I find out that the ID of the row: 1,3,4,7 are the rows have values in column>3, so I can delete them after that

ID	Col.01	Col.02	Col.03	Col.04	Col.05
02	X	X	X	Nan	Nan
05	X	Nan	Nan	Nan	Nan
06	X	X	X	Nan	Nan

but in large dataframe It would not that easy

"*which rows have more than 3 columns either with the value of X or Nan*": what do you mean? What is the expected result? — Corralien, Jun 30 '22 at 10:16

score 0 · Answer 1 · answered Jun 30 '22 at 10:09

0

You can use

df.column.isna().sum()

to find the number of NaN values in a column. Refer to this answer.

answered Jun 30 '22 at 10:09

Quantum

510
1
2
19

No, maybe I wrote not explicitly, I wanna find which rows have more than 3 columns either with value X or NaN (NaN is not an issue here) – SHAH Jun 30 '22 at 10:13

Corralien · Answer 2 · 2022-06-30T11:07:36.267

0

Try

out = df[df.iloc[:, 4:].isna().all(axis=1)]
print(out)

# Output
   ID Col.01 Col.02 Col.03 Col.04 Col.05
1  02      X      X      X    NaN    NaN
4  05      X    NaN    NaN    NaN    NaN
5  06      X      X      X    NaN    NaN

If your columns have names, you can use df.loc[:, 'Col.04':] instead of df.iloc[:, 4:].

edited Jun 30 '22 at 11:07

answered Jun 30 '22 at 10:19

Corralien

109,409
8
28
52

I meant any values, NaN value is not an issue here, imagine the problem is out of 100 rows and 30 columns I want to remove some rows which have values in the index.column >20. for example in above table I want to remove the rows of 1, 3, 4, 7. Therefore first I have to find the index of rows – SHAH Jun 30 '22 at 11:01
@MohsenShahvar. I updated my answer. Can you check it please? – Corralien Jun 30 '22 at 11:07

ID	Col.01	Col.02	Col.03	Col.04	Col.05
01	X	X	X	X	X
02	X	X	X	Nan	Nan
03	X	X	X	X	X
04	X	X	X	X	Nan
05	X	Nan	Nan	Nan	Nan
06	X	X	X	Nan	Nan
07	X	X	X	X	Nan

ID	Col.01	Col.02	Col.03	Col.04	Col.05
01	X	X	X	X	X
02	X	X	X	Nan	Nan
03	X	X	X	X	X
04	X	X	X	X	Nan
05	X	Nan	Nan	Nan	Nan
06	X	X	X	Nan	Nan
07	X	X	X	X	Nan

how to find which rows have more than certain columns with values (e.g. 20 columns) in Panda?

2 Answers2

ID	Col.01	Col.02	Col.03	Col.04	Col.05
01	X	X	X	X	X
02	X	X	X	Nan	Nan
03	X	X	X	X	X
04	X	X	X	X	Nan
05	X	Nan	Nan	Nan	Nan
06	X	X	X	Nan	Nan
07	X	X	X	X	Nan