2

I have the e.g. [100,30] data frame, I would like to find which rows have more than 20 columns with values? Well all rows have 30 columns, but some of them have NaN values, therefore I set a limit of 20 columns and I wanna find which rows have values for the column after col.iloc = 20

For example, even though at row number 05 we have more Nan value, due to distribution I would like to find which rows have more than 3 columns either with the value of X or Nan (in the table below I want to find the index of rows number 1, 3, 4 and 7. because they have value for the columns after index.column=3, and my question is among the large data set how should I find them).

ID Col.01 Col.02 Col.03 Col.04 Col.05
01 X X X X X
02 X X X Nan Nan
03 X X X X X
04 X X X X Nan
05 X Nan Nan Nan Nan
06 X X X Nan Nan
07 X X X X Nan

My expected result:

I find out that the ID of the row: 1,3,4,7 are the rows have values in column>3, so I can delete them after that

ID Col.01 Col.02 Col.03 Col.04 Col.05
02 X X X Nan Nan
05 X Nan Nan Nan Nan
06 X X X Nan Nan

but in large dataframe It would not that easy

SHAH
  • 61
  • 5
  • "*which rows have more than 3 columns either with the value of X or Nan*": what do you mean? What is the expected result? – Corralien Jun 30 '22 at 10:16

2 Answers2

0

You can use

df.column.isna().sum() 

to find the number of NaN values in a column. Refer to this answer.

Quantum
  • 510
  • 1
  • 2
  • 19
  • No, maybe I wrote not explicitly, I wanna find which rows have more than 3 columns either with value X or NaN (NaN is not an issue here) – SHAH Jun 30 '22 at 10:13
0

Try

out = df[df.iloc[:, 4:].isna().all(axis=1)]
print(out)

# Output
   ID Col.01 Col.02 Col.03 Col.04 Col.05
1  02      X      X      X    NaN    NaN
4  05      X    NaN    NaN    NaN    NaN
5  06      X      X      X    NaN    NaN

If your columns have names, you can use df.loc[:, 'Col.04':] instead of df.iloc[:, 4:].

Corralien
  • 109,409
  • 8
  • 28
  • 52
  • I meant any values, NaN value is not an issue here, imagine the problem is out of 100 rows and 30 columns I want to remove some rows which have values in the index.column >20. for example in above table I want to remove the rows of 1, 3, 4, 7. Therefore first I have to find the index of rows – SHAH Jun 30 '22 at 11:01
  • @MohsenShahvar. I updated my answer. Can you check it please? – Corralien Jun 30 '22 at 11:07