0

I have a database that contains 37000 rows and 100 features.

The DB has a lot of Nan values.

But the big problem is that if feature A occurs with 600 rows and feature B occurs with 700 rows, the common rows between them may be only 10 rows.

I wanna find the max number of features that occurs with N rows at least

Example: Input Database

enter image description here

out: where N=5

enter image description here

EWF
  • 3
  • 1
  • Use `df = df.dropna().dropna(axis=1)` – jezrael Nov 18 '20 at 11:36
  • If need filter then `N` rows use `df = df.dropna().dropna(axis=1).head(N)` – jezrael Nov 18 '20 at 11:37
  • In fact this is not the solution for my problem. Because all rows contain nan values and all columns contain nan values. So if I used dropna, I will get an empty dataframe – EWF Nov 18 '20 at 12:00
  • [Please don't post images of code/data (or links to them)](http://meta.stackoverflow.com/questions/285551/why-may-i-not-upload-images-of-code-on-so-when-asking-a-question) – jezrael Nov 18 '20 at 12:02
  • Hi. Please take the time to read this post on [how to provide a great pandas example](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) as well as how to provide a [minimal, complete, and verifiable example](http://stackoverflow.com/help/mcve) and revise your question accordingly. These tips on [how to ask a good question](http://stackoverflow.com/help/how-to-ask) may also be useful. – jezrael Nov 18 '20 at 12:02

0 Answers0