
I have a dataframe of 192 columns x 80000 values. However, some of the columns contain NaN (Not a Number) and NaT (Not a Time) values. How do I find the location of their first occurrence? I tried the approach below:

import pandas as pd
import matplotlib.pyplot as plt

#df has 192 columns and each column has 80000 values
for i,j in zip(df.columns[::2],df.columns[1::2]):
    print(df[(str(df[i])=='NaT')].index,df[(str(df[j])=='NaN')].index) 

The output is:

KeyError: False

During handling of the above exception, another exception occurred:
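A likely cause of the `KeyError: False`, sketched below on a small hypothetical two-column frame (the column names `time` and `value` are assumptions, not the real data): `str(df[i])` turns the whole Series into one printable string, so `== 'NaT'` yields a single `False` rather than a per-row mask, and `df[False]` is then interpreted as a lookup for a column labelled `False`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data: a datetime column containing a NaT
# and a float column containing a NaN.
df = pd.DataFrame({
    "time": pd.to_datetime(["2018-08-28", None, "2018-08-30"]),
    "value": [1.0, np.nan, 3.0],
})

# str() renders the WHOLE Series as one printable string,
# not one string per row.
as_text = str(df["time"])

# Comparing that single string with 'NaT' gives one plain bool, not a row mask.
mask = as_text == "NaT"
print(mask)  # False

# df[False] is then read as "the column labelled False", which does not exist.
raised = None
try:
    df[mask]
except KeyError as err:
    raised = err
print(repr(raised))
```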

With the following modified code:

print(df[(df[i]=='NaT')].index,df[(df[j]=='NaN')].index)

I got this output:

Int64Index([], dtype='int64') Int64Index([], dtype='int64')
Int64Index([], dtype='int64') Int64Index([], dtype='int64')
Int64Index([], dtype='int64') Int64Index([], dtype='int64')
Int64Index([], dtype='int64') Int64Index([], dtype='int64')
Int64Index([], dtype='int64') Int64Index([], dtype='int64')
Int64Index([], dtype='int64') Int64Index([], dtype='int64')
Int64Index([], dtype='int64') Int64Index([], dtype='int64')
.
.

What is the mistake? Why did no index values appear here?
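One plausible explanation, shown as a minimal sketch on hypothetical sample data: the missing entries are real NaN/NaT objects, not the strings `'NaN'`/`'NaT'`, so comparing against those strings matches no rows. Comparing with `np.nan` itself does not work either, because NaN never equals NaN. `isna()` (the same test as `pd.Series.isnull` mentioned in the comments) recognises both.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: one NaT and one NaN, both at row label 1.
df = pd.DataFrame({
    "time": pd.to_datetime(["2018-08-28", None, "2018-08-30"]),
    "value": [1.0, np.nan, 3.0],
})

# Comparing against the strings 'NaT' / 'NaN' matches no rows.
by_string_time = df[df["time"] == "NaT"].index.tolist()
by_string_value = df[df["value"] == "NaN"].index.tolist()

# Comparing with np.nan does not help either: NaN never equals NaN.
by_nan_value = df[df["value"] == np.nan].index.tolist()

# isna() recognises both NaN and NaT.
by_isna_time = df[df["time"].isna()].index.tolist()
by_isna_value = df[df["value"].isna()].index.tolist()

print(by_string_time, by_string_value, by_nan_value)  # [] [] []
print(by_isna_time, by_isna_value)                    # [1] [1]
```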

Following @ChrisA's comment, I tried:

print(df[i].isna().idxmax(),df[j].isna().idxmax())

The output is:

83912 83912
83451 83451
83681 83681
83697 83697
83873 83873
83660 83660
82975 82975
83847 83847
0 0
83537 83537
83762 83762
.

Why does it return 0 for some columns in the middle? Does that mean the first sample of those two columns is null? Those two columns actually have no missing values.
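A sketch of why 0 can appear, on hypothetical data: `idxmax()` returns the label of the first maximum, and for a mask with no `True` values the maximum is `False`, which first occurs at the first row label. Checking `any()` first separates the two cases; `first_missing_label` below is a hypothetical helper, not a pandas function.

```python
import numpy as np
import pandas as pd

# Hypothetical data: "a" has a NaN at row label 2, "b" has no missing values.
df = pd.DataFrame({"a": [1.0, 2.0, np.nan, 4.0], "b": [1.0, 2.0, 3.0, 4.0]})

# idxmax() returns the label of the FIRST maximum. When a mask has no True
# values its maximum is False, which first occurs at the first row label (0 here).
print(df["a"].isna().idxmax())  # 2
print(df["b"].isna().idxmax())  # 0, although "b" has no NaN


def first_missing_label(s: pd.Series):
    """Hypothetical helper: label of the first NaN/NaT in s, or None if there is none."""
    mask = s.isna()
    return mask.idxmax() if mask.any() else None


# Applied to column pairs the same way the question's loop pairs them.
for i, j in zip(df.columns[::2], df.columns[1::2]):
    print(i, first_missing_label(df[i]), j, first_missing_label(df[j]))
```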

Msquare
  • Use `pd.Series.isnull` as per the marked duplicates; no string conversion is required. – jpp Aug 28 '18 at 10:10
  • Using `df.isna().idxmax()` will give you the index of the first NaN in every column – Chris Adams Aug 28 '18 at 10:11
  • @ChrisA, thanks, your method works. But did you see the output? Why does it return 0 for some columns in the middle? Does that mean the first sample of those two columns is null? Those two columns actually have no missing values. – Msquare Aug 28 '18 at 10:55
  • @Msquare good point. You could try: `df.isna().idxmax() * np.where(df.isna().any(), 1, np.nan)`. A 0 should now mean a NaN at index 0, and NaN means the column has no NaN values. – Chris Adams Aug 28 '18 at 11:01
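Chris Adams' workaround from the comments, run on a small hypothetical frame. Multiplying the labels only makes sense for a numeric row index, so the sketch also shows `Series.where`, a swapped-in alternative (not from the comments) that masks instead of multiplying and so should also work for, say, a `DatetimeIndex`.

```python
import numpy as np
import pandas as pd

# Hypothetical data: first NaN of "a" at label 2, "b" has none, "c" starts with NaN.
df = pd.DataFrame({
    "a": [1.0, 2.0, np.nan, 4.0],
    "b": [1.0, 2.0, 3.0, 4.0],
    "c": [np.nan, 2.0, 3.0, 4.0],
})

mask = df.isna()

# Comment's approach: keep idxmax where the column has any NaN, NaN otherwise.
first_nan = mask.idxmax() * np.where(mask.any(), 1, np.nan)
# a -> 2.0, b -> NaN, c -> 0.0

# Alternative: Series.where masks the labels instead of multiplying them.
first_nan_where = mask.idxmax().where(mask.any())
print(first_nan_where)
```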

0 Answers