I have a dataframe of 192 columns x 80000 values. However, some of the columns contain NaN (Not a Number) and NaT (Not a Time).
How can I find the location of their first occurrence? My attempt is given below:
import pandas as pd
import matplotlib.pyplot as plt
#df has 192 columns and each column has 80000 values
for i,j in zip(df.columns[::2],df.columns[1::2]):
    print(df[(str(df[i])=='NaT')].index,df[(str(df[j])=='NaN')].index)
Output is:
KeyError: False
During handling of the above exception, another exception occurred:
For the following modified code (same loop, different print line):
print(df[(df[i]=='NaT')].index,df[(df[j]=='NaN')].index)
I got this output:
Int64Index([], dtype='int64') Int64Index([], dtype='int64')
Int64Index([], dtype='int64') Int64Index([], dtype='int64')
Int64Index([], dtype='int64') Int64Index([], dtype='int64')
Int64Index([], dtype='int64') Int64Index([], dtype='int64')
Int64Index([], dtype='int64') Int64Index([], dtype='int64')
Int64Index([], dtype='int64') Int64Index([], dtype='int64')
Int64Index([], dtype='int64') Int64Index([], dtype='int64')
.
.
What is the mistake? Why did no index values appear here?
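For reference, a minimal sketch of why both attempts come back empty or fail, using a small hypothetical frame (the column names "t" and "v" are made up for illustration and are not from my real data). str(df[i]) is the printed form of the whole column, so comparing it to 'NaT' gives a single Python False, and df[False] then looks for a column named False, hence KeyError: False. Comparing values to the string 'NaN' matches nothing because missing values are not strings, and NaN does not compare equal even to itself. isna() seems to be the intended test for both NaN and NaT:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for one (time, value) column pair from the real frame.
df = pd.DataFrame({
    "t": pd.to_datetime(["2020-01-01", "2020-01-02", None, None]),
    "v": [1.0, 2.0, np.nan, np.nan],
})

# str() of a Series is its whole printed representation, never just 'NaT',
# so this comparison yields one plain Python bool.
whole_repr_is_nat = str(df["t"]) == "NaT"

# Comparing floats to the string 'NaN' matches no rows.
string_matches = int((df["v"] == "NaN").sum())

# NaN is not equal to itself, so equality tests cannot find it.
nan_equals_itself = np.nan == np.nan

# isna() flags NaN and NaT alike; idxmax() gives the label of the first True.
# The any() guard covers columns that have no missing values.
t_mask = df["t"].isna()
v_mask = df["v"].isna()
first_nat = t_mask.idxmax() if t_mask.any() else None
first_nan = v_mask.idxmax() if v_mask.any() else None

print(whole_repr_is_nat, string_matches, nan_equals_itself, first_nat, first_nan)
```

In this toy frame both columns first go missing at index label 2.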
As per @ChrisA's answer, I changed the print line to:
print(df[i].isna().idxmax(),df[j].isna().idxmax())
Output is:
83912 83912
83451 83451
83681 83681
83697 83697
83873 83873
83660 83660
82975 82975
83847 83847
0 0
83537 83537
83762 83762
.
Why did it return 0 for some columns in the middle? Does that mean the first sample of those two columns is null? Those two columns actually have no missing values at all.
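A small sketch that reproduces the 0 on a hypothetical column with no missing values (the Series below is made up for illustration). As far as I can tell, idxmax() returns the label of the first maximum; when the isna() mask is all False, the maximum is False and the first label comes back, so 0 here would mean "nothing missing" rather than "missing at row 0". Checking any() first separates the two cases:

```python
import pandas as pd

# Hypothetical column that is fully populated.
complete = pd.Series([1.0, 2.0, 3.0])
mask = complete.isna()  # every entry is False

# With an all-False mask, idxmax() falls back to the first index label.
naive = mask.idxmax()

# Only trust idxmax() when at least one value is actually missing.
guarded = mask.idxmax() if mask.any() else None

print(naive, guarded)
```

With this guard, a fully populated column yields None instead of a misleading 0.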