Find index of the first NaN value in the row

Question

I have a DataFrame that looks like this:

import numpy as np
import pandas as pd

table = pd.DataFrame({'a': [0, 0, 0, 0],
                      'b': [1, 1, 1, 3],
                      'c': [2, 2, 5, 4],
                      'd': [3, np.nan, 6, 6],
                      'e': [4, np.nan, 7, 8],
                      'f': [np.nan, np.nan, np.nan, 10]}, dtype='float64')


    a   b   c   d   e   f
0   0.0 1.0 2.0 3.0 4.0 NaN
1   0.0 1.0 2.0 NaN NaN NaN
2   0.0 1.0 5.0 6.0 7.0 NaN
3   0.0 3.0 4.0 6.0 8.0 10.0

For each row, I'm trying to find the column index of the first NaN value, so that I can store it in a variable and use it later.

So far, I have tried this piece of code, but it's not giving me exactly what I want. I don't want an array, just a single value.

for i in table.itertuples():
    # np.where returns a tuple of index arrays, not a single value
    x = np.where(np.isnan(i))
    print(x)

(array([6]),)
(array([4, 5, 6]),)
(array([6]),)
(array([], dtype=int64),)

Thanks in advance for any comments or advice!

See also https://stackoverflow.com/q/41320568/5085211 which is the same question for NumPy arrays. — fuglede, Jun 09 '18 at 16:34

Psidom · Answer 1 · 2018-06-09T15:51:11.207

4

Check for missing values with isna(), take the column label of the per-row maximum with idxmax (the first True, i.e. the first NaN), and mask out rows that have no NaN at all.

table.isna().idxmax(1).where(table.isna().any(1))

#0      f
#1      d
#2      f
#3    NaN
#dtype: object
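If you want a single label per row while iterating, as the asker requests in the comments, the same isna/idxmax idea can be applied row by row. A minimal sketch, assuming the sample frame from the question; returning None for rows without any NaN is an arbitrary choice here:

```python
import numpy as np
import pandas as pd

table = pd.DataFrame({'a': [0, 0, 0, 0],
                      'b': [1, 1, 1, 3],
                      'c': [2, 2, 5, 4],
                      'd': [3, np.nan, 6, 6],
                      'e': [4, np.nan, 7, 8],
                      'f': [np.nan, np.nan, np.nan, 10]}, dtype='float64')

first_nan_labels = []
for idx, row in table.iterrows():
    missing = row.isna()  # boolean Series indexed by column label
    # idxmax gives the label of the first True; guard rows that have no NaN
    label = missing.idxmax() if missing.any() else None
    first_nan_labels.append(label)
    print(idx, label)
```

Each iteration yields one scalar (a column label or None) rather than an array.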

Or if you need the column indices, as commented by @hpaulj, you can use argmax:

import numpy as np
is_missing = table.isna().values
np.where(is_missing.any(1), is_missing.argmax(1), np.nan)

# array([ 5.,  3.,  5., nan])

edited Jun 09 '18 at 15:51

answered Jun 09 '18 at 15:46

Psidom


Thanks for the help, but I don't want to return an array. I just want the value as I iterate over each row. Maybe you can look at my comment on @jezrael's answer for more details. – Florian Bernard Jun 09 '18 at 16:02

jezrael · Answer 2 · 2018-06-09T16:30:04.040

3

Use:

t = np.isnan(table.values).argmax(axis=1)
print(t)
[5 3 5 0]

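But if you need every position shifted by one (which is what the itertuples() loop in the question returns, since each tuple starts with the row index):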

t = np.isnan(table.reset_index().values).argmax(axis=1)
print(t)
[6 4 6 0]
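Note that in both variants argmax returns 0 for row 3, which has no NaN, so it cannot be told apart from a NaN in the first column. A minimal sketch that masks those rows with a sentinel, assuming -1 is an acceptable marker for "no NaN" (it keeps the integer dtype, unlike filling with NaN):

```python
import numpy as np
import pandas as pd

table = pd.DataFrame({'a': [0, 0, 0, 0],
                      'b': [1, 1, 1, 3],
                      'c': [2, 2, 5, 4],
                      'd': [3, np.nan, 6, 6],
                      'e': [4, np.nan, 7, 8],
                      'f': [np.nan, np.nan, np.nan, 10]}, dtype='float64')

is_missing = np.isnan(table.to_numpy())
# argmax alone returns 0 for rows without NaN; replace those with -1
# (the sentinel value is an arbitrary choice for this sketch)
first_nan = np.where(is_missing.any(axis=1), is_missing.argmax(axis=1), -1)
print(first_nan)
```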

edited Jun 09 '18 at 16:30

answered Jun 09 '18 at 15:48

jezrael


Thanks, but I don't really want to store the value in the table. I just need the value for each row so that I can use it later with np.isclose(). I want to pass an array to np.isclose(), but only for the columns that contain non-null values. For the first row I would pass [1:5] because the 6th column is NaN; for the second row I would pass [1:3] because the first NaN is in the 4th column, etc. So if I store the position of the first NaN in x, I can then pass the array [1:6-x] to np.isclose(). – Florian Bernard Jun 09 '18 at 15:59
Thanks for your help and your time, but I managed to get what I want! See my answer :) – Florian Bernard Jun 09 '18 at 16:13
@FlorianBernard - I am a bit confused: Python counts from `0`, so do you need 1 added to the position for rows with NaN? – jezrael Jun 09 '18 at 16:21
I honestly don't know why it behaves like that. Maybe I made a mistake somewhere, but I can still 'fix' it by subtracting 1 from the value I get, if needed. – Florian Bernard Jun 09 '18 at 16:28
@FlorianBernard - Your code also includes the `index` values; check it with `print(i)`. It's the same behaviour as reset_index() in my answer :) – jezrael Jun 09 '18 at 16:30
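For the np.isclose() use case described in the comments, a rough sketch: find the position of the first NaN in each row (falling back to the row length when there is none) and compare only the values before it. The reference array is hypothetical, since the question does not say what the rows are compared against, and the slices here start at column 0 rather than at the [1:...] ranges mentioned above:

```python
import numpy as np
import pandas as pd

table = pd.DataFrame({'a': [0, 0, 0, 0],
                      'b': [1, 1, 1, 3],
                      'c': [2, 2, 5, 4],
                      'd': [3, np.nan, 6, 6],
                      'e': [4, np.nan, 7, 8],
                      'f': [np.nan, np.nan, np.nan, 10]}, dtype='float64')

# Hypothetical reference values to compare each row against (not from the question)
reference = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])

results = []
for row in table.to_numpy():
    is_missing = np.isnan(row)
    # position of the first NaN, or the full row length if the row has no NaN
    stop = int(is_missing.argmax()) if is_missing.any() else row.size
    # compare only the leading non-NaN values
    matches = np.isclose(row[:stop], reference[:stop])
    results.append((stop, matches.tolist()))
    print(stop, matches)
```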

score 0 · Accepted Answer · answered Jun 09 '18 at 16:12

0

I got what I wanted by tweaking my code and using argmax(), as suggested by @hpaulj:

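for i in table.itertuples():
    # i starts with the row index, so positions are shifted by one;
    # argmax returns 0 when the row has no NaN
    x = np.isnan(i).argmax(axis=0)
    print(x)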

#6
#4
#6
#0
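A variant of the same loop, as a sketch: passing index=False to itertuples() drops the row index from each tuple, so the positions are no longer shifted by one, and a guard returns None for rows with no NaN instead of the ambiguous 0 (None is an arbitrary choice here):

```python
import numpy as np
import pandas as pd

table = pd.DataFrame({'a': [0, 0, 0, 0],
                      'b': [1, 1, 1, 3],
                      'c': [2, 2, 5, 4],
                      'd': [3, np.nan, 6, 6],
                      'e': [4, np.nan, 7, 8],
                      'f': [np.nan, np.nan, np.nan, 10]}, dtype='float64')

positions = []
for row in table.itertuples(index=False):  # tuples hold only the column values
    is_missing = np.isnan(row)
    # 0-based column position of the first NaN, or None if the row has none
    x = int(is_missing.argmax()) if is_missing.any() else None
    positions.append(x)
    print(x)
```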

Thanks, everyone, for your help!

answered Jun 09 '18 at 16:12

Florian Bernard

