
I have a dataframe that looks like this:

import numpy as np
import pandas as pd

table = pd.DataFrame({'a': [0, 0, 0, 0],
                      'b': [1, 1, 1, 3],
                      'c': [2, 2, 5, 4],
                      'd': [3, np.nan, 6, 6],
                      'e': [4, np.nan, 7, 8],
                      'f': [np.nan, np.nan, np.nan, 10]}, dtype='float64')


    a   b   c   d   e   f
0   0.0 1.0 2.0 3.0 4.0 NaN
1   0.0 1.0 2.0 NaN NaN NaN
2   0.0 1.0 5.0 6.0 7.0 NaN
3   0.0 3.0 4.0 6.0 8.0 10.0

For each row, I'm trying to find the index of the column that holds the first NaN value, so that I can store it in a variable and use it later.

So far I have tried this piece of code, but it doesn't give me exactly what I want. I don't want an array, just a single value.

for i in table.itertuples():
    x = np.where(np.isnan(i))
    print(x)

(array([6]),)
(array([4, 5, 6]),)
(array([6]),)
(array([], dtype=int64),)

Thanks in advance for any comment/advice !

Florian Bernard

3 Answers


Build a boolean mask with isna(), take the label of the first True in each row with idxmax, and mask out the rows that contain no NaN at all.

table.isna().idxmax(axis=1).where(table.isna().any(axis=1))

#0      f
#1      d
#2      f
#3    NaN
#dtype: object

Or if you need the column indices, as commented by @hpaulj, you can use argmax:

import numpy as np
is_missing = table.isna().values
np.where(is_missing.any(1), is_missing.argmax(1), np.nan)

# array([ 5.,  3.,  5., nan])
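
If a plain scalar per row is wanted rather than an array, one option is to compute the positions once and then read them out while iterating. This is only a minimal sketch: the name first_nan and the use of -1 as a marker for rows without any NaN are assumptions made here, not part of the answer above.

```python
import numpy as np
import pandas as pd

table = pd.DataFrame({'a': [0, 0, 0, 0],
                      'b': [1, 1, 1, 3],
                      'c': [2, 2, 5, 4],
                      'd': [3, np.nan, 6, 6],
                      'e': [4, np.nan, 7, 8],
                      'f': [np.nan, np.nan, np.nan, 10]}, dtype='float64')

is_missing = table.isna().to_numpy()

# Position of the first NaN in each row.
# -1 is an arbitrary marker (chosen for this sketch) for rows with no NaN.
first_nan = np.where(is_missing.any(axis=1), is_missing.argmax(axis=1), -1)

for row_label, pos in zip(table.index, first_nan):
    pos = int(pos)  # plain Python int, one value per row
    print(row_label, pos)
```

Using an integer marker instead of np.nan keeps the result an integer array, so each value can be used directly as a slice bound after checking for -1.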
Psidom
  • Thanks for the help, but I don't want to return an array. I just want the value as I iterate over each row. Maybe you can look at my comment on @jezrael's answer for more details. – Florian Bernard Jun 09 '18 at 16:02

Use:

t = np.isnan(table.values).argmax(axis=1)
print (t)
[5 3 5 0]

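If you need positions that also count the index as the first column (shifting each result by one, as itertuples() does), call reset_index() first. Rows without any NaN still return 0: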

t = np.isnan(table.reset_index().values).argmax(axis=1)
print (t)
[6 4 6 0]
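
One caveat with argmax on a boolean mask: it returns 0 both when the first column is NaN and when the row has no NaN at all. The sketch below illustrates this on a small hypothetical frame (demo is not from the question) and shows that pairing argmax with any() is enough to tell the two cases apart.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame: row 0 starts with NaN, row 1 has no NaN.
demo = pd.DataFrame({'a': [np.nan, 1.0],
                     'b': [2.0, 3.0]})

mask = np.isnan(demo.to_numpy())

pos = mask.argmax(axis=1)    # 0 for both rows, so 0 alone is ambiguous
has_nan = mask.any(axis=1)   # True only for the row that really has a NaN

for p, flag in zip(pos, has_nan):
    print(int(p) if flag else None)
```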
jezrael
  • Thanks, but I don't really want to store the value in the table. I just need the value for each row so that I can use it later with np.isclose(). I want to pass an array to np.isclose(), but only for the columns that contain a non-null value. For the first row I would pass [1:5] because the 6th column is NaN; for the second row I would pass [1:3] because the first NaN is in the 4th column, and so on. So if I store the position of the first NaN in 'x', I can then pass the array [1:6-x] to np.isclose() – Florian Bernard Jun 09 '18 at 15:59
  • Thanks for your help and your time, but I managed to get what I want! See my answer :) – Florian Bernard Jun 09 '18 at 16:13
  • @FlorianBernard - I am a bit confused: Python counts from `0`, so do you need to add 1 for rows with NaN? – jezrael Jun 09 '18 at 16:21
  • I honestly don't know why it behaves like that. Maybe I made a mistake somewhere, but I can still 'fix' it by subtracting 1 from the value I get, if needed. – Florian Bernard Jun 09 '18 at 16:28
  • @FlorianBernard - Your code also processes the `index` value; check it with `print(i)`. You get similar behaviour with reset_index() in my answer :) – jezrael Jun 09 '18 at 16:30

I got what I wanted by tweaking my code and using argmax(), as mentioned by @hpaulj:

for i in table.itertuples():
    x = np.isnan(i).argmax(axis=0)
    print(x)

#6
#4
#6
#0
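
As the comments point out, the values above are shifted by one because itertuples() includes the index as the first element, and the last row prints 0 even though it has no NaN. A possible variant, sketched under the assumption that only the leading non-NaN values of each row are needed (for example to hand to np.isclose()), is to pass index=False and fall back to the row length when no NaN is found. The names leading and lengths are illustrative only.

```python
import numpy as np
import pandas as pd

table = pd.DataFrame({'a': [0, 0, 0, 0],
                      'b': [1, 1, 1, 3],
                      'c': [2, 2, 5, 4],
                      'd': [3, np.nan, 6, 6],
                      'e': [4, np.nan, 7, 8],
                      'f': [np.nan, np.nan, np.nan, 10]}, dtype='float64')

lengths = []
for row in table.itertuples(index=False):  # index=False drops the index element
    values = np.asarray(row, dtype=float)
    mask = np.isnan(values)
    # Position of the first NaN, or the full row length if there is none.
    x = int(mask.argmax()) if mask.any() else len(values)
    leading = values[:x]  # values before the first NaN
    lengths.append(x)
    print(x, leading)
```

With index=False the positions are zero-based column positions, so no subtraction is needed before slicing.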

Thanks to everyone for your help!

Florian Bernard