Pandas dataframe get value of last nonzero column

Question

I have a pandas dataframe which contains 3 columns, each containing a site that a user has visited during a session.

In some cases, a user may have not visited 3 sites in a single session. This is shown by a 0, denoting that no site has been visited.

import pandas as pd

df = pd.DataFrame(data=[[5, 8, 1],[8,0,0],[1,17,0]], 
                  columns=['site1', 'site2', 'site3'])
print(df)

   site1  site2  site3
0      5      8      1
1      8      0      0
2      1     17      0

In the example above, user 0 has visited sites 5, 8 and 1. User 1 has visited site 8 only, and user 2 has visited sites 1 and 17.

I would like to create a new column, last_site, which shows the last site visited by the user in that session.

The result I want is this:

   site1  site2  site3  last_site
0      5      8      1          1
1      8      0      0          8
2      1     17      0         17

How can I do this in a concise way using pandas?

jezrael · Accepted Answer · 2018-11-03T16:37:15.737

16

Use forward filling of misisng values created by replacing 0 values and thenselect last column by iloc:

df['last'] = df.replace(0, np.nan).ffill(axis=1).iloc[:, -1].astype(int)
print (df)
   site1  site2  site3  last
0      5      8      1     1
1      8      0      0     8
2      1     17      0    17

If performance is important is possible use numpy:

a = df.values
m = a != 0

df['last'] = a[np.arange(m.shape[0]), m.shape[1]-m[:,::-1].argmax(1)-1]
print (df)
   site1  site2  site3  last
0      5      8      1     1
1      8      0      0     8
2      1     17      0    17

edited Nov 03 '18 at 16:37

answered Nov 03 '18 at 16:21

jezrael

822,522
95
1,334
1,252

3

This forward filling logic here is excellent across the rows :) +1 – Karn Kumar Nov 03 '18 at 16:27
Yes forward filling across rows is out of the box thinking – Vishnudev Krishnadas Nov 03 '18 at 16:30
Great and Instant logic indeed :-) . – Karn Kumar Nov 03 '18 at 16:31

Vishnudev Krishnadas · Answer 2 · 2023-04-23T06:40:01.970

11

A simple approach would be to find the last zero value for each row

Code:

df['last_site'] = df.apply(lambda x: x.iloc[x.nonzero()].iloc[-1], axis=1)

Output:

   site1  site2  site3  last_site
0      5      8      1          1
1      8      0      0          8
2      1     17      0         17

edited Apr 23 '23 at 06:40

answered Nov 03 '18 at 16:26

Vishnudev Krishnadas

10,679
2
23
55

score 3 · Answer 3 · answered Nov 03 '18 at 17:24

`mask` + `ffill`

A "pure Pandas" solution:

df['last'] = df.mask(df.eq(0)).ffill(1).iloc[:, -1].astype(int)

`numba`

For efficiency over a large number of rows / columns, numba can help. To see why this works better than argmax, see Efficiently return the index of the first value satisfying condition in array.

from numba import njit

@njit
def get_last_val(A):
    m, n = A.shape
    res = A[:, -1]
    for i in range(m):
        for j in range(n):
            if A[i, j] == 0:
                res[i] = A[i, max(0, j-1)]
                break
    return res

df['last'] = get_last_val(df.values)

Pandas dataframe get value of last nonzero column

3 Answers3

mask + ffill

numba

`mask` + `ffill`

`numba`