Check pandas column for successive row values

Question

I have:

I have a list of lists and single integers like this:

[[2,8,3], 2, [2,8]]

For each item in the main list, I want to find the index at which it first appears in the column.

For the single integers (e.g. 2), I want the index of the first time the value appears in the hi column (index 1; I am not interested in later occurrences, e.g. index 6).

For the lists within the list, I want the last index of the run where the list appears in order in that column.

So [2,8,3] appears in order at indexes 6, 7 and 8, and I want 8 to be returned. Note that these values also appear earlier, but interrupted by a 4, so I am not interested in that occurrence.

I have so far used:

for c in chunks:
    # different method if single note chunk vs. multi
    if type(c) is int:
        # give first occurrence of correct single notes
        single_notes = df1[df1['user_entry_note'] == c]
        single_notes_list.append(single_notes)
    # for multi chunks
    else:
        multi_chunk = df1['user_entry_note'].isin(c)
        multi_chunk_list.append(multi_chunk)

Fun looking question, but please review https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples to help us get you a solution. — Rich Andrews, Mar 27 '19 at 13:10
Added code of where I'm at. I'm stuck here. I am basically wondering if there is a function made for this sort of thing that I am unaware of. — syntheso, Mar 27 '19 at 13:13
No @syntheso. You will need to loop over the df column as a list and look for exact matches — yatu, Mar 27 '19 at 13:19
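
The loop suggested in the last comment could look roughly like the sketch below. The helper name first_match_end and the sample values are hypothetical (the original frame is not shown in the question), and the function returns list positions rather than DataFrame index labels. A single integer can be passed as a one-element list.

```python
def first_match_end(values, seq):
    """Return the position of the last element of the first contiguous
    occurrence of seq in values, or None if seq never occurs."""
    n = len(seq)
    if n == 0:
        return None
    # Slide a window of length n over the values and compare slices exactly.
    for start in range(len(values) - n + 1):
        if values[start:start + n] == seq:
            return start + n - 1
    return None

# Hypothetical column contents, e.g. obtained with df['hi'].tolist()
values = [1, 2, 4, 8, 3, 3, 2, 8, 3]

print(first_match_end(values, [2, 8, 3]))  # last position of the first full match
print(first_match_end(values, [2]))        # single value wrapped in a list
```

This is linear in the column length for each sequence, which is usually fine for short sequences, but it does not use any pandas vectorization.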

ALollz · Accepted Answer · 2019-03-29T14:27:30.913

You can do it with np.logical_and.reduce + shift. But there are a lot of edge cases to deal with:

import numpy as np

def find_idx(seq, df, col):
    if not isinstance(seq, list):   # single value, not a list
        s = df[col].eq(seq)
        if s.any():                 # if something matched
            idx = s.idxmax()        # index label of the first True
        else:
            idx = np.nan            # np.NaN was removed in NumPy 2.0
    elif seq:                       # if a list that isn't empty
        seq = seq[::-1]             # reversed, so a match is flagged at its last row
        # row r is True when the i-th row above r equals seq[i], for every i
        m = np.logical_and.reduce([df[col].shift(i).eq(seq[i]) for i in range(len(seq))])
        s = df.loc[m]
        if not s.empty:             # if something matched
            idx = s.index[0]        # first match only
        else:
            idx = np.nan
    else:                           # empty list
        idx = np.nan
    return idx

l = [[2,8,3], 2, [2,8]]
[find_idx(seq, df, col='hi') for seq in l]
#[8, 1, 7]

l = [[2,8,3], 2, [2,8], [], ['foo'], 'foo', [1,2,4,8,3,3]]
[find_idx(seq, df, col='hi') for seq in l]
#[8, 1, 7, nan, nan, nan, 5]
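
To see why reversing the sequence and shifting reports a match at its last row, here is a minimal sketch of the masking step on its own. The column values are an assumption, chosen only to be consistent with the indexes mentioned in the question, since the original frame is not shown.

```python
import numpy as np
import pandas as pd

# Hypothetical data, consistent with the indexes described in the question.
df = pd.DataFrame({'hi': [1, 2, 4, 8, 3, 3, 2, 8, 3]})

seq = [2, 8, 3]
rev = seq[::-1]  # [3, 8, 2]

# One boolean mask per element: shift(i) moves values down by i rows, so
# row r of mask i is True when the value i rows above r equals rev[i].
masks = [df['hi'].shift(i).eq(rev[i]) for i in range(len(rev))]

# Element-wise AND across all masks: a row stays True only if it is the
# last row of a complete, uninterrupted occurrence of seq.
m = np.logical_and.reduce(masks)

last_idx = df.index[m][0]  # first match, identified by its last row
print(last_idx)
```

The earlier 2, 4, 8, 3 stretch is rejected because, at the row holding that 3, the value two rows above is 4 rather than 2.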
