Python iterate over df and refer to next row

Question

I have a dataframe with two columns - Col1 and Col2. I want to iterate through it and return only the values that follow the pattern:

If A is followed by B, return both rows (AB); otherwise return nothing. Basically, I want to get rid of all A's that are not immediately followed by a B. Note that in my real data, B's are always preceded by an A.

import pandas as pd

df = pd.DataFrame({"Col1": ['A', 'B', 'A', 'A', 'B'],
                   "Col2": [11, 22, 33, 44, 55]})

  Col1  Col2
0    A    11
1    B    22
2    A    33
3    A    44
4    B    55

I came up with this code:

for index, row in df.iterrows():
    if row['Col1'] == 'A':
        if row.shift(1)['Col1'] == 'B':
            print(row)
            print(row.shift(1))
        else:
            continue
    else:
        continue

For now I only print the rows to check that the correct entries are selected, but I'd be glad if anyone could help me create a new DataFrame from these rows.

The code above prints nothing, but it also raises no error.

Edit: example desired output

  Col1  Col2
0    A    11
1    B    22
3    A    44
4    B    55

score 1 · Accepted Answer · answered Oct 28 '20 at 11:42


Let's create a boolean mask with eq + shift representing the condition where A is followed by B, then use this mask to filter the dataframe:

m1 = df['Col1'].eq('B') & df['Col1'].shift().eq('A')
m2 = df['Col1'].eq('A') & df['Col1'].shift(-1).eq('B')
df = df[m1 | m2]

  Col1  Col2
0    A    11
1    B    22
3    A    44
4    B    55
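
To make the two masks concrete, here is a minimal self-contained sketch using the sample data from the question. The intermediate names `prev_is_a` and `next_is_b` are illustrative only and do not appear in the answer above; the result is stored in `result` rather than overwriting `df`.

```python
import pandas as pd

df = pd.DataFrame({"Col1": ['A', 'B', 'A', 'A', 'B'],
                   "Col2": [11, 22, 33, 44, 55]})

col = df["Col1"]

# shift() moves values down one row, so each row sees the previous row's value
prev_is_a = col.shift().eq("A")
# shift(-1) moves values up one row, so each row sees the next row's value
next_is_b = col.shift(-1).eq("B")

m1 = col.eq("B") & prev_is_a   # a B that is preceded by an A
m2 = col.eq("A") & next_is_b   # an A that is followed by a B

# Keep a row if it is either half of an A-B pair
result = df[m1 | m2]
print(result)
```

The first and last rows have no neighbour on one side, so the shifted column holds NaN there; comparing NaN with `eq` yields False, which is why no special handling of the edges is needed in this sketch.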


Shubham Sharma


Thank you, this is even better than my friend's code I listed below! – JachymDvorak Oct 28 '20 at 11:51
@JachymDvorak Technically they are the same. This one is just more concise ;) – Shubham Sharma Oct 28 '20 at 11:54

score 0 · Answer 2 · answered Oct 28 '20 at 10:43


Can you add an example of the output you desire?

Also, it's generally considered bad practice to write 'for' loops over a pandas DataFrame (they are slow, too). I believe the answer to your question lies in this article:

How to iterate over rows in a DataFrame in Pandas
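
As a hedged illustration of why the loop in the question prints nothing: each `row` yielded by `iterrows()` is a Series indexed by the column names, so `row.shift(1)` moves values across the columns of that single row and never reaches a neighbouring row. The sketch below uses the question's data; `first_row`, `shifted_row` and `next_values` are names chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Col1": ['A', 'B', 'A', 'A', 'B'],
                   "Col2": [11, 22, 33, 44, 55]})

# The same kind of object that iterrows() yields for the first row
first_row = df.iloc[0]

# Shifting a row Series moves values across columns, not across rows:
# 'Col1' becomes NaN and 'Col2' receives the old 'Col1' value
shifted_row = first_row.shift(1)
print(shifted_row)

# Shifting the whole column is what gives access to the next row's value
next_values = df["Col1"].shift(-1)
print(next_values.tolist())
```

Because `shifted_row['Col1']` is always NaN, a comparison with 'B' is never true, so the inner `if` in the question's loop never fires.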


Ventoii


Thanks for the comment, I included an example in my original post! I already read the thread you linked, but I can't seem to get anything that works. All methods (apply, vectorized) seem to be unable to refer to the next row for my condition. I tried apply but with no luck. – JachymDvorak Oct 28 '20 at 11:07

score 0 · Answer 3 · answered Oct 28 '20 at 11:46

My friend helped me find a solution, so I am sharing it for anyone who stumbles across this.

df["Col3"] = df["Col1"].shift(-1)
df["Col4"] = df["Col1"].shift(1)

df_new = df[((df["Col1"] == "A") & (df["Col3"] == "B")) | ((df["Col1"] == "B") & (df["Col4"] == "A"))]
df_new = df_new[["Col1", "Col2"]]

df_new
