1

I have a df like this:

df = pd.DataFrame([[1, 184], [1, 3], [4, 6], [2,183], [7,9], [0,7]], columns=['A', 'B'])
df

    A   B
0   1   184
1   1   3
2   4   6
3   2   183
4   7   9
5   0   7

I need to iterate through column 'B' and for every cell with a value between 182 and 186, I need to store the value from two cells below that into a variable 'marker'.

I tried:

for val in df['B']:
    if 182 < int(val) < 186:
        print(val)        
        marker = df['B'].shift(-2).values[0]
        print(marker)

And I get:

184
6.0
183
6.0

But I need:

184
6.0
183
7.0

I would love to hear suggestions for fixing this.

arkadiy
  • 746
  • 1
  • 10
  • 26

3 Answers3

4

We could use Series.between and Series.shift

s = df['B'].between(182, 186, inclusive="neither")
df.loc[s | s.shift(2), 'B']

Output

0    184
2      6
3    183
5      7
Name: B, dtype: int64
ansev
  • 30,322
  • 5
  • 17
  • 31
1

The problem is that marker = df['B'].shift(-2).values[0] is always just taking the top value in the shifted column, not the value in relation to the iteration.

If you would like to keep your looping methodology you can zip the values and iterate them at the same time

for val,marker in zip(df['B'], df['B'].shift(-2)):
    if 182 < int(val) < 186:
        print(val)
        print(marker)

184
6.0
183
7.0
G. Anderson
  • 5,815
  • 2
  • 14
  • 21
  • I think loops are slow, we should try always avoid it – ansev Feb 27 '22 at 21:53
  • @ansev in general I absolutely agree. In fact, I upvoted your answer. I only included this as an alternative that might be more understandable given the code provided. I'd also be curious about the timings of the different answers, given that it might fall under the loops-aren't-bad exception in [Are for loops in pandas always bad? When should I care?](https://stackoverflow.com/questions/54028199/are-for-loops-in-pandas-really-bad-when-should-i-care) – G. Anderson Feb 28 '22 at 15:07
1
vals = df.loc[df.B.isin(range(112,187)), 'B'].rename('val')
markers = df.loc[[i+2 for i in vals.index], 'B'].rename('marker')

out = pd.concat([vals.reset_index(drop=True), markers.reset_index(drop=True)], axis=1)

OUTPUT

   val  marker
0  184       6
1  183       7
John Giorgio
  • 634
  • 3
  • 10