2

I have a dataframe from which I want to select data between a range, only the first occurrence of this range.

The dataframe:

data = {'x':[1,2,3,4,5,6,7,6.5,5.5,4.5,3.5,2.5,1], 'y':[1,4,3,3,52,3,74,64,15,41,31,12,11]} 
df = pd.DataFrame(data) 

eg: select x from 2 to 6, first occurarence:

     x   y
0   1.0  1 #out of range
1   2.0  4 #out of range
2   3.0  3 #this first occurrence
3   4.0  3 #this first occurrence
4   5.0  52 #thisfirst occurrence
5   6.0  3  #out of range
6   7.0  74 #out of range
7   6.5  64 #out of range
8   5.5  15 #not this since repeating RANGE
9   4.5  41 #not this since repeating RANGE
10  3.5  31 #not this since repeating RANGE
11  2.5  12 #not this since repeating RANGE
12  1.0  11 #out of range

Output

     x   y
2   3.0  3 #this first occurrence
3   4.0  3 #this first occurrence
4   5.0  52 #thisfirst occurrence

I am trying to modify this example: Select DataFrame rows between two dates to select data between 2 values for their first occurrence:

xlim=[2,6]
mask = (df['x'] > xlim[0]) & (df['x'] <= xlim[1])
df=df.loc[mask] #need to make it the first occurrence here
Leo
  • 1,176
  • 1
  • 13
  • 33
  • You mean first group of consecutive values within the range? – yatu Sep 19 '19 at 09:30
  • How is 5.5 (row 8) not the first occurrence? – James Sep 19 '19 at 09:32
  • Yes, the data is a measurement that is repeated back and forth. and I just need the first forward iteration – Leo Sep 19 '19 at 09:33
  • The first forward occurrence of values between the range, since we already got to 6, 5.5 is going backwards so it is out. – Leo Sep 19 '19 at 09:34

1 Answers1

3

Here's one approach:

# mask with True whenever a value is within the range
m = df.x.between(2,6, inclusive=False)
# logical XOR with the next row and cumsum
# Keeping only 1s will result in the dataframe of interest
df.loc[(m ^ m.shift()).cumsum().eq(1)]

    x   y
2  3.0   3
3  4.0   3
4  5.0  52

Details -

df.assign(in_range=m, is_next_different=(m ^ m.shift()).cumsum())

     x    y   in_range  is_next_different
0   1.0   1     False                  0
1   2.0   4     False                  0
2   3.0   3      True                  1
3   4.0   3      True                  1
4   5.0  52      True                  1
5   6.0   3     False                  2
6   7.0  74     False                  2
7   6.5  64     False                  2
8   5.5  15      True                  3
9   4.5  41      True                  3
10  3.5  31      True                  3
11  2.5  12      True                  3
12  1.0  11     False                  4
yatu
  • 86,083
  • 12
  • 84
  • 139
  • thanks, could you elaborate a bit on what it is doing ? – Leo Sep 19 '19 at 09:36
  • 1
    Yes adding some explanations @David8 – yatu Sep 19 '19 at 09:36
  • So the "is_next_different" part assigns +1 every time the in_range changes true/false? – Leo Sep 19 '19 at 09:40
  • 1
    Yes so `(m ^ m.shift())` will result in a True every time there is a `0^1` or `1^0` (same as with `True` and `False`), hence whenever there's a change from 1 to 0 or viceversa. That will happen once in the first group of consecutive values (with the first value within the range). By taking the cumsum you set to 1 all values in this first group. Then just index on wherever the 1s are @David8 – yatu Sep 19 '19 at 09:45