Pandas Check last N Rows for values, new column based on results

Question

I have a DataFrame, Df2. I'm trying to check each of the last 10 rows for the column Lead_Lag below - if there's any value besides null in any of those rows, then I want a new column Position to equal 'Y':

def run_HG_AUDUSD_15M_Aggregate():
    Df1 = pd.read_csv(max(glob.iglob(r"C:\Users\cost9\OneDrive\Documents\PYTHON\Daily Tasks\Pairs Trading\HG_AUDUSD\CSV\15M\Lead_Lag\*.csv"), key=os.path.getctime))    
    Df2 = Df1[['Date', 'Close_HG', 'Close_AUDUSD', 'Lead_Lag']]

    Df2['Position'] = ''

    for index,row in Df2.iterrows():
        if Df2.loc[Df2.index.shift(-10):index,"Lead_Lag"].isnull(): 
            continue
        else:
            Df2.loc[index, 'Position'] = "Y"

A sample of the data is as follows:

Date Close_HG Close_AUDUSD Lead_Lag
7/19/2017 12:59 2.7 0.7956 
7/19/2017 13:59 2.7 0.7955 
7/19/2017 14:14 2.7 0.7954 
7/20/2017 3:14 2.7 0.791 
7/20/2017 5:44 2.7 0.791 
7/20/2017 7:44 2.71 0.7925 
7/20/2017 7:59 2.7 0.7924 
7/20/2017 8:44 2.7 0.7953 Short_Both
7/20/2017 10:44 2.71 0.7964 Short_Both
7/20/2017 11:14 2.71 0.7963 Short_Both
7/20/2017 11:29 2.71 0.7967 Short_Both
7/20/2017 13:14 2.71 0.796 Short_Both
7/20/2017 13:29 2.71 0.7956 Short_Both
7/20/2017 14:29 2.71 0.7957 Short_Both

So in this case I would want the last two values for the new column Position to be 'Y' as there are values in the Lead_Lag column in at least one of the last 10 rows. I want to apply this on a rolling basis - for instance row 13 'Position' value would look at rows 12-3, row 12 'Position' value would look at rows 11-2, etc.

Instead I get the error:

NotImplementedError: Not supported for type RangeIndex

I've tried several variations of the shift method (defining before the loop etc.) and can't get it to work.

edit: Here's the solution:

N = 10
Df2['Position'] = ''
for index,row in Df2.iterrows():
    if (Df2.loc[index-N:index,"Lead_Lag"] != "N").any():
        Df2.loc[index, 'Position'] = "Y"
    else:
        Df2.loc[index, 'Position'] = "N"

Please add your solution as answer instead of this edit to the question, to learn more see [tour] — Petter Friberg, Nov 04 '17 at 20:00

jezrael · Answer 1 · 2019-09-02T07:47:01.657

4

EDIT:

After post solution in question I found OP need something else - testing window N, so added another answer.

Old solution:

Use numpy.where with boolean mask by chaining:

m = df["Lead_Lag"].notnull() & df.index.isin(df.index[-10:])

Or by select column by position with iloc and add Falses by reindex:

m = df["Lead_Lag"].iloc[-10:].notnull().reindex(df.index, fill_value=False)

df['new'] = np.where(m, 'Y', '')

print (df)
               Date  Close_HG  Close_AUDUSD    Lead_Lag new
0   7/19/2017 12:59      2.70        0.7956         NaN    
1   7/19/2017 13:59      2.70        0.7955         NaN    
2   7/19/2017 14:14      2.70        0.7954         NaN    
3    7/20/2017 3:14      2.70        0.7910         NaN    
4    7/20/2017 5:44      2.70        0.7910         NaN    
5    7/20/2017 7:44      2.71        0.7925         NaN    
6    7/20/2017 7:59      2.70        0.7924         NaN    
7    7/20/2017 8:44      2.70        0.7953  Short_Both   Y
8   7/20/2017 10:44      2.71        0.7964  Short_Both   Y
9   7/20/2017 11:14      2.71        0.7963  Short_Both   Y
10  7/20/2017 11:29      2.71        0.7967  Short_Both   Y
11  7/20/2017 13:14      2.71        0.7960  Short_Both   Y
12  7/20/2017 13:29      2.71        0.7956  Short_Both   Y
13  7/20/2017 14:29      2.71        0.7957  Short_Both   Y

edited Sep 02 '19 at 07:47

answered Nov 04 '17 at 17:22

jezrael

822,522
95
1,334
1,252

1

"If there is value besides null, place `Y`", so you have to either change. `isnull()` to `notnull()` or swap the parameters in the `np.where` – foxyblue Nov 04 '17 at 17:25
2

nit: Lastly add space after `,` to conform to pep8. – foxyblue Nov 04 '17 at 17:28
@jezrael - so row 13 above would look back to row 3 to see if there's a non-null value in any of rows 12-3. Row 12 would look back to row 2, etc. so the 'new' column you have would be rolling. – Cole Starbuck Nov 04 '17 at 18:22
Thank you so much, Jezrael. I slightly changed your code to get it to work - it's now in the original post. – Cole Starbuck Nov 04 '17 at 19:52
@jezrael I have a similar problem except for this: "df.index.isin(df.index[-10:]" I need something that checks all rows prior to the current row instead of the prior 10. If it is simple, do you know the modification for that case? – Pylander Jun 28 '18 at 18:03
@jezrael No worries at all! Here are my two open questions: https://stackoverflow.com/questions/51052164/boolean-index-slicing-with-one-criterion-for-any-previous-non-na-value https://stackoverflow.com/questions/51072931/conditional-index-slice-all-prior-rows – Pylander Jun 28 '18 at 18:50

score 0 · Answer 2 · edited Sep 01 '19 at 20:40

0

This is what I ended up doing:

def run_HG_AUDUSD_15M_Aggregate():


N = 10
Df2['Position'] = ''

for index,row in Df2.iterrows():
    if (Df2.loc[index-N:index,"Lead_Lag"] != "N").any():
        Df2.loc[index, 'Position'] = "Y"
    else:
        Df2.loc[index, 'Position'] = "N"

edited Sep 01 '19 at 20:40

jezrael

822,522
95
1,334
1,252

answered Nov 04 '17 at 21:44

Cole Starbuck

603
3
11
21

jezrael · Answer 3 · 2019-09-02T06:06:17.173

Sample:

np.random.seed(123)
M = 20
Df2 = pd.DataFrame({'Lead_Lag':np.random.choice([np.nan, 'N'], p=[.3,.7], size=M)})

Solution1 - pandas:

Explanation: First compare column for not equal with Series.ne for boolean Series and then use Series.rolling with Series.any for test values in window - last set N and Y by numpy.where:

N = 3

a = (Df2['Lead_Lag'].ne('N')
                    .rolling(N, min_periods=1)
                    .apply(lambda x: x.any(), raw=False))      
Df2['Pos1'] = np.where(a, 'Y','N')

Another numpy solution with strides and correct first N values to set to Falses:

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

x = np.concatenate([[False] * (N - 1), Df2['Lead_Lag'].ne('N').values])
arr = np.any(rolling_window(x, N), axis=1)

Df2['Pos2'] = np.where(arr, 'Y','N')

Compare output:

print (Df2)
   Lead_Lag Pos1 Pos2
0         N    N    N
1       nan    Y    Y
2       nan    Y    Y
3         N    Y    Y
4         N    Y    Y
5         N    N    N
6         N    N    N
7         N    N    N
8         N    N    N
9         N    N    N
10        N    N    N
11        N    N    N
12        N    N    N
13      nan    Y    Y
14        N    Y    Y
15        N    Y    Y
16      nan    Y    Y
17      nan    Y    Y
18        N    Y    Y
19        N    Y    Y

Details of numpy solution:

Prepend False values for test first N -1 values:

print (np.concatenate([[False] * (N - 1), Df2['Lead_Lag'].ne('N').values]))
[False False False  True  True False False False False False False False
 False False False  True False False  True  True False False]

Strides return 2d array of boolean:

print (rolling_window(x, N))
[[False False False]
 [False False  True]
 [False  True  True]
 [ True  True False]
 [ True False False]
 [False False False]
 [False False False]
 [False False False]
 [False False False]
 [False False False]
 [False False False]
 [False False False]
 [False False False]
 [False False  True]
 [False  True False]
 [ True False False]
 [False False  True]
 [False  True  True]
 [ True  True False]
 [ True False False]]

Tested at least one True per rows by numpy.any:

print (np.any(rolling_window(x, N), axis=1))
[False  True  True  True  True False False False False False False False
 False  True  True  True  True  True  True  True]

EDIT:

If test with iterrows solution, output is different. Reason is this solution test in N + 1 window, so for same output is necessary add 1 to N:

N = 3
Df2['Position'] = ''

for index,row in Df2.iterrows():
    #for check windows
    #print (Df2.loc[index-N:index,"Lead_Lag"])
    if (Df2.loc[index-N:index,"Lead_Lag"] != "N").any():
        Df2.loc[index, 'Position'] = "Y"
    else:
        Df2.loc[index, 'Position'] = "N"

a = (Df2['Lead_Lag'].ne('N')
                    .rolling(N + 1, min_periods=1)
                    .apply(lambda x: x.any(), raw=False)  )      
Df2['Pos1'] = np.where(a, 'Y','N')

def rolling_window(a, window):
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

x = np.concatenate([[False] * (N), Df2['Lead_Lag'].ne('N').values])
arr = np.any(rolling_window(x, N + 1), axis=1)

Df2['Pos2'] = np.where(arr, 'Y','N')

print (Df2)

   Lead_Lag Position Pos1 Pos2
0         N        N    N    N
1       nan        Y    Y    Y
2       nan        Y    Y    Y
3         N        Y    Y    Y
4         N        Y    Y    Y
5         N        Y    Y    Y
6         N        N    N    N
7         N        N    N    N
8         N        N    N    N
9         N        N    N    N
10        N        N    N    N
11        N        N    N    N
12        N        N    N    N
13      nan        Y    Y    Y
14        N        Y    Y    Y
15        N        Y    Y    Y
16      nan        Y    Y    Y
17      nan        Y    Y    Y
18        N        Y    Y    Y
19        N        Y    Y    Y

Pandas Check last N Rows for values, new column based on results

3 Answers3

Linked