4

Let's consider two pandas dataframes:

import numpy as np
import pandas as pd

df = pd.DataFrame([1, 2, 3, 2, 5, 4, 3, 6, 7])

check_df = pd.DataFrame([3, 2, 5, 4, 3, 6, 4, 2, 1])

If want to do the following thing:

  1. If df[1] > check_df[1] or df[2] > check_df[1] or df[3] > check_df[1] then we assign to df 1, and 0 otherwise
  2. If df[2] > check_df[2] or df[3] > check_df[2] or df[4] > check_df[2] then we assign to df 1, and 0 otherwise
  3. We apply the same algorithm to end of DataFrame

My primitive code is the following:

df_copy = df.copy()
for i in range(len(df) - 3):
    moving_df = df.iloc[i:i+3]
    if (moving_df >check_df.iloc[i]).any()[0]:
        df_copy.iloc[i] = 1
    else:
        df_copy.iloc[i] = -1
df_copy


    0
0   -1
1   1
2   -1
3   1
4   1
5   -1
6   3
7   6
8   7

Could you please give me a advice, if there is any possibility to do this without loop?

Lucian
  • 351
  • 2
  • 10

2 Answers2

3

IIUC, this is easily done with a rolling.min:

df['out'] = np.where(df[0].rolling(N, min_periods=1).max().shift(1-N).gt(check_df[0]),
                     1, -1)

output:

   0  out
0  1   -1
1  2    1
2  3   -1
3  2    1
4  5    1
5  4   -1
6  3    1
7  6   -1
8  7   -1

to keep the last items as is:

m = df[0].rolling(N).max().shift(1-N)
df['out'] = np.where(m.gt(check_df[0]),
                     1, -1)
df['out'] = df['out'].mask(m.isna(), df[0])

output:

   0  out
0  1   -1
1  2    1
2  3   -1
3  2    1
4  5    1
5  4   -1
6  3    1
7  6    6
8  7    7
mozway
  • 194,879
  • 13
  • 39
  • 75
  • This doesn't match the original result. – Scott Hunter Jul 21 '22 at 12:12
  • @Scott I was updating that part – mozway Jul 21 '22 at 12:13
  • 1
    What is `N`? ;)) – Lucian Jul 21 '22 at 12:14
  • You're almost there... :) – Scott Hunter Jul 21 '22 at 12:14
  • @Lucian the number of rows to consider, the same as the 3 in your code ;) – mozway Jul 21 '22 at 12:14
  • 1
    @Scott if you're talking about the row 6 not sure if this is wanted in the original result (the comparison is possible), anyway, this looks like a implementation detail to me, the important logic is the firsts part, then OP can mask any value as desired – mozway Jul 21 '22 at 12:15
  • It was posted as what was wanted. And remind me not to hire you to do an actual thing, and not just something like it, *especially* when you don't explain how this can be adapted to such "implementation details". – Scott Hunter Jul 21 '22 at 12:17
  • then `m2= m.isna() ; df['out'] = df['out'].mask(m2|m2.shift(-1), df[0])` – mozway Jul 21 '22 at 12:19
  • Hey @mozway, I have one question to your code. I understand it, but when using `check_df[0]`, aren't you always comparing shifted frame with `check_df[0]`, whereas it first three numbers should be compared with `check_df[0]`, next three numbers with `check_df[1]`, next three with `check_df[2]` and so on? – Lucian Jul 21 '22 at 12:21
  • Not sure I understand the question, does it produce unwanted results? – mozway Jul 21 '22 at 14:29
0

Although @mozway has already provided a very smart solution, I would like to share my approach as well, which was inspired by this post.

You could create your own object that compares a series with a rolling series. The comparison could be performed by typical operators, i.e. >, < or ==. If at least one comparison holds, the object would return a pre-defined value (given in list returns_tf, where the first element would be returned if the comparison is true, and the second if it's false).

Possible Code:

import numpy as np
import pandas as pd

df = pd.DataFrame([1, 2, 3, 2, 5, 4, 3, 6, 7])
check_df = pd.DataFrame([3, 2, 5, 4, 3, 6, 4, 2, 1])

class RollingComparison:
    
    def __init__(self, comparing_series: pd.Series, rolling_series: pd.Series, window: int):
        self.comparing_series = comparing_series.values[:-1*window]
        self.rolling_series = rolling_series.values
        self.window = window
        
    def rolling_window_mask(self, option: str = "smaller"):
        shape = self.rolling_series.shape[:-1] + (self.rolling_series.shape[-1] - self.window + 1, self.window)
        strides = self.rolling_series.strides + (self.rolling_series.strides[-1],)
        rolling_window = np.lib.stride_tricks.as_strided(self.rolling_series, shape=shape, strides=strides)[:-1]
        rolling_window_mask = (
            self.comparing_series.reshape(-1, 1) < rolling_window if option=="smaller" else (
                self.comparing_series.reshape(-1, 1) > rolling_window if option=="greater" else self.comparing_series.reshape(-1, 1) == rolling_window
            )
        )
        return rolling_window_mask.any(axis=1)
    
    def assign(self, option: str = "rolling", returns_tf: list = [1, -1]):
        mask = self.rolling_window_mask(option)
        return np.concatenate((np.where(mask, returns_tf[0], returns_tf[1]), self.rolling_series[-1*self.window:]))

The assignments can be achieved as follows:

roller = RollingComparison(check_df[0], df[0], 3)
check_df["rolling_smaller_checking"] = roller.assign(option="smaller")
check_df["rolling_greater_checking"] = roller.assign(option="greater")
check_df["rolling_equals_checking"] = roller.assign(option="equal")

Output (the column rolling_smaller_checking equals your desired output):

    0   rolling_smaller_checking  rolling_greater_checking  rolling_equals_checking
0   3   -1                        1                         1
1   2    1                       -1                         1
2   5   -1                        1                         1
3   4    1                        1                         1
4   3    1                       -1                         1
5   6   -1                        1                         1
6   4    3                        3                         3
7   2    6                        6                         6
8   1    7                        7                         7
ko3
  • 1,757
  • 5
  • 13