1

I have a DataFrame with relevant stock information that looks like this.

Screenshot of my dataframe

I need it so that if the 'close' from one row is different from the 'open' in the next row a new dataframe will be created storing the ones that fulfill this criteria. I would like that all of the values from the row to be saved in the new dataframe. To clarify, I would like the two rows where this happens to be stored in the new dataframe.

DataFrame as text as requested: 

    timestamp            open    high    low     close   volume
0   2020-01-01 00:00:00  129.16  130.98  128.68  130.24  4.714333e+04
1   2020-01-01 08:00:00  130.24  132.40  129.87  132.08  5.183323e+04
2   2020-01-01 16:00:00  132.08  133.05  129.74  130.77  4.579396e+04
3   2020-01-02 00:00:00  130.72  130.78  128.69  129.26  6.606601e+04
4   2020-01-02 08:00:00  129.23  130.28  128.90  129.59  4.849893e+04
5   2020-01-02 16:00:00  129.58  129.78  126.38  127.19  9.919212e+04
6   2020-01-03 00:00:00  127.19  130.15  125.88  128.86  1.276414e+05
  • 2
    Please provide a sample of the DataFrame as text. `print(df.to_string())` should suffice. You do not need to include every row, but please include every column. – Steele Farnsworth Aug 17 '21 at 01:56
  • 1
    Please include a _small_ subset of your data as a __copyable__ piece of code that can be used for testing as well as your expected output for the __provided__ data. See [MRE - Minimal, Reproducible, Example](https://stackoverflow.com/help/minimal-reproducible-example), and [How to make good reproducible pandas examples](https://stackoverflow.com/q/20109391/15497888) for more information. – Henry Ecker Aug 17 '21 at 01:59
  • I have provided the DataFrame as text but I'm not sure how useful this can be to you guys. I formated it how it is supposed to look. – Andres Green Aug 17 '21 at 02:06

1 Answers1

0

This can be accomplished using Series.shift

>>> df['close'] != df['open'].shift(-1)
0  2020-01-01    False
1  2020-01-01    False
2  2020-01-01     True
3  2020-01-02     True
4  2020-01-02     True
5  2020-01-02    False
6  2020-01-03     True

This compares the close value in one row to the open value of the next row ("shifted" one row ahead).

You can then select the rows for which the condition is True.

>>> df[df['close'] != df['open'].shift(-1)]
             timestamp    open    high     low   close     volume
2 2020-01-01  16:00:00  132.08  133.05  129.74  130.77   45793.96
3 2020-01-02  00:00:00  130.72  130.78  128.69  129.26   66066.01
4 2020-01-02  08:00:00  129.23  130.28  128.90  129.59   48498.93
6 2020-01-03  00:00:00  127.19  130.15  125.88  128.86  127641.40

This only returns the second of the two rows; to get the first, we can then shift back one, and unite the two conditions.

>>> row_condition = df['close'] != df['open'].shift(-1)
>>> row_before = row_condition.shift(1)
>>> df[row_condition | row_before]
             timestamp    open    high     low   close     volume
0 2020-01-01  00:00:00  129.16  130.98  128.68  130.24   47143.33
1 2020-01-01  08:00:00  130.24  132.40  129.87  132.08   51833.23
2 2020-01-01  16:00:00  132.08  133.05  129.74  130.77   45793.96
3 2020-01-02  00:00:00  130.72  130.78  128.69  129.26   66066.01
4 2020-01-02  08:00:00  129.23  130.28  128.90  129.59   48498.93
5 2020-01-02  16:00:00  129.58  129.78  126.38  127.19   99192.12
6 2020-01-03  00:00:00  127.19  130.15  125.88  128.86  127641.40

Providing a textual sample of the DataFrame is useful because this can be copied directly into a Python session; I would have had to manually type the content of your screenshot otherwise.

Steele Farnsworth
  • 863
  • 1
  • 6
  • 15