I have two time-series datasets, stock1_data and stock2_data. Stock 1's data starts earlier in time than stock 2's, but stock 2's ends later in time. I would like to shorten both datasets to their time intersection. Both datasets are ordered in increasing time.

I have tried doing this: stock1_data = stock1_data.loc[(stock1_data['date'] >= stock2_data['date'][0])]

and this: stock2_data = stock2_data.loc[(stock2_data['date'] <= stock1_data['date'][len(stock1_data.date) - 1])]

The first line above works as intended; however, the second line cuts off stock 2's data far too early. I looked at the length of stock 1, and it appears that it can be indexed past its last position, given by len(stock1_data). It turns out it can still be indexed up to its original, unshortened length. Why is that?

  • Are the DataFrames ordered? Maybe you want `min(stock2_data['date'])` instead of `stock2_data['date'][0]` and `max(stock1_data['date'])` instead of `stock1_data['date'][len(stock1_data.date) - 1]`? – Fernando Irarrázaval G Jul 20 '19 at 22:05
  • The DataFrames are ordered. And yes your suggestion works! Any clue as to why my initial solution didn't work though? – user6745003 Jul 20 '19 at 22:12
  • You might find some context here: https://stackoverflow.com/questions/54613753/why-does-python-allow-out-of-range-slice-indexes-for-sequences – Alex Fish Jul 21 '19 at 00:57
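
A likely explanation, sketched under the assumption that both frames start with pandas' default integer index (0, 1, 2, ...): filtering with .loc keeps the original index labels, and `stock1_data['date'][k]` looks up the label k rather than the k-th row. After the first filter, `len(stock1_data.date) - 1` is therefore a label that may point at an earlier row, and labels up to the original length remain valid. The toy data and the names `stock1_common` and `stock2_common` below are hypothetical, chosen only to reproduce the symptom.

```python
import pandas as pd

# Hypothetical toy data standing in for the real stock tables.
stock1_data = pd.DataFrame({
    "date": pd.to_datetime(["2019-01-01", "2019-01-02", "2019-01-03",
                            "2019-01-04", "2019-01-05"]),
    "close": [10.0, 11.0, 12.0, 13.0, 14.0],
})
stock2_data = pd.DataFrame({
    "date": pd.to_datetime(["2019-01-03", "2019-01-04", "2019-01-05",
                            "2019-01-06", "2019-01-07"]),
    "close": [20.0, 21.0, 22.0, 23.0, 24.0],
})

# Filtering keeps the original index labels: the remaining rows are labelled 2, 3, 4.
stock1_data = stock1_data.loc[stock1_data["date"] >= stock2_data["date"][0]]
index_labels = list(stock1_data.index)

# With an integer index, series[k] looks up the LABEL k, not the k-th row.
# len(stock1_data) - 1 is 2 here, so this returns the row labelled 2,
# which is the first remaining row rather than the last one.
by_label = stock1_data["date"][len(stock1_data) - 1]

# Positional access returns the actual last row regardless of labels.
by_position = stock1_data["date"].iloc[-1]

# Trim both frames to the overlapping date range (both are sorted by date),
# then reset the index so labels and positions agree again.
start = max(stock1_data["date"].iloc[0], stock2_data["date"].iloc[0])
end = min(stock1_data["date"].iloc[-1], stock2_data["date"].iloc[-1])
stock1_common = stock1_data[stock1_data["date"].between(start, end)].reset_index(drop=True)
stock2_common = stock2_data[stock2_data["date"].between(start, end)].reset_index(drop=True)
```

Using `.iloc[0]` and `.iloc[-1]` (or `min`/`max`, as suggested in the comment above) avoids depending on the index labels at all. Calling `reset_index(drop=True)` after each filter is another option if label-based access like `[0]` is preferred.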
