Pandas dataframe can be accessed beyond its length

Question

I have two time-series datasets, stock1_data and stock2_data. Stock 1's data starts earlier in time than stock 2's, but stock 2's ends later in time. I would like to shorten both datasets to their time intersection. Both datasets are ordered in increasing time.

I have tried doing this: stock1_data = stock1_data.loc[(stock1_data['date'] >= stock2_data['date'][0])]

and this: stock2_data = stock2_data.loc[(stock2_data['date'] <= stock1_data['date'][len(stock1_data.date) - 1])]

The first line above works as intended, but the second line cuts off stock 2's data far too early. I looked at the length of stock 1, and it appears that it can be accessed past the last index implied by len(stock1_data). It turns out it can still be accessed up to its original, unshortened length. Why is that?

Are the DataFrames ordered? Maybe you want `min(stock2_data['date'])` instead of `stock2_data['date'][0]` and `max(stock1_data['date'])` instead of `stock1_data['date'][len(stock1_data.date) - 1]`? — Fernando Irarrázaval G, Jul 20 '19 at 22:05
The DataFrames are ordered. And yes your suggestion works! Any clue as to why my initial solution didn't work though? — user6745003, Jul 20 '19 at 22:12
You might find some context here: https://stackoverflow.com/questions/54613753/why-does-python-allow-out-of-range-slice-indexes-for-sequences — Alex Fish, Jul 21 '19 at 00:57
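
A minimal sketch of what is probably happening, assuming both DataFrames carry the default integer index (the question does not show the index, so this is a guess). Boolean filtering with .loc keeps the original index labels of the surviving rows, and `series[n]` on an integer index looks up the label n, not the n-th position. After the first line trims the start of stock 1, the label `len(stock1_data) - 1` points at a row in the middle of the frame, while the original last label is still present. The toy data below is hypothetical and only stands in for the real datasets.

```python
import pandas as pd

# Hypothetical toy data standing in for the real datasets.
stock1_data = pd.DataFrame({"date": pd.date_range("2019-01-01", periods=10, freq="D")})
stock2_data = pd.DataFrame({"date": pd.date_range("2019-01-04", periods=10, freq="D")})

# First line from the question: boolean filtering keeps the original labels.
stock1_data = stock1_data.loc[stock1_data["date"] >= stock2_data["date"][0]]
print(list(stock1_data.index))  # [3, 4, 5, 6, 7, 8, 9], not [0, ..., 6]

# Label-based lookup: label 6 is a row in the middle, not the last row.
by_label = stock1_data["date"][len(stock1_data) - 1]
print(by_label)  # 2019-01-07 00:00:00

# Position-based lookup returns the actual last row.
by_position = stock1_data["date"].iloc[-1]
print(by_position)  # 2019-01-10 00:00:00

# Trimming stock 2 with positional access gives the intended intersection.
stock2_trimmed = stock2_data.loc[stock2_data["date"] <= by_position]

# Optionally renumber the rows so labels and positions agree again.
stock1_data = stock1_data.reset_index(drop=True)
stock2_trimmed = stock2_trimmed.reset_index(drop=True)
```

If this assumption holds, it also explains why `min(...)` and `max(...)` work: they look at the values and ignore the index labels. Using `.iloc[0]` and `.iloc[-1]` gives the same result on sorted data without scanning the whole column.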
