I have two CSV files, which I load into the DataFrames df1 and df2.
When I use the command:
df1=pd.read_csv("path",index_col="created_at",parse_dates=["created_at"])
I get:
                     index   likes  ...  user_screen_name  sentiment
created_at                          ...
2019-02-27 05:36:29      0   94574  ...   realDonaldTrump   positive
2019-02-27 05:31:21      1   61666  ...   realDonaldTrump   negative
2019-02-26 18:08:14      2  151844  ...   realDonaldTrump   positive
2019-02-26 04:50:37      3  184597  ...   realDonaldTrump   positive
2019-02-26 04:50:36      4  181641  ...   realDonaldTrump   negative
...                    ...     ...  ...               ...        ...
When I use the command:
df2=pd.read_csv("path",index_col="created_at",parse_dates=["created_at"])
I get:
                     Unnamed: 0    Close     Open  Volume     Day
created_at
2019-03-01 00:47:00           0  2784.49  2784.49     NaN  STABLE
2019-03-01 00:21:00           1  2784.49  2784.49     NaN  STABLE
2019-03-01 00:20:00           2  2784.49  2784.49     NaN  STABLE
2019-03-01 00:19:00           3  2784.49  2784.49     NaN  STABLE
2019-03-01 00:18:00           4  2784.49  2784.49     NaN  STABLE
2019-03-01 00:17:00           5  2784.49  2784.49     NaN  STABLE
...                         ...      ...      ...     ...     ...
As you know, the command:
df3=df1.join(df2)
joins the two tables on the index "created_at", matching only rows whose date and time are exactly equal in both tables.
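To make the current behaviour concrete, here is a minimal sketch with made-up toy data (the values and the reduced column set are assumptions for illustration, not my real files). Only the row whose timestamp exists in both frames gets a value from df2:

```python
import pandas as pd

# Toy stand-ins for df1 and df2; the values are invented for illustration.
df1 = pd.DataFrame(
    {"sentiment": ["positive", "negative"]},
    index=pd.DatetimeIndex(
        ["2019-02-27 05:36:29", "2019-02-27 05:31:21"], name="created_at"
    ),
)
df2 = pd.DataFrame(
    {"Close": [2784.49, 2790.00]},
    index=pd.DatetimeIndex(
        ["2019-02-27 05:36:29", "2019-02-27 05:38:29"], name="created_at"
    ),
)

# DataFrame.join defaults to a left join on the index:
# a df1 row only picks up df2 values when the timestamps are identical.
df3 = df1.join(df2)
print(df3)
```

Here the 05:36:29 row gets a Close value, while the 05:31:21 row gets NaN because df2 has no row with that exact timestamp.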
However, I would like to join them with a delay of, for example, 2 minutes.
For example, instead of:
file df1               file df2
created_at             created_at
2019-02-27 05:36:29    2019-02-27 05:36:29
I would like to have the two tables join like this:
file df1               file df2
created_at             created_at
2019-02-27 05:36:29    2019-02-27 05:38:29
It is important for my data that the df1 event comes before the df2 row it is joined to, i.e. the df1 timestamp must be earlier than the matching df2 timestamp.
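A rough sketch of the kind of result I am after, using made-up toy data. The first variant assumes the delay is exactly 2 minutes and simply shifts df1's index before joining. The second uses pd.merge_asof with direction="forward" in case df2 has no row exactly 2 minutes later. The column names created_at_df1, created_at_df2 and lookup_time are hypothetical names introduced here, and I am not sure either variant is the right approach:

```python
import pandas as pd

delay = pd.Timedelta(minutes=2)

# Toy stand-ins for df1 and df2; the values are invented for illustration.
df1 = pd.DataFrame(
    {"sentiment": ["positive", "negative"]},
    index=pd.DatetimeIndex(
        ["2019-02-27 05:36:29", "2019-02-27 05:31:21"], name="created_at"
    ),
)
df2 = pd.DataFrame(
    {"Close": [2784.49, 2790.00]},
    index=pd.DatetimeIndex(
        ["2019-02-27 05:38:29", "2019-02-27 05:40:00"], name="created_at"
    ),
)

# Variant 1: exact delay. Move every df1 timestamp 2 minutes forward,
# then join on equality as before.
shifted = df1.copy()
shifted["created_at_df1"] = shifted.index  # keep the original event time
shifted.index = shifted.index + delay
exact = shifted.join(df2)

# Variant 2: first df2 row at or after (df1 time + delay).
# merge_asof needs both key columns sorted in ascending order.
left = df1.sort_index().reset_index()
left["lookup_time"] = left["created_at"] + delay
right = (
    df2.sort_index()
    .reset_index()
    .rename(columns={"created_at": "created_at_df2"})
)
nearest = pd.merge_asof(
    left,
    right,
    left_on="lookup_time",
    right_on="created_at_df2",
    direction="forward",
)

print(exact)
print(nearest)
```

In both variants the df1 timestamp is earlier than the df2 timestamp it is paired with. In variant 2, merge_asof also accepts a tolerance argument (a pd.Timedelta) that could limit how far after the delayed time a df2 row may be.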