0

I have these two datasets, each containing id:s and timestamps, and then some ohter columns, here is example of the pandas dataframes:

df1

id timestamp approved sum
1 10-1-2020 1 20
1 20-1-2020 2 24
2 1-1-2019 2 3
2 3-3-2020 3 5
2 4-3-2020 3 8
3 1-1-2021 1 1
3 1-2-2021 2 6
6 2-1-2023 1 20
7 1-1-2018 1 15

df2

id timestamp value1sum value2sum
1 11-1-2020 1 2
1 14-1-2020 3 4
1 15-1-2020 5 8
2 3-1-2019 1 1
2 4-1-2019 2 3
2 5-1-2019 3 3
3 1-2-2021 2 2
3 1-3-2021 2 4
3 2-3-2021 2 5
4 1-4-2022 5 5
5 1-1-2023 6 6

I want to connect the dataframes so the dataframe with both frames connected would look like this:

id timestamp approved sum value1sum value2sum
1 10-1-2020 1 20 0 0
1 20-1-2020 2 24 5 8
2 1-1-2019 2 3 0 0
2 3-3-2020 3 5 3 3
2 4-3-2020 3 8 3 3
3 1-1-2021 1 1 0 0
3 1-2-2021 2 6 2 2
3 1-3-2021 2 6 2 4
3 2-3-2021 2 6 2 5

So basically what would happen, is that column values between ids and timestamps would go one on one. So like we see with the id 1 in the connected dataframe, in the first row (taken from the df1) we see the same row as in df1 and then columns value1sum and value2sum added, but since there is not any records before that timestamp, the values are 0 in both columns. Then in the second row of id 1 we can see that value1sum and value2sum has values 5 and 8, since they are the last values before timestamp 20-1-2020.

Then with the id 3 in the connected dataframe we can see that the last two rows have same values in approved and sum columns as the earlier rows because there is not any records after the timestamp 1-2-2021 but there is records after that timestamp in df2 which are added to connected dataframe.

Is this kind of dataframe connecting even possible? I didn't find any good documents or tutorials of it with quick searching. Hope you got the idea what I was thinking.

Lurri
  • 45
  • 5
  • 1
    Convert your dates `to_datetime`, sort by date and use `merge_asof` (see duplicate) – mozway Feb 27 '23 at 07:35
  • I watched the double and didn't quite understand the solution. I want to merge the dataframes by id and also timestamp, and i didn't understand how to do them both with merge_asof. – Lurri Feb 27 '23 at 07:59
  • 1
    Use `by='id', on='timestamp'` – mozway Feb 27 '23 at 08:03
  • I converted dates `to_datetime` and sorted them, then ran code: `mergedtest = pd.merge_asof(left=df1, right=df2,by='id',on='timestamp', direction='nearest')` This worked like i thought it would, but i would also want the rows from df2 where in df2 the timestamps are bigger than in df1. So these rows would have same data than in df1's last timestamp columns and then the updated values in columns value1sum and value2sum from df2 – Lurri Feb 27 '23 at 09:00

0 Answers0