I'm struggling to figure out how to take rows from one dataframe and use them to fill in the rows missing from another, matched on a timestamp column.
One dataframe has minute data with some gaps in it (spanning almost a day); the other has hourly data with no gaps. I want to fill in the missing rows in the minute data with rows from the hourly data, without duplicating the hours that the minute data already has.
import pandas as pd

df1 = pd.DataFrame({
    'Unix Timestamp': [1444311660, 1444311720, 1444311780, 1444311840, 1444311900, 1444312140],
    'price': [242.5, 242.5, 243.7, 290.0, 293.0, 287.0],
})
df2 = pd.DataFrame({
    'Unix Timestamp': [1444311780, 1444311840, 1444311900, 1444311960, 1444312020],
    'price': [243.7, 290.0, 293.0, 295.0, 294.0],
})
print(df1.head())
print(df2.head())
df1
Unix Timestamp  price
1444311660      242.5
1444311720      242.5
1444311780      243.7
1444311840      290
1444311900      293
1444312140      287
df2
Unix Timestamp  price
1444311780      243.7
1444311840      290
1444311900      293
1444311960      295
1444312020      294
I've tried finding the rows in df2 whose Unix Timestamp isn't among the Unix Timestamps in df1, then appending them to df1 and re-sorting by Unix Timestamp, but it gives me an empty dataframe:
missing = df1.loc[~df1['Unix Timestamp'].isin(df2['Unix Timestamp'])]
df1 = pd.concat([df1, missing], ignore_index=True, sort=False)
df1 = df1.sort_values(by='Unix Timestamp')
df1 = df1.reset_index(drop=True)
print(df1.head(10))
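For reference, the steps described above (select the df2 rows whose timestamp is absent from df1, append them, re-sort) can be sketched as follows. This is only a minimal sketch on the sample data, assuming the timestamps in both frames are directly comparable integers; the name `filled` is introduced here for illustration. Note that the filter is applied to df2 here, whereas the attempt above filters df1.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Unix Timestamp': [1444311660, 1444311720, 1444311780, 1444311840, 1444311900, 1444312140],
    'price': [242.5, 242.5, 243.7, 290.0, 293.0, 287.0],
})
df2 = pd.DataFrame({
    'Unix Timestamp': [1444311780, 1444311840, 1444311900, 1444311960, 1444312020],
    'price': [243.7, 290.0, 293.0, 295.0, 294.0],
})

# Rows of df2 whose timestamp does not appear anywhere in df1
missing = df2.loc[~df2['Unix Timestamp'].isin(df1['Unix Timestamp'])]

# Append those whole rows to df1, then restore chronological order
filled = (
    pd.concat([df1, missing], ignore_index=True, sort=False)
    .sort_values(by='Unix Timestamp')
    .reset_index(drop=True)
)
print(filled)
```

Because whole rows are selected from df2, any additional columns would be carried along with them.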
Expected Output:
df1
Unix Timestamp  price
1444311660      242.5
1444311720      242.5
1444311780      243.7
1444311840      290
1444311900      293
1444311960      295  ^
1444312020      294  ^
1444312140      287
Carets mark the rows that were added. I also need to bring over the entire row, because the real data has more columns than just price.
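To illustrate the whole-row requirement, one possible approach is to concatenate both frames and drop duplicate timestamps, keeping the first occurrence so that the minute data wins wherever both frames have a row. This is a sketch under assumptions: both frames are assumed to share the same columns, the `volume` column and its values are made up for illustration, and `fill_missing_rows` is a hypothetical helper name.

```python
import pandas as pd

def fill_missing_rows(primary, fallback, key='Unix Timestamp'):
    """Add rows from `fallback` whose `key` is absent from `primary` (hypothetical helper)."""
    # primary comes first, so keep='first' prefers its rows on overlapping keys
    combined = pd.concat([primary, fallback], ignore_index=True, sort=False)
    combined = combined.drop_duplicates(subset=key, keep='first')
    return combined.sort_values(by=key).reset_index(drop=True)

# Made-up frames with an extra 'volume' column (an assumption, not from the real data)
minute = pd.DataFrame({
    'Unix Timestamp': [1444311900, 1444312140],
    'price': [293.0, 287.0],
    'volume': [1.0, 2.0],
})
hourly = pd.DataFrame({
    'Unix Timestamp': [1444311900, 1444311960, 1444312020],
    'price': [293.0, 295.0, 294.0],
    'volume': [9.0, 3.0, 4.0],
})

result = fill_missing_rows(minute, hourly)
print(result)
```

One caveat with this sketch: `drop_duplicates` would also collapse repeated timestamps within the minute data itself, which may or may not be desirable.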
Any help?