I have two dataframes, both with timestamp indexes. They have similar columns (the second dataframe lacks two columns). The second dataframe is also regularly re-created every second with new data, from some API. How can I continuously update the first dataframe with the information from the second (or the API)?

First dataframe looks like this:

                           Open        High  ...          MA        EMA
2021-04-29 09:31:00  583.473999  583.473999  ...         NaN        NaN
2021-04-29 09:32:00  584.304932  585.394850  ...  584.349534 583.983949

Second one looks like this:

                        Open     High      Low    Close
2021-04-29 09:33:00  578.107  579.412  577.942  579.251

I've already tried join, append, concat, combine_first, and update, all wrapped in an asyncio loop, with no success on any of them. Each one either doesn't update the first dataframe at all or doesn't overwrite rows that share the same index.

Ryan

1 Answer

You can concatenate the dataframes with pd.concat([df1, df2]) (concat takes a list of dataframes, not separate arguments), then run drop_duplicates with the argument keep='last', as described in this answer.
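
A minimal sketch of that approach, assuming the timestamp index is first copied to a column so that drop_duplicates can compare it. The sample values and the column name timestamp are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical sample data shaped like the frames in the question.
df1 = pd.DataFrame(
    {"Open": [583.47, 584.30], "Close": [583.40, 584.90], "MA": [float("nan"), 584.35]},
    index=pd.to_datetime(["2021-04-29 09:31:00", "2021-04-29 09:32:00"]),
)
df2 = pd.DataFrame(
    {"Open": [584.99, 578.107], "Close": [585.10, 579.251]},
    index=pd.to_datetime(["2021-04-29 09:32:00", "2021-04-29 09:33:00"]),
)

# concat returns a new frame, so the result has to be assigned back.
df1 = (
    pd.concat([df1, df2])
    .rename_axis("timestamp")   # give the index a name (assumed name)
    .reset_index()              # copy the index into a regular column
    .drop_duplicates(subset="timestamp", keep="last")  # newest row wins
    .set_index("timestamp")     # restore the timestamp index
)
print(df1)
```

Because the later row replaces the earlier one as a whole, columns that df2 lacks (MA here) come out as NaN for any overwritten timestamp and would need to be recomputed afterwards.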

Jake Steele
  • [drop_duplicates](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop_duplicates.html#pandas.DataFrame.drop_duplicates) ignores indexes and only has an option to check for columns. Is there any way to do this without having to copy the index to a column? – Ryan Apr 30 '21 at 17:00
  • Try this: after concatenating, set ```df1 = df1[~df1.index.duplicated(keep='first')]```. – Jake Steele Apr 30 '21 at 20:08
  • That results in the dataframe keeping the first value of that index, which isn't ideal since the point of the concat/append was to overwrite with the most up-to-date data. I've, uhh, decided to just copy the index to a column and use your first solution because settling is the key to happiness. – Ryan May 02 '21 at 01:25
  • Haha, that's the spirit! I'm sorry it wasn't more plug-and-play. Have a nice day! – Jake Steele May 04 '21 at 16:55
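
For what it's worth, Index.duplicated also accepts keep='last', which seems to address the objection in the comments without copying the index to a column. A rough sketch, assuming the newer rows are concatenated after the older ones; the helper name upsert_rows and the sample values are hypothetical.

```python
import pandas as pd


def upsert_rows(base: pd.DataFrame, fresh: pd.DataFrame) -> pd.DataFrame:
    """Return base with rows from fresh appended or overwritten by index.

    Hypothetical helper: fresh is concatenated last, so keep='last'
    retains its row whenever both frames share a timestamp.
    """
    combined = pd.concat([base, fresh])
    # Boolean mask is True for every duplicate index entry except the last one.
    is_stale = combined.index.duplicated(keep="last")
    return combined[~is_stale].sort_index()


# Made-up sample data for illustration.
base = pd.DataFrame(
    {"Open": [583.47, 584.30], "Close": [583.40, 584.90]},
    index=pd.to_datetime(["2021-04-29 09:31:00", "2021-04-29 09:32:00"]),
)
fresh = pd.DataFrame(
    {"Open": [584.99, 578.107], "Close": [585.10, 579.251]},
    index=pd.to_datetime(["2021-04-29 09:32:00", "2021-04-29 09:33:00"]),
)

# Neither input is modified in place, so rebind the name on every refresh.
base = upsert_rows(base, fresh)
print(base)
```

In a polling loop, the same call would run once per refresh, with fresh being whatever the API returned that second.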