1

I have an initial DataFrame, lets call it df that looks like:

|date        |Colombia   |Peru   |Argentina   |Chile   |Brazil   |
------------------------------------------------------------------
|2020-09-26  |3          |5      |1           |4        |2       |
|2020-09-27  |5          |5      |1           |2        |3       |
|2020-09-28  |3          |1      |2           |6        |5       |
|2020-09-29  |1          |6      |1           |2        |4       |
|2020-09-30  |1          |7      |1           |2        |0       |
{'Colombia': {'2020-09-26': 3, '2020-09-27': 5, '2020-09-28': 3, '2020-09-29': 1, '2020-09-30': 1}, 
 'Peru': {'2020-09-26': 5.0, '2020-09-27': 5.0, '2020-09-28': 1.0, '2020-09-29': 6.0, '2020-09-30': 7.0}, 
 'Argentina': {'2020-09-26': 1.0, '2020-09-27': 1.0, '2020-09-28': 2.0, '2020-09-29': 1.0, '2020-09-30': 1.0}, 
 'Chile': {'2020-09-26': 4.0, '2020-09-27': 2.0, '2020-09-28': 6.0, '2020-09-29': 2.0, '2020-09-30': 2.0}, 
 'Brazil': {'2020-09-26': 2.0, '2020-09-27': 3.0, '2020-09-28': 5.0, '2020-09-29': 4.0, '2020-09-30': 0.0}}

And I have to concatenate it with several DataFrames for each country, that may or may not have common with df, let's call one in particular df2, that looks like:

|             |Colombia |
------------------------
|datetime     |         |
-------------------------
|2020-10-01   |4        |
|2020-10-02   |2        |
|2020-10-03   |3        |
|2020-10-04   |5        |
|2020-10-05   |1        |
|2020-10-06   |0        |
{'Colombia': {'2020-10-01': 4, '2020-10-02': 2, '2020-10-03': 3, '2020-10-04': 5, '2020-10-05': 1, '2020-10-06': 0}}

All the DataFrames are indexed by dates.

I need that, if it would exist in the DataFrames as df2 a date that is already in df, put the value that is already in df and in another case to concatenate the two DataFrame and put the dates that are in df2 with their respective value.

This is what I tried:

dates_ind = numpy.in1d(df2.index, df.index)
    if numpy.any(dates_ind):
        df.loc[df2.index[dates_ind]][df2.columns[0]] = df2.loc[dates_ind].values.flatten()

    else:
        df = pd.concat([df, df2.loc[~dates_ind]], axis=0).drop_duplicates(keep="first")

But the problem is that it puts NaN on dates that are not in the initial DataFrame, df, for example:

|date        |Colombia   |Peru   |Argentina   |Chile   |Brazil   |
------------------------------------------------------------------
|2020-09-26  |3          |5      |1           |4        |2       |
|2020-09-27  |5          |5      |1           |2        |3       |
|2020-09-28  |3          |1      |2           |6        |5       |
|2020-09-29  |1          |6      |1           |2        |4       |
|2020-09-30  |1          |7      |1           |2        |0       |
|2020-10-01  |NA         |NA     |NA          |NA       |NA      |
|2020-10-02  |NA         |NA     |NA          |NA       |NA      |
|2020-10-03  |NA         |NA     |NA          |NA       |NA      |
|2020-10-04  |NA         |NA     |NA          |NA       |NA      |
|2020-10-05  |NA         |NA     |NA          |NA       |NA      |
|2020-10-06  |NA         |NA     |NA          |NA       |NA      |

I also tried this line of code

df = pd.concat([df, df2.loc[~dates_ind]], axis=0).drop_duplicates(keep="first")

Remove the ~, but what it did not put the dates that are in the df2 and not in df:

|date        |Colombia   |Peru   |Argentina   |Chile   |Brazil   |
------------------------------------------------------------------
|2020-09-26  |3          |5      |1           |4        |2       |
|2020-09-27  |5          |5      |1           |2        |3       |
|2020-09-28  |3          |1      |2           |6        |5       |
|2020-09-29  |1          |6      |1           |2        |4       |
|2020-09-30  |1          |7      |1           |2        |0       |

And I don't know what to do anymore.... Can you guys help me?

Valeria Arango
  • 287
  • 1
  • 2
  • 8
  • Yes, date is the index of both dataframes – Valeria Arango Feb 15 '22 at 20:24
  • 1
    You may find helpful answers here: https://stackoverflow.com/questions/9787853/join-or-merge-with-overwrite-in-pandas – poundifdef Feb 15 '22 at 20:37
  • You want to `pd.merge()`, not `pd.concat` (concat just puts the dataframes next to each other, while merging operates depending on whether specific columns or index are equal) – KingOtto Feb 15 '22 at 20:59

0 Answers0