I have an initial DataFrame, lets call it df
that looks like:
|date |Colombia |Peru |Argentina |Chile |Brazil |
------------------------------------------------------------------
|2020-09-26 |3 |5 |1 |4 |2 |
|2020-09-27 |5 |5 |1 |2 |3 |
|2020-09-28 |3 |1 |2 |6 |5 |
|2020-09-29 |1 |6 |1 |2 |4 |
|2020-09-30 |1 |7 |1 |2 |0 |
{'Colombia': {'2020-09-26': 3, '2020-09-27': 5, '2020-09-28': 3, '2020-09-29': 1, '2020-09-30': 1},
'Peru': {'2020-09-26': 5.0, '2020-09-27': 5.0, '2020-09-28': 1.0, '2020-09-29': 6.0, '2020-09-30': 7.0},
'Argentina': {'2020-09-26': 1.0, '2020-09-27': 1.0, '2020-09-28': 2.0, '2020-09-29': 1.0, '2020-09-30': 1.0},
'Chile': {'2020-09-26': 4.0, '2020-09-27': 2.0, '2020-09-28': 6.0, '2020-09-29': 2.0, '2020-09-30': 2.0},
'Brazil': {'2020-09-26': 2.0, '2020-09-27': 3.0, '2020-09-28': 5.0, '2020-09-29': 4.0, '2020-09-30': 0.0}}
And I have to concatenate it with several DataFrames for each country, that may or may not have common with df
, let's call one in particular df2
, that looks like:
| |Colombia |
------------------------
|datetime | |
-------------------------
|2020-10-01 |4 |
|2020-10-02 |2 |
|2020-10-03 |3 |
|2020-10-04 |5 |
|2020-10-05 |1 |
|2020-10-06 |0 |
{'Colombia': {'2020-10-01': 4, '2020-10-02': 2, '2020-10-03': 3, '2020-10-04': 5, '2020-10-05': 1, '2020-10-06': 0}}
All the DataFrames are indexed by dates.
I need that, if it would exist in the DataFrames as df2
a date that is already in df
, put the value that is already in df
and in another case to concatenate the two DataFrame and put the dates that are in df2 with their respective value.
This is what I tried:
dates_ind = numpy.in1d(df2.index, df.index)
if numpy.any(dates_ind):
df.loc[df2.index[dates_ind]][df2.columns[0]] = df2.loc[dates_ind].values.flatten()
else:
df = pd.concat([df, df2.loc[~dates_ind]], axis=0).drop_duplicates(keep="first")
But the problem is that it puts NaN
on dates that are not in the initial DataFrame, df
, for example:
|date |Colombia |Peru |Argentina |Chile |Brazil |
------------------------------------------------------------------
|2020-09-26 |3 |5 |1 |4 |2 |
|2020-09-27 |5 |5 |1 |2 |3 |
|2020-09-28 |3 |1 |2 |6 |5 |
|2020-09-29 |1 |6 |1 |2 |4 |
|2020-09-30 |1 |7 |1 |2 |0 |
|2020-10-01 |NA |NA |NA |NA |NA |
|2020-10-02 |NA |NA |NA |NA |NA |
|2020-10-03 |NA |NA |NA |NA |NA |
|2020-10-04 |NA |NA |NA |NA |NA |
|2020-10-05 |NA |NA |NA |NA |NA |
|2020-10-06 |NA |NA |NA |NA |NA |
I also tried this line of code
df = pd.concat([df, df2.loc[~dates_ind]], axis=0).drop_duplicates(keep="first")
Remove the ~
, but what it did not put the dates that are in the df2
and not in df
:
|date |Colombia |Peru |Argentina |Chile |Brazil |
------------------------------------------------------------------
|2020-09-26 |3 |5 |1 |4 |2 |
|2020-09-27 |5 |5 |1 |2 |3 |
|2020-09-28 |3 |1 |2 |6 |5 |
|2020-09-29 |1 |6 |1 |2 |4 |
|2020-09-30 |1 |7 |1 |2 |0 |
And I don't know what to do anymore.... Can you guys help me?