2

I have 2 similar dataframes (df1 & df2), having the columns id,date,count.

I want to concatenate these 2 df's, in a way such that if the {id,date} overlaps, I want to give preference to the row from df2.

e.g.: df1 has the record: id1, 2018/02/03, 45

df2 has the record: id1, 2018/02/03, 65

When I merge/concat, I only need the record from df2 i.e. {id1, 2018/02/03, 65}.

Apart from this duplicate resolve logic, I need all the other rows from both the df's as is. How can I achieve this in python?

Sanchay
  • 1,053
  • 1
  • 16
  • 33

3 Answers3

2

You can use the optional arguments to the drop_duplicates functionality for this.

As a toy example, consider the frames:

df1 = pd.DataFrame({"id" : [1,2,3,4], "time" : [25, 35, 45, 15]})  
df2 = pd.DataFrame({"id" : [4,5,2,9], "time" : [19, 14, 11, 12]})  

Then, all you need to do is concatenate df1 and df2, and drop duplicates by id and keep the last to make sure the information is kept from df2.

df_concatenated = pd.concat([df1, df2]).drop_duplicates(subset="id", keep="last")   

df_concatenated  
>  id  time
    1    25
    3    45
    4    19
    5    14
    2    11
    9    12
Cihan
  • 2,267
  • 8
  • 19
0

Using the set_index method and then loop to remove all encounters of df2 items in df1.

 df1 = pd.DataFrame({'id' : [1,2,3], 'date' : ['d1', 'd2', 'd3'], 'count': [11, 22, 33]})
 df2 = pd.DataFrame({'id' : [1,2,5], 'date' : ['d1', 'd5', 'd6'], 'count': [44, 55, 66]})
 df1 = df1.set_index(['id', 'date'])
 df2 = df2.set_index(['id', 'date'])

 for index, row in df2.iterrows():
     if index in df1.index:
         df1 = df1.drop(index)
 print(df1.append(df2))

         count
id date
2  d2       22
3  d3       33
1  d1       44
2  d5       55
5  d6       66
DiCaprio
  • 823
  • 1
  • 7
  • 24
0

I believe what you're looking for is similar to this question

Per anky_91's answer, you should get the desired results with df2.combine_first(df1)

Check out the documentation for a detailed explanation on the behavior of this function when you combine 2 irregular dataframes.

kerwei
  • 1,822
  • 1
  • 13
  • 22