
I am working with multiple DataFrames. Each one has two columns: an ID and a year. I would like to combine these DataFrames by year and then see how many duplicate IDs I get per year.

A data frame looks like this.

ID      Year
11111   2013
21314   2014
24141   2015

The real frames have many more IDs, and there are several frames.

Ex: I have 11111 in DF1 for 2013.
Ex: I have 11111 in DF2 for 2013.
Ex: I have 11111 in DF3 for 2013.

How would I combine these so that everything is organized by year (2013, for example) and I can see all the duplicates in the combined DataFrame?

I would need to make multiple other data frames as well.

John Kugelman

1 Answer


Pandas/Python: How to concatenate two dataframes without duplicates?

Does the link above give you the answer?

The highlighted answer there seems to show that you can combine the DataFrames into one and, if needed, drop the duplicates as well.
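Since the question asks to keep the duplicates rather than drop them, here is a minimal sketch of one way to combine the frames and list every repeated row. It assumes pandas is installed and that each frame has the `ID` and `Year` columns from the question; the sample values, the `frames` dictionary, and the `Source` column are made up for illustration.

```python
import pandas as pd

# Hypothetical sample frames standing in for DF1, DF2 and DF3.
df1 = pd.DataFrame({"ID": [11111, 21314, 24141], "Year": [2013, 2014, 2015]})
df2 = pd.DataFrame({"ID": [11111, 55555], "Year": [2013, 2014]})
df3 = pd.DataFrame({"ID": [11111, 21314], "Year": [2013, 2014]})

# Stack the frames on top of each other, tagging each row with the
# frame it came from (the "Source" column is an illustrative addition).
frames = {"DF1": df1, "DF2": df2, "DF3": df3}
combined = pd.concat(
    [df.assign(Source=name) for name, df in frames.items()],
    ignore_index=True,
)

# keep=False flags every occurrence of a repeated (ID, Year) pair,
# so no copy of a duplicate is discarded.
is_dupe = combined.duplicated(subset=["ID", "Year"], keep=False)
dupes = combined[is_dupe].sort_values(["Year", "ID"]).reset_index(drop=True)

print(dupes)
```

Filtering with `dupes[dupes["Year"] == 2013]` would then show only the repeated IDs for 2013.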

dtc
  • How would I be able to calculate the duplicates per year though? I want to keep the duplicates if anything. So I would like to keep track of duplicates for each year. – Ishan Jain Oct 13 '20 at 18:51
  • What do you mean by calculate? Do you mean you want to count them? https://stackoverflow.com/questions/35584085/how-to-count-duplicate-rows-in-pandas-dataframe seems to be the answer, but you'll have to take a look and play around with it. – dtc Oct 15 '20 at 00:04
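The per-year counting discussed in the comments could be sketched roughly as follows. This is one possible approach, not necessarily the one in the linked question: it assumes the frames have already been combined into a single DataFrame with `ID` and `Year` columns, and the sample data and the `Count`, `RepeatedIDs`, and `RowsInvolved` names are hypothetical.

```python
import pandas as pd

# Hypothetical combined data: the rows of all frames stacked together.
combined = pd.DataFrame({
    "ID":   [11111, 21314, 24141, 11111, 55555, 11111, 21314],
    "Year": [2013,  2014,  2015,  2013,  2014,  2013,  2014],
})

# Count how many times each (Year, ID) pair occurs.
pair_counts = combined.groupby(["Year", "ID"]).size().reset_index(name="Count")

# Keep only the pairs that occur more than once, i.e. the duplicates.
repeated = pair_counts[pair_counts["Count"] > 1]

# Summarize per year: how many distinct IDs repeat, and how many rows
# those repeated IDs account for in total.
per_year = repeated.groupby("Year")["Count"].agg(
    RepeatedIDs="count",
    RowsInvolved="sum",
)

print(per_year)
```

Years with no repeated IDs (2015 in this sample) do not appear in `per_year`.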