
I have two data frames in Pandas with the following structure:

  1. pr_df:
     id  prMerged  idRepository  avgTime
0     1         1             2    63.93
1     2         0             3    41.11
2     3         0             3    36.03
3     4         1             4    77.28
...
98   99         1            20    54.78
99  100         0            20    42.12
  2. repo_df:
    id  stars  forks
0    1   1245     45
1    2   3689     78
2    3    458     15
3    4    954     75
...
19  20   1947    102

I would like to combine pr_df with repo_df by matching idRepository (from pr_df) against id (from repo_df), and add two columns to pr_df: stars and forks. The result I would like to get is:

pr_df:

     id  prMerged  idRepository  avgTime  stars  forks
0     1         1             2    63.93   3689     78
1     2         0             3    41.11    458     15
2     3         0             3    36.03    458     15
3     4         1             4    77.28    954     75
...
98   99         1            20    54.78   1947    102
99  100         0            20    42.12   1947    102

How can I do it using Pandas? How can I compare idRepository with id and add new columns to pr_df based on that?

    I think you might be looking for what's explained in the _"Different names for key columns"_ section of this other great answer: https://stackoverflow.com/a/53645883/289011 – Savir Apr 08 '22 at 20:22

1 Answer


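You can use the merge method. Because the key columns have different names in the two frames, you have to say which column to match on in each one, using left_on and right_on. Note that merge returns a new DataFrame rather than modifying pr_df in place, so assign the result to a variable.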

pr_df.merge(repo_df, left_on='idRepository', right_on='id')
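
As a fuller illustration, here is a minimal sketch built from the first four rows shown in the question. Two details are my own additions rather than part of the answer above: renaming repo_df's id column (the repo_cols name is just illustrative), and passing how="left". As far as I can tell, the left_on/right_on call leaves both id columns in the result as id_x and id_y, which the rename sidesteps. The left join keeps every row of pr_df even when a repository is missing from repo_df.

```python
import pandas as pd

# Small stand-ins for the frames in the question (values copied from its first rows).
pr_df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "prMerged": [1, 0, 0, 1],
    "idRepository": [2, 3, 3, 4],
    "avgTime": [63.93, 41.11, 36.03, 77.28],
})
repo_df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "stars": [1245, 3689, 458, 954],
    "forks": [45, 78, 15, 75],
})

# Rename repo_df's key so both frames share one key column name.
# This keeps pr_df's own "id" column from colliding with repo_df's "id".
repo_cols = repo_df.rename(columns={"id": "idRepository"})

# how="left" keeps every pull request row, matched or not.
merged = pr_df.merge(repo_cols, on="idRepository", how="left")
print(merged)
```

If you would rather keep the name pr_df, assign the result back with pr_df = pr_df.merge(...).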
user7375116