1

I have two dataframes A and B that contain different sets of patient data, and need to append certain columns from B to A - however only for those rows that contain information from the same patient and visit, i.e. where A and B have a matching value in two particular columns. B is longer than A, not all rows in A are contained in B. I don't know how this would be possible without looping, but many people discourage from looping over pandas dataframes (apart from the fact that my loop solution does not work because "Can only compare identically-labeled Series objects"). I read the options here How to iterate over rows in a DataFrame in Pandas but don't see which one I could apply here and would appreciate any tips!

Toy example (the actual dataframe has about 300 rows):


dict_A = {
    'ID': ['A_190792','X_210392','B_050490','F_311291','K_010989'],
    'Visit_Date': ['2010-10-31','2011-09-24','2010-30-01','2012-01-01','2013-08-13'],
    'Score': [30, 23, 42, 23, 31],
}
A = pd.DataFrame(dict_A) 


dict_B = {
    'ID': ['G_090891','A_190792','Z_060791','X_210392','B_050490','F_311291','K_010989','F_230989'],
    'Visit_Date': ['2013-03-01','2010-10-31','2013-04-03','2011-09-24','2010-30-01','2012-01-01','2013-08-13','2014-09-09'],
    'Diagnosis': ['F12', 'G42', 'F34', 'F90', 'G98','G87','F23','G42'],
}
B = pd.DataFrame(dict_B) 


for idx, row in A.iterrows():
    A.loc[row,'Diagnosis'] = B['Diagnosis'][(B['Visit_Date']==A['Visit_Date']) & (B['ID']==A['ID'])]
# Appends Diagnosis column from B to A for rows where ID and date match

I have seen this question Append Columns to Dataframe 1 Based on Matching Column Values in Dataframe 2 but the only answer is quite specific to it and also does not address the question whether a loop can/should be used or not

1 Answers1

2

i think you can use merge:

A['Visit_Date']=pd.to_datetime(A['Visit_Date'])
B['Visit_Date']=pd.to_datetime(B['Visit_Date'])
final=A.merge(B,on=['Visit_Date','ID'],how='outer')
print(final)
'''

         ID  Visit_Date  Score Diagnosis
0  A_190792  2010-10-31   30.0       G42
1  X_210392  2011-09-24   23.0       F90
2  B_050490  2010-30-01   42.0       G98
3  F_311291  2012-01-01   23.0       G87
4  K_010989  2013-08-13   31.0       F23
5  G_090891  2013-03-01    NaN       F12
6  Z_060791  2013-04-03    NaN       F34
7  F_230989  2014-09-09    NaN       G42
'''

if you want to only A:

A['Visit_Date']=pd.to_datetime(A['Visit_Date'])
B['Visit_Date']=pd.to_datetime(B['Visit_Date'])
final=A.merge(B,on=['Visit_Date','ID'],how='left')
print(final)
'''
         ID  Visit_Date  Score Diagnosis
0  A_190792  2010-10-31     30       G42
1  X_210392  2011-09-24     23       F90
2  B_050490  2010-30-01     42       G98
3  F_311291  2012-01-01     23       G87
4  K_010989  2013-08-13     31       F23
'''
Bushmaster
  • 4,196
  • 3
  • 8
  • 28
  • Thank you! However this does not work - "You are trying to merge on object and datetime64[ns] columns. If you wish to proceed you should use pd.concat", and concat does not have the option to pick specific rows... – user20501139 Nov 14 '22 at 21:01
  • Edited. Can you check it ? – Bushmaster Nov 14 '22 at 21:05
  • For some reason now it says "'DataFrame' object has no attribute 'concat'" - but it does though? Apart from that, maybe there is a more flexible solution where I wouldn't have to convert single columns - there are quite a few columns and the code needs to be adaptable to other studies where the tables might look differently – user20501139 Nov 14 '22 at 21:09
  • Are you sure you're using merge instead of concat? – Bushmaster Nov 14 '22 at 21:11
  • Sorry, you're right, I copied it wrong! Thanks for the answer, this is very helpful for now (but if there is any other more flexible way I'd love to hear it, especially because this solution appends the entire dataframe B, but I only need 1-2 columns from it and want to avoid creating some intermediate dataframes for that) – user20501139 Nov 14 '22 at 21:15
  • You can use `A = A.merge(B,on=['Visit_Date','ID'],how='left')` instead of `final`. This way there won't be a dataframe named final. Also in your case, I think one of the best solutions is the merge function. – Bushmaster Nov 14 '22 at 21:45