0

I have two dataframes ,

A= 
ID compponent weight
12  Cap        0.4
12  Pump       183
12  label      0.05
14  cap        0.6 

B=
ID compponent_B weight_B
12  Cap_B       0.7
12  Pump_B       189
12  label      0.05


when i do merge of this two dataframes based on the ID as a key ,I get


ID component weight component_B  weight_B
12  Cap        0.4   Cap_B        0,7
12  Cap        0.4   Pump_B       189
12  Cap        0.4   label        0.05
12  Pump       183   Cap_B        0,7
12  Pump       183   Pump_B       189
14  Pump       183   label        0.05
...

I understand that the fact that i have One ID pour the 3 lines genrates 9 lines if i do the merge , but how can i only keep three lines without losing information ,if i do a drop_duplicates on Component i'll lose informations about component_B. i want something like :


ID component weight  component_B  weight_B
12  Cap        0.4    Cap_B        0,7
12  Pump       183    Pump_B       189
12  labeL      0,05   label        0.05


anyone know how to do it ?

chero
  • 61
  • 6
  • You should try doing merge based on ID and Component as it looks you really to do that not the merge based on ID only. – lostin Apr 16 '20 at 12:00
  • but the components aren't exactly the same – chero Apr 16 '20 at 12:09
  • Are the format of component same i..e. Cap_B, Pump_B and label_B or Cap_B, Pump_B and label ? – lostin Apr 16 '20 at 12:12
  • It's different , i was thinking more of a solution using drop after the merge cauz i have no other way to do it – chero Apr 16 '20 at 12:14
  • https://stackoverflow.com/questions/58025517/how-to-link-two-dataframes-based-on-the-string-similarity-of-one-column .. You can try this solution. – lostin Apr 16 '20 at 12:51

1 Answers1

1

you can create a column with a cumcount per ID to be able to merge on ID and this new column so like:

dfm = dfA.assign(cc=dfA.groupby('ID').cumcount())\
         .merge(dfB.assign(cc=dfB.groupby('ID').cumcount()), 
                on=['ID', 'cc'], how='outer')
print (dfm)
   ID compponent  weight  cc compponent_B  weight_B
0  12        Cap    0.40   0        Cap_B      0.70
1  12       Pump  183.00   1       Pump_B    189.00
2  12      label    0.05   2        label      0.05
3  14        cap    0.60   0          NaN       NaN
Ben.T
  • 29,160
  • 6
  • 32
  • 54
  • @chero maybe change the parameter how to 'inner' or 'left', depending on what you want to keep? – Ben.T Apr 16 '20 at 15:16
  • ur solution gaves me the 3 lines desired but i losted informations from the first dataframe , i only had "label" combined with the three component_B (cap_B and Pump_B and label) – chero Apr 16 '20 at 15:26
  • @chero hmm as you can see with the data you provided in your question it does not have the behavior you just say. in which condition you get label with the 3 components? – Ben.T Apr 16 '20 at 15:35