0

I have two dataframe:

df1:
              A       B        
    0        12  some text    
    1        28  some text_1    
    2        28  some text_2   
    3        40  some text   
    4        50  some text  

df2:
              C       D        
    0        11  some text_1    
    1        11  some text_2
    2        12  some text
    3        28  some text   
    4        40  some text   
    5        60  some text 

This two dataframes are ordered by column A and C (are float) respectively. And I want:

 df3
                   A       B        C         D      
        0        NaN               11   some text_1
        1        NaN               11   some text_1
        2        12   some text    12   some text
        3        28   some text_1  28   some text
        4        28   some text_2  Nan
        5        40   some text    40   some text
        6        50   some text    Nan
        7        NaN               60   some text

So I want to concatenate dataframes by row with the same column value.

EDIT 1 By using df3 = df1.merge(df2, left_on=['A'], right_on=['C'], how='outer') the result is the cartesian product of all the same rows with A and C values. This create a bigger dataframe in case, for example if I have in df1 n times value x and in df2 m time the same x, the result would be m*n (m > n) rows with x value.

But I want only to concat the first n row of m from df1 with the n rows of df2 and leave the remaining m-n rows of df1 with a NaN value

linofex
  • 340
  • 1
  • 2
  • 17
  • Why is 'some_text_2' in df1.A:28 not tied to 'some_text' in df2.C:28? – r-beginners May 23 '20 at 07:30
  • @r-beginners in `df2` there is only one row with `28`, so only the first row with `28` in ` df1` is contatenated – linofex May 23 '20 at 07:33
  • It has already been determined that it has been posted, but if you have a column that contains the number of times the value of column A and C appears in each column, you can use concatenate to deal with it. `df.A:[1,1,2,1,1,1], df,C:[1,2,1,1,1,1]` – r-beginners May 23 '20 at 08:31

1 Answers1

1

Try something like this if you want to make sure A, B values in df1 are the same as C, D in df2 (or modify lists to include only one column):

df3 = df1.merge(df2, left_on=['A', 'B'], right_on=['C', 'D'], how='outer')
ilyankou
  • 1,309
  • 8
  • 13
  • this inserts the row with the same column (`A, C`) value multiple times, and not just the first time – linofex May 23 '20 at 09:07