
I am having trouble comparing two DataFrames of different lengths. Here is the problem:

    df1 =
    emp_id  emp_name  counts
    1       sam       0
    2       joe       0
    3       john      0

    df2 =
    emp_id  emp_name  counts
    1       sam       0
    2       joe       0
    2       joe       1
    3       john      0

My expected output is:

    Expected_output_df =
    df1                 df2
    emp_id  emp_name    emp_id  emp_name
    1       sam         1       sam
    2       joe         2       joe
    NaN     NaN         2       joe
    3       john        3       john

whereas I am getting the output below:

    actual_output_df =
    df1                 df2
    emp_id  emp_name    emp_id  emp_name
    1       sam         1       sam
    2       joe         2       joe
    3       john        2       joe
    NaN     NaN         3       john

Please note that I do not want to merge the two DataFrames into one. I want to concatenate them side by side and highlight the differences, so that if one DataFrame (for example df2) contains a duplicate row, the corresponding row of df1 shows NaN/blank/None (any null-like value). Below is what I have tried:

  1. I tried to use `df.merge()` to get the mismatched rows first, which resulted in df1 and df2.
  2. I then used `pd.concat()` on the left-only and right-only rows, but my final output is not as expected. The main issue is in `pd.concat()`: I am able to get the differences in step 1, but after concatenating I am not able to move the duplicate rows one row down.
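For reference, the two steps above can be sketched roughly as follows. This is only an illustration, assuming the frames are exactly as shown in the question and that the merge matches on all three columns; the variable names `merged`, `left_only`, `right_only` and `naive` are hypothetical. It also shows why a plain `pd.concat(..., axis=1)` produces the actual output: rows are paired by index label, not by content.

```python
import pandas as pd

# Sample frames copied from the question.
df1 = pd.DataFrame({"emp_id": [1, 2, 3],
                    "emp_name": ["sam", "joe", "john"],
                    "counts": [0, 0, 0]})
df2 = pd.DataFrame({"emp_id": [1, 2, 2, 3],
                    "emp_name": ["sam", "joe", "joe", "john"],
                    "counts": [0, 0, 1, 0]})

# Step 1: outer merge on all shared columns; the "_merge" column
# reports where each row was found.
merged = df1.merge(df2, how="outer", indicator=True)
left_only = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
right_only = merged[merged["_merge"] == "right_only"].drop(columns="_merge")

# Step 2 (the problematic part): concat with axis=1 aligns rows on the
# index labels 0, 1, 2, 3, so the extra "joe" row in df2 is paired with
# "john" from df1 instead of with an empty row.
cols = ["emp_id", "emp_name"]
naive = pd.concat([df1[cols], df2[cols]], axis=1, keys=["df1", "df2"])
print(naive)
```

With this data, `right_only` holds the single row `2 joe 1`, and `naive` reproduces the misaligned output shown above.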

Can anyone please help me with this? Thanks in advance.
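One possible way to get the side-by-side layout is sketched below. It is not a confirmed answer from this thread, just a sketch under stated assumptions: rows are matched on `emp_id` and `emp_name` only, duplicates are distinguished by an occurrence counter from `groupby(...).cumcount()`, and it is acceptable for the result to be sorted by those keys. The helper `keyed` and the column name `occ` are made up for illustration.

```python
import pandas as pd

# Sample frames copied from the question.
df1 = pd.DataFrame({"emp_id": [1, 2, 3],
                    "emp_name": ["sam", "joe", "john"],
                    "counts": [0, 0, 0]})
df2 = pd.DataFrame({"emp_id": [1, 2, 2, 3],
                    "emp_name": ["sam", "joe", "joe", "john"],
                    "counts": [0, 0, 1, 0]})

keys = ["emp_id", "emp_name"]  # assumption: these columns define a match


def keyed(df):
    """Return the key columns, indexed by (keys..., occurrence number)."""
    occ = df.groupby(keys).cumcount()  # 0 for the first copy, 1 for the second, ...
    out = df[keys].copy()
    out.index = pd.MultiIndex.from_frame(out.assign(occ=occ))
    return out


# concat with axis=1 aligns on the index, so the second "joe" in df2
# (occurrence 1) has no partner in df1 and that side is filled with NaN.
side_by_side = (
    pd.concat([keyed(df1), keyed(df2)], axis=1, keys=["df1", "df2"])
    .sort_index()
    .reset_index(drop=True)
)
print(side_by_side)
```

Note that `emp_id` on the `df1` side becomes a float column because of the inserted NaN. If the original row order matters more than sorted keys, the `sort_index()` step would need to be replaced with something else.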

  • Did you try the solution from the duplicate? If yes, was there a problem? Can you explain in more detail why it is not working as needed? – jezrael Nov 16 '21 at 08:51
  • Thanks for your response. I think you are misunderstanding the question. The dataframes df1 and df2 are results of a merge, but I am failing to concat the two dataframes in such a way that, if df2 has a duplicate row, the respective df1 row shows as NaN when I concat it with df1 (because df1 doesn't have duplicates). Please take a look at the expected output. – Bhavyashree717 Nov 16 '21 at 08:55
  • Did you try the solution or not? – jezrael Nov 16 '21 at 08:55
  • I did try your solution. It is not working for concat. I have edited the question accordingly. Please check. – Bhavyashree717 Nov 16 '21 at 09:00
  • Yes, it does not work like `concat`, because it is `merge`. The question still is: did you try it? Can you add the output of the duplicate's solution to the question and explain what the problem is? – jezrael Nov 16 '21 at 09:01
  • Below is the result dataframe I am getting. My requirement is not one result dataframe. My requirement is placing the two dataframes side by side and highlighting the differences. Rows as (index, emp_id, emp_name): (0, 1, sam), (1, 2, joe), (2, 3, john), (3, 2, joe) – Bhavyashree717 Nov 16 '21 at 09:06
  • I am going to post it as a new question. Please do not mark it as a duplicate and close it, because your solution is not helping me. Thanks. – Bhavyashree717 Nov 16 '21 at 09:31
  • What is the code for your solution in the question? It is missing there. – jezrael Nov 16 '21 at 09:32
  • It seems you used the wrong code; maybe you missed that you need to match by all 3 columns. – jezrael Nov 16 '21 at 09:34
  • Because matching `3 john` with `2 joe` is not possible if you use the correct code. – jezrael Nov 16 '21 at 09:34
  • I have reposted the question with code snippet. Please check this link and please do not mark it as duplicate. https://stackoverflow.com/questions/69987310/pandas-concat-dataframes-with-duplicates – Bhavyashree717 Nov 16 '21 at 10:29
  • Don't worry, it is now clearer what you need. – jezrael Nov 16 '21 at 10:30
