0

I have data in two excel files like below

Sample DS Created:

df1 =  {'Transaction_Name':['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'], 'Count': [1, 1, 2]}
df1 = pd.DataFrame(df1, columns=df1.keys())

df2 =  {'Transaction_Name':['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'], 'Count': [2, 1, 2]}
df2 = pd.DataFrame(df2, columns=df2.keys())

please help me to get difference of both excels as below..

Transaction_name    Count_df1        Count_df2
SC-001_Homepage          1              2
SC-001_Homepage          1              1
SC-001_Homepage          2              2

First line of the output count is not matching. Will i be able to highlight in different color? Sample code is as below

#COmparing both excels
df1 = pd.read_csv(r"WLMOUTPUT.csv", dtype=object)
df2 = pd.read_csv(r"results.csv", dtype=object)

print('\n', df1)
print('\n',df2)

df1['Compare'] = df1['Transaction_Name'] + df1['Count'].astype(str)
df2['Compare'] = df2['Transaction_Name'] + df2['Count'].astype(str)

print('\n', df1.loc[~df1['Compare'].isin(df2['Compare'])])

Thanks in advance

SG131712
  • 135
  • 3
  • 13
  • you have to show examples of the work you've done so far. – Shailyn Ortiz Feb 04 '19 at 05:32
  • Please find the work i had done so far to achieve the result #Formatting WLM data data = pd.read_excel(r"Script wise coordinates comparison_edited123.xlsx", sheet_name='WLM', dtype=object) data = pd.DataFrame(data, columns=data.keys()) df = pd.melt(data, id_vars=['Script_name'], value_name='Count') df['Transaction_Name'] = df['Script_name'] + '_' + df['variable'] Final_df = df[['Transaction_Name', 'Count']] Final_df.to_csv(r'WLMOUTPUT.csv', index=False) Code continues in the next comment to compare both CSV files – SG131712 Feb 04 '19 at 05:49
  • #COmparing both excels df1 = pd.read_csv(r"WLMOUTPUT.csv", dtype=object) df2 = pd.read_csv(r"results.csv", dtype=object) print('\n', df1) print('\n',df2) df1['Compare'] = df1['Transaction_Name'] + df1['Count'].astype(str) df2['Compare'] = df2['Transaction_Name'] + df2['Count'].astype(str) print('\n', df1.loc[~df1['Compare'].isin(df2['Compare'])]) – SG131712 Feb 04 '19 at 05:49
  • 1
    You have to provide it inside your question, you can edit it to adapt accordingly, also you need to format your text – Shailyn Ortiz Feb 04 '19 at 06:16
  • Its done please check once – SG131712 Feb 04 '19 at 06:34
  • 1
    @SwethaGorantla the reason no one has answered yet is there is too much info here. I would suggest you to just post 5-6 lines of the sample data which depicts what you are trying to acheive in a dataframe format so we can copy the data and replicate the issue along with 2 lines of explaination , your sample code (not the full, the useful bit only) and an expected output, just those. :) Check [this](https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples) – anky Feb 04 '19 at 07:31
  • 1
    Edited the question as explained, please check once. – SG131712 Feb 04 '19 at 08:59

1 Answers1

1

You can use the merge function.

import pandas as pd

df1 = pd.DataFrame({'Transaction_Name':['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'], 'Count': [1, 1, 2]}) 
df2 = pd.DataFrame({'Transaction_Name':['SC-001_Homepage', 'SC-002_Homepage', 'SC-001_Signinlink'], 'Count': [2, 1, 2]})

merged_df = pd.merge(df1, df2, on = 'Transaction_Name', suffixes=('_df1', '_df2'))

This will give you this DataFrame:

print(merged_df)

   Count_df1   Transaction_Name  Count_df2
0          1    SC-001_Homepage          2
1          1    SC-002_Homepage          1
2          2  SC-001_Signinlink          2

And then you can just use subsetting to see which rows have different counts:

diff = merged_df[merged_df['Count_df1'] != merged_df['Count_df2']]

And you will get this:

print(diff)

   Count_df1 Transaction_Name  Count_df2
0          1  SC-001_Homepage          2
Luc Blassel
  • 394
  • 4
  • 15
  • Thanks Luc Blassel, i could implement it and got required output. Can you also please help me if any of the transactions are missing how to capture it. For example in df1 there are 3 transactions and count of it, in df2 it has only 2 transactions and count of it. How to find that one transaction which is missing in df2? – SG131712 Feb 04 '19 at 10:04
  • You can specify the `outer` option when merging the 2 DataFrames: `merged_df = pd.merge(df1, df2, on = 'Transaction_Name', suffixes=('_df1', '_df2'), how='outer')` an then when one of the transactions is missing in one of the DFs it will show us as `NaN` in `merged_df`. If this answered your question, please consider accepting it as the answer. – Luc Blassel Feb 04 '19 at 10:11
  • As you said it gives NaN at the missing transaction also it gives complete output. Is it possible to get only the missing Transaction name in df2 as below... Transaction_Name Count_df1 Count_df2 0 SC-001_AppLaunch_Signed 0 NaN – SG131712 Feb 04 '19 at 10:58
  • From [this answer](https://stackoverflow.com/a/14247708/8650928) you can get rows with only `NaN` using `merged_df[merged_df.isnull().any(axis=1)]` – Luc Blassel Feb 04 '19 at 11:02