Python Compare two Excel by id value and "Filed" output in other columns difference in new excel with respect to ID column

Question

I have two excel(A.xlsx,B.xlsx with same sheet name example "testdata". The data format looks like this.

A.xlsx(sheet2)

B.xlsx (sheet2)

Now i want to do this operation If Filed is "ADV" match the "id" of file A and B and then merge "sum" "Stat" "Matur" "Block" new cell. example 709606 a-->b Open-->open unass-->unass False-->False

Basically i want to compare the changes with respect to "Filed" field and match "id" of two fields then compare and put the changes of each other columns in new excel file.

tried few methods and it is not correct

import datacompy, pandas as pd

df1=pd.read_excel('a.xlsx','sheet1', na_values=['NA'])
df2=pd.read_excel('b.xlsx','sheet1', na_values=['NA'])
result = df2.loc[df2['Filed'].isin(df1['Filed'])]
#print(result) Filed must be Filed=ADV then do differrrence only for ADV.
result.to_excel('diff.xlsx',index=False)

second method :

compare = datacompy.Compare(
    df1[df1['Filed']=='ADV'],
    df2[df2['Filed']=='ADV'],
    join_columns='Sum',  #You can also specify a list of columns eg ['policyID','statecode']
    abs_tol=0, #Optional, defaults to 0
    rel_tol=0, #Optional, defaults to 0
    df1_name='Original', #Optional, defaults to 'df1'
    df2_name='New' #Optional, defaults to 'df2'
)
print(compare.report())

How this can be achieved.Any solution is much welcome.Both not giving any good output .I know i am not writing the logic correctly .Support is needed .Thank you.

share a sample(5,6 rows) of both dataframes, not pics. use this as a guide. https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples — sammywemmy, Feb 21 '20 at 06:05

trigonom · Answer 1 · 2020-02-24T20:36:56.740

0

id = list(range(100))
filled_items = ["ADV", "KN", "BBL", "TOOL"]
sum_items = ["a", "b", "c", "d", "e", "f", "g"]
df = pd.DataFrame(columns=["id","filled", "sum"])
df1 = pd.DataFrame(columns=["id","filled", "sum"])
df["id"] = random.sample(id,100)
df1["id"] = random.sample(id,100)
df["sum"] = random.choices(sum_items,k =100)
df1["sum"] = random.choices(sum_items,k =100)
df["filled"] = random.choices(filled_items,k =100)
df1["filled"] = random.choices(filled_items,k =100)
#till here we created 2 sample dataframe

df4  = df1.merge(df.loc[df["filled"]=="ADV"], on = "id") #this is the line you need

df["both_block"] = df[Block_x].astype(str)+"-->"+df[Block_y].astype(str)

output is

to merge the sum_y and sum_x to single column

df["both_block"] = df[Block_x].astype(str)+"-->"+df[Block_y].astype(str)

edited Feb 24 '20 at 20:36

answered Feb 21 '20 at 07:54

trigonom

528
4
9

didn't understand what you wrote here, the merge will take both dataframe and create a new one where in every row the "id" and "filled" are equal between 2 dataframes – trigonom Feb 21 '20 at 12:44
In this df = df1.merge(df2, on=["Id", "Filled"]) i want to check the differrence and merge only if the Filed ='ADV' how to do this ? – Marx Babu Feb 24 '20 at 03:40

Python Compare two Excel by id value and "Filed" output in other columns difference in new excel with respect to ID column

1 Answers1