0

I have two excel(A.xlsx,B.xlsx with same sheet name example "testdata". The data format looks like this.

A.xlsx(sheet2) enter image description here

B.xlsx (sheet2) enter image description here

Now i want to do this operation If Filed is "ADV" match the "id" of file A and B and then merge "sum" "Stat" "Matur" "Block" new cell. example 709606 a-->b Open-->open unass-->unass False-->False

Basically i want to compare the changes with respect to "Filed" field and match "id" of two fields then compare and put the changes of each other columns in new excel file.

tried few methods and it is not correct

import datacompy, pandas as pd

df1=pd.read_excel('a.xlsx','sheet1', na_values=['NA'])
df2=pd.read_excel('b.xlsx','sheet1', na_values=['NA'])
result = df2.loc[df2['Filed'].isin(df1['Filed'])]
#print(result) Filed must be Filed=ADV then do differrrence only for ADV.
result.to_excel('diff.xlsx',index=False)

second method :

compare = datacompy.Compare(
    df1[df1['Filed']=='ADV'],
    df2[df2['Filed']=='ADV'],
    join_columns='Sum',  #You can also specify a list of columns eg ['policyID','statecode']
    abs_tol=0, #Optional, defaults to 0
    rel_tol=0, #Optional, defaults to 0
    df1_name='Original', #Optional, defaults to 'df1'
    df2_name='New' #Optional, defaults to 'df2'
)
print(compare.report())

How this can be achieved.Any solution is much welcome.Both not giving any good output .I know i am not writing the logic correctly .Support is needed .Thank you.

Marx Babu
  • 750
  • 3
  • 11
  • 34
  • share a sample(5,6 rows) of both dataframes, not pics. use this as a guide. https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples – sammywemmy Feb 21 '20 at 06:05

1 Answers1

0
id = list(range(100))
filled_items = ["ADV", "KN", "BBL", "TOOL"]
sum_items = ["a", "b", "c", "d", "e", "f", "g"]
df = pd.DataFrame(columns=["id","filled", "sum"])
df1 = pd.DataFrame(columns=["id","filled", "sum"])
df["id"] = random.sample(id,100)
df1["id"] = random.sample(id,100)
df["sum"] = random.choices(sum_items,k =100)
df1["sum"] = random.choices(sum_items,k =100)
df["filled"] = random.choices(filled_items,k =100)
df1["filled"] = random.choices(filled_items,k =100)
#till here we created 2 sample dataframe

df4  = df1.merge(df.loc[df["filled"]=="ADV"], on = "id") #this is the line you need

df["both_block"] = df[Block_x].astype(str)+"-->"+df[Block_y].astype(str)

output is enter image description here

to merge the sum_y and sum_x to single column

df["both_block"] = df[Block_x].astype(str)+"-->"+df[Block_y].astype(str)
trigonom
  • 528
  • 4
  • 9
  • didn't understand what you wrote here, the merge will take both dataframe and create a new one where in every row the "id" and "filled" are equal between 2 dataframes – trigonom Feb 21 '20 at 12:44
  • In this df = df1.merge(df2, on=["Id", "Filled"]) i want to check the differrence and merge only if the Filed ='ADV' how to do this ? – Marx Babu Feb 24 '20 at 03:40