Requirement:
I have two csv files named S1 and S2. Both sheet have Contact_ID as common value. I want to fetch all the records which satisfy the given conditions.
Table/Sheet Structure:
S1[Contact_ID, T1, T2, T3] primary_key = Contact_ID and foreign_key = Contact_ID2
S2[Contact_ID2, L1, L2, L3]
Compare T1, T2, T3 to L1, L2, L3 respectively then return Contact_ID and print YES/NO (in new column) if they matched.
Output: S3(Contact_ID,Contact_ID2, T1, L1, T2, L2, T3, L3, Matched)
New sheet(S3) should have columns (Contact_ID,Contact_ID2, T1, L1, T2, L2, T3, L3, Matched) "Matched" columns will stored the YES/NO/Not Applicable value.
TABLE S1
Contact_ID2 L1 L2 L3
590474 Voice Can't Be Called Fast Busy
590000 Voice Circuit Circuit Down
590744 Voice Circuit Circuit Down
588642 Voice Can't Call Out Cannot Call Out
TABLE S2
Contact_ID T1 T2 T3
588439 Voice Circuit Circuit Down
588984 Voice Call Quality Static/Noise
590000 Voice No Dial Tone No Dial Tone
588563 Voice No Dial Tone No Dial Tone
588642 Voice Can't Call Out Cannot Call Out
Expected Output: TABLE S3
Contact_ID2 L1 L2 L3 Contact_ID T1 T2 T3 Matched
590474 Voice Can't Be Called Fast Busy NO
590000 Voice Circuit Circuit Down 590000 Voice No Dial Tone No Dial Tone NO
590744 Voice Circuit Circuit Down NO
588642 Voice Can't Call Out Cannot Call Out 588642 Voice Can't Call Out Cannot Call Out YES
My Approach:
- I have merged both the sheets using Left Join w.r.t Contact_ID and saved in sheet S3.[need all S1 data, hence left join.]
- In S3, Check if Contact_ID equals to Contact_ID2 ,if True, then
2.1. Check if (T1 == L1),( T2 == L2) & (T3 == L3) if True, Print YES in new column
2.2. Else if any (T1 != L1) OR (T2 != L2) OR (T3 != L3) then print NO
- In S3, if (Contact_ID != Contact_ID2) ,Print "Not Applicable".
My Code:
import pandas as pd
import numpy as np
pd.options.mode.chained_assignment = None # default='warn'
S1_data = {'Contact_ID' : [590474,590000, 590744, 588642],
'L1' :['Voice','Voice','Voice','Voice'],
'L2' :['Can\'t Be Called', 'Circuit','Circuit','Can\'t Call Out'],
'L3' :['Fast Busy','Circuit Down', 'Circuit Down','Cannot Call Out']
}
S2_data = {'Contact_ID2' : [588439,588984,590000, 588563, 588642],
'T1' :['Voice','Voice','Voice','Voice','Voice'],
'T2' :['Circuit', 'Call Quality','No Dial Tone','No Dial Tone','Can\'t Call Out'],
'T3' :['Circuit Down','Static/Noise', 'No Dial Tone','No Dial Tone','Cannot Call Out']}
S1 = pd.DataFrame(S1_data)
S2 = pd.DataFrame(S2_data)
S3 = pd.merge(S1, S2, how = 'left', left_on = ['Contact_ID'], right_on = ['Contact_ID2'])
if S3['Contact_ID'] == S3['Contact_ID2']:
if (S3['T1'] == S3['L1']) & (S3['T2'] == S3['L2']) & (S3['T3'] == S3['L3']):
S3.iloc[:,'Matched'] = 'YES'
else:
S3.loc[:,'Matched'] = 'NO'
else:
S3.iloc[:,'Matched'] = 'NA'