-1

Requirement:
I have two csv files named S1 and S2. Both sheet have Contact_ID as common value. I want to fetch all the records which satisfy the given conditions.

Table/Sheet Structure:
S1[Contact_ID, T1, T2, T3] primary_key = Contact_ID and foreign_key = Contact_ID2
S2[Contact_ID2, L1, L2, L3]

Compare T1, T2, T3 to L1, L2, L3 respectively then return Contact_ID and print YES/NO (in new column) if they matched.

Output: S3(Contact_ID,Contact_ID2, T1, L1, T2, L2, T3, L3, Matched)

New sheet(S3) should have columns (Contact_ID,Contact_ID2, T1, L1, T2, L2, T3, L3, Matched) "Matched" columns will stored the YES/NO/Not Applicable value.

TABLE S1                                
Contact_ID2 L1      L2              L3                  
590474  Voice   Can't Be Called Fast Busy                   
590000  Voice   Circuit         Circuit Down                    
590744  Voice   Circuit         Circuit Down                    
588642  Voice   Can't Call Out  Cannot Call Out                 
                                
TABLE S2                                
Contact_ID  T1      T2              T3                  
588439  Voice   Circuit         Circuit Down                    
588984  Voice   Call Quality    Static/Noise                    
590000  Voice   No Dial Tone    No Dial Tone                    
588563  Voice   No Dial Tone    No Dial Tone                    
588642  Voice   Can't Call Out  Cannot Call Out                 
                                
Expected Output: TABLE S3                               
Contact_ID2 L1      L2              L3              Contact_ID  T1      T2              T3              Matched
590474  Voice   Can't Be Called Fast Busy                                                           NO
590000  Voice   Circuit         Circuit Down    590000  Voice   No Dial Tone    No Dial Tone    NO
590744  Voice   Circuit         Circuit Down                                                        NO
588642  Voice   Can't Call Out  Cannot Call Out 588642  Voice   Can't Call Out  Cannot Call Out YES

My Approach:

  1. I have merged both the sheets using Left Join w.r.t Contact_ID and saved in sheet S3.[need all S1 data, hence left join.]
  2. In S3, Check if Contact_ID equals to Contact_ID2 ,if True, then
    2.1. Check if (T1 == L1),( T2 == L2) & (T3 == L3) if True, Print YES in new column
    2.2. Else if any (T1 != L1) OR (T2 != L2) OR (T3 != L3) then print NO
  1. In S3, if (Contact_ID != Contact_ID2) ,Print "Not Applicable".

My Code:

    import pandas as pd
    import numpy as np
    pd.options.mode.chained_assignment = None  # default='warn' 
    
    S1_data = {'Contact_ID' : [590474,590000, 590744, 588642],
           'L1' :['Voice','Voice','Voice','Voice'],
           'L2' :['Can\'t Be Called', 'Circuit','Circuit','Can\'t Call Out'],
           'L3' :['Fast Busy','Circuit Down', 'Circuit Down','Cannot Call Out']
          }

    S2_data = {'Contact_ID2' : [588439,588984,590000, 588563, 588642],
           'T1' :['Voice','Voice','Voice','Voice','Voice'],
           'T2' :['Circuit', 'Call Quality','No Dial Tone','No Dial Tone','Can\'t Call Out'],
           'T3' :['Circuit Down','Static/Noise', 'No Dial Tone','No Dial Tone','Cannot Call Out']}

    S1 = pd.DataFrame(S1_data)
    S2 = pd.DataFrame(S2_data)

    
    S3 = pd.merge(S1, S2, how = 'left', left_on = ['Contact_ID'], right_on = ['Contact_ID2'])
    
    if S3['Contact_ID'] == S3['Contact_ID2']:
        if (S3['T1'] == S3['L1']) & (S3['T2'] == S3['L2']) & (S3['T3'] == S3['L3']):
            S3.iloc[:,'Matched'] = 'YES'
        else:
            S3.loc[:,'Matched'] = 'NO'
                        
    else:
        S3.iloc[:,'Matched'] = 'NA' 

2 Answers2

1

Merge using outer to get union of S1 and S2, evaluate your conditions (T1 == L1, T2 == L2, T3 == L3) and fix inconsistent Contact_ID != Contact_ID2:

S3 = pd.merge(S1, S2, how='outer', left_on=['Contact_ID'], right_on = ['Contact_ID2'])

S3['Matched'] = (S3['T1'] == S3['L1']) & (S3['T2'] == S3['L2']) & (S3['T3'] == S3['L3'])
S3.loc[S3['Contact_ID'] != S3['Contact_ID2'], 'Matched'] = pd.NA

S3['Matched'] = S3['Matched'].replace({True: 'YES', False: 'NO'})
>>> S3[['Contact_ID', 'Contact_ID2', 'Matched']]
   Contact_ID  Contact_ID2 Matched
0  59017474.0          NaN    <NA>
1  59011000.0   59011000.0      NO
2  59031744.0          NaN    <NA>
3  58838642.0   58838642.0     YES
4         NaN   58837439.0    <NA>
5         NaN   58837984.0    <NA>
6         NaN   58838563.0    <NA>
Corralien
  • 109,409
  • 8
  • 28
  • 52
0

Posting my final accepted answer. It may be helpful to others.

Merge both table S1 and S2 and then evaluate the condition and removed the S3['Contact_ID'] != S3['Contact_ID2'] with the Not Applicable

Lastly, update the the True and False with YES & NO.

# Join the crm_data and nlp_data
S3 = pd.merge(S1, S2, how='left', left_on=['Contact_ID2'], right_on = ['Contact_ID'], indicator = True)

# Check if ContactID, Tier1, Tier2, Tier3 are matching or not in S2 and S1
S3['Matched'] = (S3['L1'] == S3['T1']) & (S3['L2'] == S3['T2']) & (S3['L3'] == S3['T3']) & (S3['Contact_ID'] == S3['Contact_ID2'])
S3.loc[S3['Contact_ID'] != S3['Contact_ID2'], 'Matched'] = pd.NA
S3['Matched'] = S3['Matched'].replace({True: 'YES', False: 'NO'}) # Replace True -> YES and False -> NO

S3.drop(['_merge'], axis = 1, inplace = True)  #Drop the _merge column
>>> S3
Contact_ID2 L1      L2              L3              Contact_ID  T1      T2              T3              Matched
590474  Voice   Can't Be Called Fast Busy                                                           NO
590000  Voice   Circuit         Circuit Down    590000  Voice   No Dial Tone    No Dial Tone    NO
590744  Voice   Circuit         Circuit Down                                                        NO
588642  Voice   Can't Call Out  Cannot Call Out 588642  Voice   Can't Call Out  Cannot Call Out YES