1

I am using Pandas to filter authentication logs. I want to select only those clients (identified by column name 'CALLING_STATION_ID') which have only one type of authorization profile (condition df['\'SELECTED_AZN_PROFILES\'']=='\'AD_Access\'').

Since a client may repeat authentication many times, I am using Pandas GroupBy function (df.groupby('\'CALLING_STATION_ID\'')). When I iterate over GroupBy I expect the keys to be unique, however I find that the keys are repeating. How to get unique keys that match a fixed criteria? GroupBy is not the way to go?

Here is the sample data:

'LOGGED AT','ENDPOINTMATCHEDPROFILE','IDENTITY_STORE','CTS_SECURITY_GROUP','NAS_PORT_ID','SELECTED_AZN_PROFILES','SERVER','RESPONSE_TIME','AUDIT_SESSION_ID','ID','MESSAGETEXT','SERVICE_TYPE','FAILURE_REASON','CREDENTIAL_CHECK','IDENTITY_GROUP','NETWORK_DEVICE_NAME','CALLING_STATION_ID','AUTHENTICATION_PROTOCOL','LOCATION','PASSED','AUTHENTICATION_METHOD','USER_NAME','DEVICE_TYPE','EXECUTION_STEPS','NAS_IP_ADDRESS'
'2016-08-30 16:23:08.35','HP-Device','AD1',,'GigabitEthernet6/0/12','AD_Access','ISE','17','0A6301640007425BAF23436B','1471419483080056','Authentication succeeded','Framed',,'MSCHAPV2','Profiled','SwitchA','StationA','PEAP (EAP-MSCHAPv2)','All Locations#Hotel','1','dot1x','StationA','All Device Types',"11001,11017,15049,15008,15048,15048,15048,15048,15048,15004,11507,12500,12625,11006,11001,11018,12301,12300,12625,11006,11001,11018,12302,12318,12800,12805,12806,12807,12810,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12318,12812,12813,12804,12801,12802,12816,12310,12305,11006,11001,11018,12304,12313,11521,12305,11006,11001,11018,12304,11522,11806,12305,11006,11001,11018,12304,11808,15041,15048,15048,15048,15006,15013,24431,24325,24313,24319,24323,24343,24470,22037,11824,12305,11006,11001,11018,12304,11810,11814,11519,12314,12305,11006,11001,11018,12304,15036,15048,24433,24355,24435,15048,15004,15016,11022,12306,11503,11002",'1.1.1.1'
'2016-08-29 23:44:24.254','Workstation','AD1',,'GigabitEthernet8/0/30','AD_Access','ISE','17','0A63016400073C105F18AF70','1471419482619976','Authentication succeeded','Framed',,'MSCHAPV2','Workstation','SwitchA','StationB','PEAP (EAP-MSCHAPv2)','All Locations#Hotel','1','dot1x','StationB','All Device Types',"11001,11017,15049,15008,15048,15048,15048,15048,15048,15004,11507,12500,12625,11006,11001,11018,12301,12300,12625,11006,11001,11018,12302,12318,12800,12805,12806,12807,12810,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12318,12812,12813,12804,12801,12802,12816,12310,12305,11006,11001,11018,12304,12313,11521,12305,11006,11001,11018,12304,11522,11806,12305,11006,11001,11018,12304,11808,15041,15048,15048,15048,15006,15013,24431,24325,24313,24319,24323,24343,24470,22037,11824,12305,11006,11001,11018,12304,11810,11814,11519,12314,12305,11006,11001,11018,12304,15036,15048,24433,24355,24435,15048,15004,15016,11022,12306,11503,11002",'1.1.1.1'
'2016-08-27 02:16:16.327','Workstation','AD1',,'GigabitEthernet8/0/31','AD_Access','ISE','16','0A630164000740DD98214677','1471419481030819','Authentication succeeded','Framed',,'MSCHAPV2','Workstation','SwitchA','StationC','PEAP (EAP-MSCHAPv2)','All Locations#Hotel','1','dot1x','StationC','All Device Types',"11001,11017,15049,15008,15048,15048,15048,15048,15048,15004,11507,12500,12625,11006,11001,11018,12301,12300,12625,11006,11001,11018,12302,12318,12800,12805,12806,12807,12810,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12318,12812,12813,12804,12801,12802,12816,12310,12305,11006,11001,11018,12304,12313,11521,12305,11006,11001,11018,12304,11522,11806,12305,11006,11001,11018,12304,11808,15041,15048,15048,15048,15006,15013,24431,24325,24313,24319,24323,24343,24470,22037,11824,12305,11006,11001,11018,12304,11810,11814,11519,12314,12305,11006,11001,11018,12304,15036,15048,24433,24355,24435,15048,15004,15016,11022,12306,11503,11002",'1.1.1.1'
'2016-08-29 11:51:22.83','Workstation','AD1',,'GigabitEthernet4/0/23','AD_Access','ISE','16','0A63016E000A49E237AF288C','1471419482276506','Authentication succeeded','Framed',,'MSCHAPV2','Workstation','SwitchB','StationD','PEAP (EAP-MSCHAPv2)','All Locations#Hotel','1','dot1x','StationD','All Device Types',"11001,11017,15049,15008,15048,15048,15048,15048,15048,15004,11507,12500,12625,11006,11001,11018,12301,12300,12625,11006,11001,11018,12302,12318,12800,12805,12806,12807,12810,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12318,12812,12813,12804,12801,12802,12816,12310,12305,11006,11001,11018,12304,12313,11521,12305,11006,11001,11018,12304,11522,11806,12305,11006,11001,11018,12304,11808,15041,15048,15048,15048,15006,15013,24431,24325,24313,24319,24323,24343,24470,22037,11824,12305,11006,11001,11018,12304,11810,11814,11519,12314,12305,11006,11001,11018,12304,15036,15048,24433,24355,24435,15048,15004,15016,11022,12306,11503,11002",'1.1.1.1'
'2016-08-28 02:13:21.3','Workstation','AD1',,'GigabitEthernet4/0/23','AD_Access','ISE','15','0A63016E000A49E237AF288C','1471419481444736','Authentication succeeded','Framed',,'MSCHAPV2','Workstation','SwitchB','StationD','PEAP (EAP-MSCHAPv2)','All Locations#Hotel','1','dot1x','StationD','All Device Types',"11001,11017,15049,15008,15048,15048,15048,15048,15048,15004,11507,12500,12625,11006,11001,11018,12301,12300,12625,11006,11001,11018,12302,12318,12800,12805,12806,12807,12810,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12318,12812,12813,12804,12801,12802,12816,12310,12305,11006,11001,11018,12304,12313,11521,12305,11006,11001,11018,12304,11522,11806,12305,11006,11001,11018,12304,11808,15041,15048,15048,15048,15006,15013,24431,24325,24313,24319,24323,24343,24470,22037,11824,12305,11006,11001,11018,12304,11810,11814,11519,12314,12305,11006,11001,11018,12304,15036,15048,24433,24355,24435,15048,15004,15016,11022,12306,11503,11002",'1.1.1.1'
'2016-08-26 22:55:19.754','Workstation','AD1',,'GigabitEthernet4/0/23','AD_Access','ISE','16','0A63016E000A49E237AF288C','1471419480981867','Authentication succeeded','Framed',,'MSCHAPV2','Workstation','SwitchB','StationD','PEAP (EAP-MSCHAPv2)','All Locations#Hotel','1','dot1x','StationD','All Device Types',"11001,11017,15049,15008,15048,15048,15048,15048,15048,15004,11507,12500,12625,11006,11001,11018,12301,12300,12625,11006,11001,11018,12302,12318,12800,12805,12806,12807,12810,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12318,12812,12813,12804,12801,12802,12816,12310,12305,11006,11001,11018,12304,12313,11521,12305,11006,11001,11018,12304,11522,11806,12305,11006,11001,11018,12304,11808,15041,15048,15048,15048,15006,15013,24431,24325,24313,24319,24323,24343,24470,22037,11824,12305,11006,11001,11018,12304,11810,11814,11519,12314,12305,11006,11001,11018,12304,15036,15048,24433,24355,24435,15048,15004,15016,11022,12306,11503,11002",'1.1.1.1'
'2016-08-25 10:22:17.88','Workstation','AD1',,'GigabitEthernet4/0/23','AD_Access','ISE','16','0A63016E000A49E237AF288C','1471419480335106','Authentication succeeded','Framed',,'MSCHAPV2','Workstation','SwitchB','StationD','PEAP (EAP-MSCHAPv2)','All Locations#Hotel','1','dot1x','StationD','All Device Types',"11001,11017,15049,15008,15048,15048,15048,15048,15048,15004,11507,12500,12625,11006,11001,11018,12301,12300,12625,11006,11001,11018,12302,12318,12800,12805,12806,12807,12810,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12318,12812,12813,12804,12801,12802,12816,12310,12305,11006,11001,11018,12304,12313,11521,12305,11006,11001,11018,12304,11522,11806,12305,11006,11001,11018,12304,11808,15041,15048,15048,15048,15006,15013,24431,24325,24313,24319,24323,24343,24470,22037,11824,12305,11006,11001,11018,12304,11810,11814,11519,12314,12305,11006,11001,11018,12304,15036,15048,24433,24355,24435,15048,15004,15016,11022,12306,11503,11002",'1.1.1.1'
'2016-08-24 04:04:46.554','Workstation','AD1',,'GigabitEthernet4/0/23','AD_Access','ISE','16','0A63016E000A49E237AF288C','1471419479854979','Authentication succeeded','Framed',,'MSCHAPV2','Workstation','SwitchB','StationD','PEAP (EAP-MSCHAPv2)','All Locations#Hotel','1','dot1x','StationD','All Device Types',"11001,11017,15049,15008,15048,15048,15048,15048,15048,15004,11507,12500,12625,11006,11001,11018,12301,12300,12625,11006,11001,11018,12302,12318,12800,12805,12806,12807,12810,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12305,11006,11001,11018,12304,12318,12812,12813,12804,12801,12802,12816,12310,12305,11006,11001,11018,12304,12313,11521,12305,11006,11001,11018,12304,11522,11806,12305,11006,11001,11018,12304,11808,15041,15048,15048,15048,15006,15013,24431,24325,24313,24319,24323,24343,24470,22037,11824,12305,11006,11001,11018,12304,11810,11814,11519,12314,12305,11006,11001,11018,12304,15036,15048,24433,24355,24435,15048,15004,15016,11022,12306,11503,11002",'1.1.1.1'

Here is the code that I am using:

import pandas as pd
import numpy as np
from pandas import DataFrame, read_csv

uniq = pd.Series.unique

def only_machine(g):
    return (len(uniq(g['\'SELECTED_AZN_PROFILES\''])) == 1 ) and uniq(g['\'SELECTED_AZN_PROFILES\''])[0] == '\'AD_Access\''

mach_auth         = gb.filter(lambda g: only_machine(g)
        )[['\'NETWORK_DEVICE_NAME\'','\'USER_NAME\'','\'CALLING_STATION_ID\'','\'SELECTED_AZN_PROFILES\'']]

And here is the result:

>>> df = pd.read_clipboard(sep=',')
>>> gb = df.groupby('\'CALLING_STATION_ID\'')

>>> mach_auth         = gb.filter(lambda g: only_machine(g)                                                     
...         )[['\'NETWORK_DEVICE_NAME\'','\'USER_NAME\'','\'CALLING_STATION_ID\'','\'SELECTED_AZN_PROFILES\'']] 
>>> mach_auth                                                                                                   
  'NETWORK_DEVICE_NAME' 'USER_NAME' 'CALLING_STATION_ID'  \                                                     
0             'SwitchA'  'StationA'           'StationA'                                                        
1             'SwitchA'  'StationB'           'StationB'                                                        
2             'SwitchA'  'StationC'           'StationC'                                                        
3             'SwitchB'  'StationD'           'StationD'                                                        
4             'SwitchB'  'StationD'           'StationD'                                                        
5             'SwitchB'  'StationD'           'StationD'                                                        
6             'SwitchB'  'StationD'           'StationD'                                                        
7             'SwitchB'  'StationD'           'StationD'                                                        

  'SELECTED_AZN_PROFILES'                                                                                       
0             'AD_Access'                                                                                       
1             'AD_Access'                                                                                       
2             'AD_Access'                                                                                       
3             'AD_Access'                                                                                       
4             'AD_Access'                                                                                       
5             'AD_Access'                                                                                       
6             'AD_Access'                                                                                       
7             'AD_Access'                                                                  

While I expect this:

>>> mach_auth
  'NETWORK_DEVICE_NAME' 'USER_NAME' 'CALLING_STATION_ID'  \
0             'SwitchA'  'StationA'           'StationA'
1             'SwitchA'  'StationB'           'StationB'
2             'SwitchA'  'StationC'           'StationC'
3             'SwitchB'  'StationD'           'StationD'

  'SELECTED_AZN_PROFILES'
0             'AD_Access'
1             'AD_Access'
2             'AD_Access'
3             'AD_Access'
Benny
  • 639
  • 3
  • 11
  • 25
  • Why the backslashes everywhere? – IanS Aug 31 '16 at 10:15
  • The each value in the input CSV is enclosed with ' character. backslash is used as escape character to match ' as literal. – Benny Aug 31 '16 at 10:27
  • `SELECTED_AZN_PROFILES` column is missing. – jezrael Aug 31 '16 at 10:29
  • Should have guessed! Are you sure there is no trailing space, e.g. `"'StationA' "` vs. `"'StationA'"`? – IanS Aug 31 '16 at 10:29
  • @jezrael, sorry I missed it while copy pasting. It was generated but wrapped around in the bottom section of terminal. Is it necessary to get the full picture? – Benny Aug 31 '16 at 10:41
  • @IanS Yes I checked the key values are identifcal. At present I don't know whether it is expected output, I am wondering whether GroupBy is not the correct way to arrive at unique "Calling station ID's". – Benny Aug 31 '16 at 10:41
  • 1
    I think the best is create nice sample - see [How to make good reproducible pandas examples](http://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples). Or at least 5-6 rows of sample and desired output. – jezrael Aug 31 '16 at 10:43
  • And column `SELECTED_AZN_PROFILES` is important, because using it in filter. – jezrael Aug 31 '16 at 10:44
  • +1 for pointing towards the right way to ask question. I guess my question falls under the Ugly category despite my best efforts. Also I realize editing out some information in the question is presumptuous on my part that I know where the problem "is not". – Benny Aug 31 '16 at 11:47

0 Answers0