I have a pandas DataFrame of Security event logs like the one below:

"machinename","eventid","entrytype","source","timegenerated","timewritten","username","message"
"MyMachineName","4656","failureaudit","microsoft-windows-security-auditing","3/7/2017 3:34:09 pm","3/7/2017 3:34:09 pm",,"a handle to an object was requested.    subject:   security id:  s-1-5-21-123456789-123456789-123456789-1381912   account name:  account.name.matt   account domain:  mydomain   logon id:  0x0d4d0d1d    object:   object server:  security   object type:  key   object name:  \registry\machine\system\controlset001\control\class\{1d36e972-f315-1111-b2d3-09112bf10211}\properties   handle id:  0x0    process information:   process id:  0x35f4   process name:  c:\windows\system32\wbem\wmiprvse.exe    access request information:   transaction id:  {00000000-0000-0000-0000-000000000000}   accesses:  %%1538      %%4432      %%4435      %%4436         access reasons:  -   access mask:  0x20019   privileges used for access check: -   restricted sid count: 0"
"MyMachineName","4688","successaudit","microsoft-windows-security-auditing","1/1/2011 3:34:09 pm","1/1/2011 3:34:09 pm",,"a new process has been created.    subject:   security id:  s-1-5-18   account name:  account.name.matt    account domain:  mydomain  logon id:  0x3e5    process information:   new process id:  0x1e98   new process name: c:\windows\system32\conhost.exe   token elevation type: %%1936   creator process id: 0x1b8   process command line:     token elevation type indicates the type of token that was assigned to the new process in accordance with user account control policy.    type 1 is a full token with no privileges removed or groups disabled.  a full token is only used if user account control is disabled or if the user is the built-in administrator account or a service account.    type 2 is an elevated token with no privileges removed or groups disabled.  an elevated token is used when user account control is enabled and the user chooses to start the program using run as administrator.  an elevated token is also used when an application is configured to always require administrative privilege or to always require maximum privilege, and the user is a member of the administrators group.    type 3 is a limited token with administrative privileges removed and administrative groups disabled.  the limited token is used when user account control is enabled, the application does not require administrative privilege, and the user does not choose to start the program using run as administrator."

How can I expand the 'message' field into a set of columns, one per key-value pair, while keeping the free-text sentences separate from the key: value pairs?

Essentially, I would like to take any event and transform it based on the key-value pairs in its message.

I have split the message as shown below, but the output puts the items into a list, and I am not sure how to turn them into the proper columns.

print(security.message.str.split(":\\s"))

Any help would be great. Thanks.

johnnyb

1 Answer

I really want to comment but can't, so here it goes. Did you check that your message is splitting correctly? Once you have a list of items, you can use join, append, or concat to add the additional columns. See "Append column to pandas dataframe".
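
As a rough sketch of that last step, assuming each message has already been parsed into a dictionary: the `parsed` records below are hand-written stand-ins for parser output, and the two-row `security` frame is a shortened stand-in for the real data. Building a frame from the dictionaries and joining it on the index adds one column per key, with NaN where an event lacks that key.

```python
import pandas as pd

# Shortened stand-in for the real DataFrame (assumption: same index as the parsed rows).
security = pd.DataFrame({
    "eventid": ["4656", "4688"],
    "message": ["a handle to an object was requested.",
                "a new process has been created."],
})

# Hypothetical parser output: one dict of key-value pairs per row.
parsed = [
    {"account name": "account.name.matt", "object type": "key"},
    {"account name": "account.name.matt", "new process id": "0x1e98"},
]

# One column per key; keys missing from a row become NaN.
expanded = pd.DataFrame(parsed, index=security.index)

# Attach the new columns to the original rows by index.
result = security.join(expanded)
print(result)
```

`pd.concat([security, expanded], axis=1)` should give the same result here, since both frames share the same index.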

Community
  • I have attempted to split it in multiple ways. The problem I see is that I get a list of split items, but I need to create a column header, or expand to a column name, based on the field that comes before each ":". The Security message field comes in many different formats. I was looking to see if anyone had specifically solved this. – johnnyb Mar 21 '17 at 03:36
  • I think you then need to split the values into a dictionary; see http://stackoverflow.com/q/186857/5729272. Once you have the key-value pairs, the keys can become column names. – Sudhir Chauhan Mar 21 '17 at 03:47
  • @johnnyb Regex might have been a solution, but I think your data does not lend itself to easy splitting. Most keys are two words but some are not, so by capturing things automatically you will end up capturing part of the data as a key, or vice versa. – Sudhir Chauhan Mar 21 '17 at 06:24
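
A minimal sketch of the dictionary idea from the comments, under two assumptions drawn only from the sample rows: fields are separated by runs of two or more spaces, and a key is a run of letters and spaces (at least two characters, so a drive letter such as `c:` is not mistaken for a key) followed by a colon. The function name `parse_message` is made up for illustration. It is a heuristic, not a general parser for Security events.

```python
import re

CHUNK_SPLIT = re.compile(r"\s{2,}")                      # fields are 2+ spaces apart
KEY_VALUE = re.compile(r"^([a-z][a-z ]+):\s*(.*)$", re.IGNORECASE)

def parse_message(message):
    """Return (fields, sentences) for one event message."""
    fields, sentences = {}, []
    pending_key = None                                   # key still waiting for a value
    for chunk in CHUNK_SPLIT.split(message.strip()):
        if not chunk:
            continue
        match = KEY_VALUE.match(chunk)
        if match:
            key, value = match.group(1), match.group(2)
            if value:                                    # "key: value" in one chunk
                fields[key] = value
                pending_key = None
            else:                                        # "key:" alone; an earlier pending
                pending_key = key                        # key (section header) is dropped
        elif pending_key is not None:                    # value for the preceding key
            fields[pending_key] = chunk
            pending_key = None
        else:                                            # free text
            sentences.append(chunk)
    return fields, sentences

sample = (
    r"a new process has been created.    subject:   security id:  s-1-5-18"
    r"   account name:  account.name.matt   process information:"
    r"   new process id:  0x1e98   new process name: c:\windows\system32\conhost.exe"
)
fields, sentences = parse_message(sample)
print(fields)
print(sentences)
```

The ambiguity raised in the last comment still applies. A key with an empty value that is followed by a sentence (such as "process command line:" in event 4688) will absorb that sentence as its value, and a multi-valued field such as "accesses:" keeps only its first value. If the set of keys is known in advance, matching against an explicit key list is likely to be more reliable.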