
Say I have a collection of text files I need to process (e.g. search for a certain label and extract its value). What would be the general way to tackle the problem?

I also read "Retrieve Variable Values from Python", but it does not seem to apply to some of the cases I face (for example, where a tab is used as the delimiter instead of a colon).

I just want to know the most appropriate way to tackle the problem regardless of the language used.

Say I have something like:

Name: Backup Operators  SID: S-1-5-32-551   Caption: COMMSVR21\Backup Operators Description: Backup Operators can override security restrictions for the sole purpose of backing up or restoring files  Domain: COMMSVR21   
COMMERCE/cabackup
COMMSVR21/sys5erv1c3

I want to be able to access/retrieve the values of Backup Operators and get COMMERCE/cabackup & COMMSVR21/sys5erv1c3 in return.

How would you do it?

What I thought of is to read the whole text file, run a regex search over it, and probably add some if/else statements. Is this effective? Or would it be better to parse the text file into some kind of array and retrieve the values from that? I'm not sure.

Like in another example say:

        GPO: xxx & yyy Servers
            Policy:            MaximumPasswordAge
            Computer Setting:  45

How would you check the text file for Policy = MaximumPasswordAge and return the value 45?

Thanks!

p/s -- I might be doing this in Python (zero knowledge, so picking it up on the fly) or Java

pp/s -- I just realised that there's no spoiler tag. Hmm

--

Examples of the logs. Log with directory permissions:

C:\:
    BUILTIN\Administrators  Allowed:    Full Control
    NT AUTHORITY\SYSTEM Allowed:    Full Control
    BUILTIN\Users   Allowed:    Read & Execute
    BUILTIN\Users   Allowed:    Special Permissions: 
            Create Folders
    BUILTIN\Users   Allowed:    Special Permissions: 
            Create Files
    \Everyone   Allowed:    Read & Execute
    (No auditing)

C:\WINDOWS:
    BUILTIN\Users   Allowed:    Read & Execute
    BUILTIN\Power Users Allowed:    Modify
    BUILTIN\Power Users Allowed:    Special Permissions: 
            Delete
    BUILTIN\Administrators  Allowed:    Full Control
    NT AUTHORITY\SYSTEM Allowed:    Full Control
    (No auditing)

Another one with the following:

    Audit Policy
    ------------
        GPO: xxx & yyy Servers
            Policy:            AuditPolicyChange
            Computer Setting:  Success

        GPO: xxx & yyy Servers
            Policy:            AuditPrivilegeUse
            Computer Setting:  Failure

        GPO: xxx & yyy Servers
            Policy:            AuditDSAccess
            Computer Setting:  No Auditing

This is the tab-delimited one:

User Name   Full Name   Description Account Type    SID Domain  PasswordIsChangeable    PasswordExpires PasswordRequired    AccountDisabled AccountLocked   Last Login
53cuR1ty        Built-in account for administering the computer/domain  512 S-1-5-21-2431866339-2595301809-2847141052-500   COMMSVR21   True    False   True    False   False   09/11/2010 7:14:27 PM
ASPNET  ASP.NET Machine Account Account used for running the ASP.NET worker process (aspnet_wp.exe) 512 
Alex Cheng
  • If you are free to decide the syntax of the input file, you could write it as plain Python code! – Vijay Mathew Jan 06 '11 at 03:53
  • Heh, that would be nice. Or to make it even more fun, Lisp ;) – Blender Jan 06 '11 at 03:54
  • @Vijay Mathew: Hi. What do you mean by that? Can you please rephrase? If I get you correctly, the input files are always of the same formatting. @Blender: Oh god Lisp. – Alex Cheng Jan 06 '11 at 03:58
  • This is a job for PyParsing. Add the 'pyparsing' tag, its author (Paul McGuire, great guy) pops out of nowhere and solves all your parsing troubles. – TryPyPy Jan 06 '11 at 04:26

1 Answer


I always shove Python into people's faces ;)

I recommend looking at regular expressions (http://docs.python.org/howto/regex.html), as they might fit your needs. I won't write the whole thing for you (because I can't), but this should work if your files are colon-delimited key/value pairs separated by newline characters. Here's a quick start (which might work):

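regex = r'([^:\n]*):( *)(.*)'

This matches three groups (hopefully): the text before the first colon on a line (group 1), the spaces after the colon (group 2, which can be thrown away), and the rest of the line up to the newline (group 3). Using `[^:\n]*` instead of `.*` for the first group keeps it from running on to the last colon when a line contains more than one.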
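As a rough sketch of how a pattern like this could answer the `MaximumPasswordAge` question: walk the lines, remember the last `Policy` seen, and return the `Computer Setting` that follows it. The function name `find_setting` and the inline sample are made up for illustration, and this assumes every `Computer Setting` line comes after its `Policy` line, as in the samples posted.

```python
import re

# Key, optional spaces, value. [^:\n]* stops the key at the first colon.
PAIR = re.compile(r'([^:\n]*):( *)(.*)')

# Inline copy of the sample from the question (a real script would read a file).
sample = """\
        GPO: xxx & yyy Servers
            Policy:            MaximumPasswordAge
            Computer Setting:  45
"""

def find_setting(text, policy_name):
    """Return the 'Computer Setting' that follows the given 'Policy', or None."""
    current_policy = None
    for line in text.splitlines():
        match = PAIR.match(line.strip())
        if not match:
            continue  # blank lines, underlines, headings without a colon
        key, _spaces, value = match.groups()
        value = value.strip()
        if key == "Policy":
            current_policy = value
        elif key == "Computer Setting" and current_policy == policy_name:
            return value
    return None

print(find_setting(sample, "MaximumPasswordAge"))  # prints 45
```

The returned value is a string, so comparing it against a preset number would need an `int(...)` conversion first.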

Play with that (I don't want to have a regex aneurysm, so this is as far as I can help for now). Good luck!
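For the tab-delimited file a regex is probably unnecessary, since Python's standard `csv` module can split on tabs and use the header row as keys. The sketch below is only an illustration: the sample is a cut-down stand-in with a few of the real column names, the `Guest` row is invented, and `disabled_accounts` is a hypothetical helper name.

```python
import csv
import io

# Shortened, illustrative stand-in for the tab-delimited report.
# The second data row is invented for the example.
sample = (
    "User Name\tAccount Type\tDomain\tAccountDisabled\n"
    "53cuR1ty\t512\tCOMMSVR21\tFalse\n"
    "Guest\t512\tCOMMSVR21\tTrue\n"
)

def disabled_accounts(handle):
    """Return the user names whose AccountDisabled column is 'True'."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row["User Name"] for row in reader if row["AccountDisabled"] == "True"]

print(disabled_accounts(io.StringIO(sample)))  # prints ['Guest']
```

With a real file, something like `open(path, newline="")` would take the place of `io.StringIO(sample)`.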

Blender
  • @Blender: So you're saying that I should parse the whole text file, and then filter the values I want using `regex` is it? Well, indeed, I foresee regex aneurysm for me as well XD Thanks – Alex Cheng Jan 06 '11 at 03:58
  • If its syntax is consistent, then sure. If not, things might get really ugly. Could you upload/post a bigger sample chunk? I could try to write a sample script... – Blender Jan 06 '11 at 04:02
  • Thanks. So, what are you trying to extract? If it's not too much, could you write up what you want to come out of each dataset? I'm not sure how to read it myself (Linux user)... – Blender Jan 06 '11 at 04:22
  • Oh erm, say for `Log with Directory Permissions`, I want to show the results if the drive has special permissions, like: `C:\WINDOWS: BUILTIN\Power Users Allowed: Special Permissions: Delete` – Alex Cheng Jan 06 '11 at 05:03
  • Then for like `AuditPolicyChange` if it is set to `failure` then prompt in the console/write to new output file. Something like that. Basically I want to get the values of certain parts in the txtfile, not all, and compare them with preset values. That kind. – Alex Cheng Jan 06 '11 at 05:05
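
The directory-permissions request from the comments could be sketched as a small line-by-line state machine rather than one big regex. This is a guess at the format based only on the sample posted: unindented lines ending in `:` are taken to be paths, and the indented line right after a `Special Permissions:` entry is taken to be the permission name. `special_permissions` is a hypothetical helper name.

```python
# Inline excerpt of the posted log (raw string so the backslashes stay literal).
sample = r"""
C:\WINDOWS:
    BUILTIN\Users   Allowed:    Read & Execute
    BUILTIN\Power Users Allowed:    Special Permissions:
            Delete
    BUILTIN\Administrators  Allowed:    Full Control
    (No auditing)
"""

def special_permissions(text):
    """Return (path, account, permission) for every special-permission entry."""
    results = []
    path = None
    pending = None  # account whose permission is expected on the next line
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        if not line[0].isspace():
            # Unindented line: a new path header such as "C:\WINDOWS:"
            path = line[:-1] if line.endswith(":") else line
            pending = None
        elif pending is not None:
            # Line following "Special Permissions:" holds the permission name
            results.append((path, pending, line.strip()))
            pending = None
        elif line.endswith("Special Permissions:"):
            pending = line.split("Allowed:")[0].strip()
    return results

for entry in special_permissions(sample):
    print(entry)
```

The resulting tuples can then be compared with preset values or written to an output file, which is roughly what the last two comments describe.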