0

Hello, I am looking to trim a McAfee log file and remove all of the "is OK" lines and other reported instances that I am not interested in seeing. Previously we used a shell script that took advantage of the -v option for grep, but now we want to write a Python script that will work on both Linux and Windows. After a couple of attempts I was able to get a regex to work in an online regex builder, but I am having a difficult time implementing it in my script.

Edit: I want to remove the "is OK", "is a broken", "is a block", and "file could not be opened" lines so that I am left with a file containing only the problems I am interested in. Something like this in shell:

grep -v "is OK" ${OUTDIR}/${OUTFILE} | grep -v "is a broken" | grep -v "file could not be opened" | grep -v "is a block" > ${OUTDIR}/${OUTFILE}.trimmed 2>&1

I read in and search through the file here:

import re

f2 = open(outFilePath)
contents = f2.read()
print contents
p = re.compile("^((?!(is OK)|(file could not be opened)| (is a broken)|(is a block)))*$", re.MULTILINE | re.DOTALL)
m = p.findall(contents)
print len(m)
for iter in m:
    print iter
f2.close()

A sample of the file I am trying to search:

eth0
10.0.11.196
00:0C:29:AF:6A:A7
parameters passed to uvscan: --DRIVER /opt/McAfee/uvscan/datfiles/current --    ANALYZE --AFC=32 ATIME-PRESERVE --PLAD --RPTALL RPTOBJECTS SUMMARY --UNZIP -- RECURSIVE --SHOWCOMP --MIME --THREADS=4 /tmp
temp XML output is: /tmp/HIQZRq7t2R
McAfee VirusScan Command Line for Linux64 Version: 6.0.5.614
Copyright (C) 2014 McAfee, Inc.
(408) 988-3832 LICENSED COPY - April 03 2016

AV Engine version: 5700.7163 for Linux64.
Dat set version: 8124 created Apr 3 2016
Scanning for 670707 viruses, trojans and variants.


No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/ATIME-PRESERVE

No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/RPTOBJECTS

No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/SUMMARY
/tmp/tmp.BQshVRSiBo ... is OK.
/tmp/keyring-F6vVGf/socket ... file could not be opened.
/tmp/keyring-F6vVGf/socket.ssh ... file could not be opened.
/tmp/keyring-F6vVGf/socket.pkcs11 ... file could not be opened.
/tmp/yum.log ... is OK.
/tmp/tmp.oW75zGUh4S ... is OK.
/tmp/.X11-unix/X0 ... file could not be opened.
/tmp/tmp.LCZ9Ji6OLs ... is OK.
/tmp/tmp.QdAt1TNQSH ... is OK.
/tmp/ks-script-MqIN9F ... is OK.
/tmp/tmp.mHXPvYeKjb/mcupgrade.conf ... is OK.
/tmp/tmp.mHXPvYeKjb/uvscan/uninstall-uvscan ... is OK.
/tmp/tmp.mHXPvYeKjb/mcscan ... is OK.
/tmp/tmp.mHXPvYeKjb/uvscan/install-uvscan ... is OK.
/tmp/tmp.mHXPvYeKjb/uvscan/readme.txt ... is OK.
/tmp/tmp.mHXPvYeKjb/uvscan/uvscan_secure ... is OK.
/tmp/tmp.mHXPvYeKjb/uvscan/signlic.txt ... is OK.
/tmp/tmp.mHXPvYeKjb/uvscan/uvscan ... is OK.
/tmp/tmp.mHXPvYeKjb/uvscan/liblnxfv.so.4 ... is OK.

But I am not getting the correct output. I have also tried removing both the MULTILINE and DOTALL options and still do not get the correct result. Below is the output when running with DOTALL and MULTILINE.

9
('', '', '', '', '')
('', '', '', '', '')
('', '', '', '', '')
('', '', '', '', '')
('', '', '', '', '')
('', '', '', '', '')
('', '', '', '', '')
('', '', '', '', '')
('', '', '', '', '')

Any help would be much appreciated!! Thanks!!

el_loza
  • With `re.findall`, you extract all captured values. Try `re.compile(r'^(?:(?!\b(?:is OK|file could not be opened|is a broken|is a block)\b).)+$', re.DOTALL | re.MULTILINE)` with `re.findall` – Wiktor Stribiżew Apr 05 '16 at 21:32
  • Can you show what you target output is, I will solve it instantly. – C Panda Apr 05 '16 at 21:34
  • I guess I wasn't clear. I want all the lines that do not end with "is OK" or "is a broken", as if I were to run: grep -v "is OK" ${OUTDIR}/${OUTFILE} | grep -v "is a broken" | grep -v "file could not be opened" | grep -v "is a block" > ${OUTDIR}/${OUTFILE}.trimmed 2>&1 – el_loza Apr 05 '16 at 21:44
  • Wiktor I tried your solution and still no success... – el_loza Apr 05 '16 at 21:44
  • @el_loza try mine below. – Dominique Fortin Apr 05 '16 at 21:55

4 Answers

2

Perhaps think simpler, line by line:

import re
import sys

pattern = re.compile(r"(is OK)|(file could not be opened)|(is a broken)|(is a block)")

with open(sys.argv[1]) as handle:
    for line in handle:
        if not pattern.search(line):
            sys.stdout.write(line)

Outputs:

eth0
10.0.11.196
00:0C:29:AF:6A:A7
parameters passed to uvscan: --DRIVER /opt/McAfee/uvscan/datfiles/current --    ANALYZE --AFC=32 ATIME-PRESERVE --PLAD --RPTALL RPTOBJECTS SUMMARY --UNZIP -- RECURSIVE --SHOWCOMP --MIME --THREADS=4 /tmp
temp XML output is: /tmp/HIQZRq7t2R
McAfee VirusScan Command Line for Linux64 Version: 6.0.5.614
Copyright (C) 2014 McAfee, Inc.
(408) 988-3832 LICENSED COPY - April 03 2016

AV Engine version: 5700.7163 for Linux64.
Dat set version: 8124 created Apr 3 2016
Scanning for 670707 viruses, trojans and variants.


No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/ATIME-PRESERVE

No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/RPTOBJECTS

No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/SUMMARY
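
To mirror the original shell pipeline, which redirects into a `.trimmed` file, the same loop can write to a file instead of stdout. This is only a minimal sketch: the helper name `trim_log` and the example paths are hypothetical, not something from the question.

```python
import re

# Phrases marking the lines to drop, as listed in the question.
UNWANTED = re.compile(r"is OK|file could not be opened|is a broken|is a block")


def trim_log(src_path, dst_path):
    """Copy src_path to dst_path, skipping lines that contain an unwanted phrase.

    Returns the number of lines written. (Hypothetical helper, for illustration.)
    """
    kept = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            if not UNWANTED.search(line):
                dst.write(line)
                kept += 1
    return kept
```

It could be called as `trim_log("scan.log", "scan.log.trimmed")`, where both file names are placeholders. Since it only uses `open` and `re`, it should behave the same on Linux and Windows.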
cdlane
  • I thought he said he wanted to remove these lines. He wants everything else. – C Panda Apr 05 '16 at 21:36
  • Yeah, I think you want `if pattern.search(line) is None` (or `if not pattern.search(line)`). – Blckknght Apr 05 '16 at 21:37
  • @CPanda, clearly, a trivial fix if the code is simple enough! – cdlane Apr 05 '16 at 21:38
  • well, if the target lines start somewhere and continue till the end of the file, all except the first match are a waste. You could just break out. Couldn't you? – C Panda Apr 05 '16 at 21:40
  • @cdlane I didn't get what you just said. – C Panda Apr 05 '16 at 21:42
  • @CPanda, I'm not sure I follow you but the pattern will do what you describe, it will succeed on the first alternation that matches so there is no 'break out' involved. It should only try them all on failure. – cdlane Apr 05 '16 at 21:47
  • @cdlane can you tell me one thing, as I don't know about McAfee log files: say you hit a line that matches. From that point onward, do they continue to the end of the file? – C Panda Apr 05 '16 at 21:53
  • @CPanda, again I'm not sure I follow but this solution has nothing to do with McAfee. It simply walks the file line by line. If a line doesn't match the pattern, it prints it. That's all. Any given line has no effect on the others. – cdlane Apr 05 '16 at 22:02
  • @cdlane I understand what you are saying, and I never said what you are saying is incorrect. I am just asking - is the pattern intermittent or continues to the end of the file. – C Panda Apr 05 '16 at 22:10
  • @cdlane Thank You! It looks like this solution works for me! Now I can just redirect the output to a log file and be done with the script! – el_loza Apr 05 '16 at 22:19
0

Sometimes regexes are more complicated, but if you're really only looking for these patterns then I'd probably just try the simple approach:

terms = (
    'is OK',
    'file could not be opened',
    'is a broken',
    'is a block',
)

with open('/tmp/sample.log') as f:
    for line in f:
        if line.strip() and not any(term in line for term in terms):
            print(line, end='')

It might not be faster than the regex, but it's about as simple as it gets. Alternatively you could also use a slightly more strict approach:

terms = (
    'is a broken',
    'is a block',
)

with open('/tmp/sample.log') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue
        elif line.endswith('is OK.'):
            continue
        elif line.endswith('file could not be opened.'):
            continue
        elif any(term in line for term in terms):
            continue
        print(line)

The approach I would take largely depends on who I expect to be using the script :)
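
To see where the two approaches differ, here is a small sketch that wraps each check in a function and runs both on one made-up line. The function names `keep_loose` and `keep_strict` and the sample line are assumptions for illustration only; the line is not taken from a real uvscan log.

```python
TERMS = ("is OK", "file could not be opened", "is a broken", "is a block")


def keep_loose(line):
    """Keep a non-blank line unless a phrase appears anywhere in it."""
    return bool(line.strip()) and not any(term in line for term in TERMS)


def keep_strict(line):
    """Keep a non-blank line unless it ends with one of the status suffixes."""
    stripped = line.strip()
    if not stripped:
        return False
    if stripped.endswith(("is OK.", "file could not be opened.")):
        return False
    return not any(term in stripped for term in ("is a broken", "is a block"))


# Hypothetical line: the phrase is part of the file name, not the scan status.
tricky = "/tmp/this is OK.txt ... some other status\n"
print(keep_loose(tricky))   # False
print(keep_strict(tricky))  # True
```

The substring check drops the line because "is OK" occurs in the file name, while the suffix check keeps it because the line does not end with "is OK.".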

Wayne Werner
0

Try this (and it's done in one line)

p = re.compile("^(?:[if](?!s OK|s a broken|s a block|ile could not be opened)|[^if])*$")

It means that every character in the line is either not an "i" or an "f", or it is an "i" or an "f" that is not followed by one of the suffixes listed. That check is repeated for every character in the line.

Edit: After testing at regex101.com, I found why it was not working. Here is the one line regex that will work.

p = re.compile("^(?:[^if\n]|[if](?!s OK|s a broken|s a block|ile could not be opened))*$", re.MULTILINE)
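
As a quick check, the sketch below runs a variant of this pattern, with each lookahead alternative listed once, through `re.findall` on a tiny sample. The sample text is made up and has no trailing newline, which keeps the result free of an empty final match.

```python
import re

# Each character is either not i/f/newline, or an i/f that does not
# start one of the unwanted phrases.
pattern = re.compile(
    r"^(?:[^if\n]|[if](?!s OK|s a broken|s a block|ile could not be opened))*$",
    re.MULTILINE,
)

# Made-up sample in the style of the log from the question.
sample = (
    "eth0\n"
    "/tmp/yum.log ... is OK.\n"
    "/tmp/x ... file could not be opened.\n"
    "Scanning for viruses."
)

kept = pattern.findall(sample)
print(kept)  # ['eth0', 'Scanning for viruses.']
```

One caveat of the character-class trick: because the first letter is matched with `[if]`, the pattern also rejects strings such as "fs OK", which the grep version would keep.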
Dominique Fortin
  • I think it doesn't like the `(?:s OK|s a ...` I took it out. – Dominique Fortin Apr 05 '16 at 22:21
  • Still no success...The log file checks file by file so potentially all of the files could be some variant of "is OK" "is a broken" but if not, I would like to keep all of the lines that indicate a problem, but at the very least I would like to get the header lines at the beginning and footer lines at the end of the log – el_loza Apr 05 '16 at 22:31
  • I just tried `^([^i]|[i](?!s OK))+$` in a text editor that supports regex and it doesn't work. I'm intrigued. – Dominique Fortin Apr 05 '16 at 22:57
0

I know it is late to answer, but I see that none of the answers is a correct solution.

Your regex is wrong for this case. It has unnecessary capturing groups and is missing a period (".") to consume the characters of the line. Also, the lookahead is only tested at the start of the line, so a phrase from "is OK|file could not be opened|is a broken" is only detected when it is at the beginning of the line:

"hello world is OK": not detected  
"is OK hello world": detected
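
The effect of the lookahead position can be checked with two simplified patterns. These are illustrative stand-ins with a single phrase, not the pattern from the question:

```python
import re

# Lookahead tested once, at the line start only.
start_only = re.compile(r"^(?!is OK).*$")
# The `.*` inside the lookahead lets it scan the whole line.
anywhere = re.compile(r"^(?!.*is OK).*$")

lines = ["hello world is OK", "is OK hello world"]
kept_start_only = [l for l in lines if start_only.match(l)]
kept_anywhere = [l for l in lines if anywhere.match(l)]

print(kept_start_only)  # ['hello world is OK']
print(kept_anywhere)    # []
```

With the start-only lookahead, the line that ends in "is OK" slips through; adding `.*` inside the lookahead rejects both lines.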

When negating a match like this, use a non-capturing group '(?:)' instead of a capturing group '()', so that you do not get empty strings back.

If you want to remove the entire line whenever it contains one of the phrases, you can use the following expression:

r"^(?!.*(?:is OK|is a broken|file could not be opened)).*"

"is OK. hello world": removed  
"hello world is OK.": removed  
"is OK.": removed

If you want to remove the entire line, but only when it ends in "is OK.", "file could not be opened." or "is a broken.", you can use the following expression:

r"^(?!.*(?:is OK|is a broken|file could not be opened)\.$).*"

"is OK. hello world": kept  
"hello world is OK.": removed  
"is OK.": removed
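
A short sketch comparing the two expressions on the three example strings. The variable names are mine, and each string is matched on its own with `re.match`, so no `re.MULTILINE` flag is needed here:

```python
import re

PHRASES = r"(?:is OK|is a broken|file could not be opened)"

# Rejects a line that contains a phrase anywhere.
anywhere = re.compile(r"^(?!.*" + PHRASES + r").*")
# Rejects a line only when it ends with a phrase followed by a period.
ending = re.compile(r"^(?!.*" + PHRASES + r"\.$).*")

examples = ["is OK. hello world", "hello world is OK.", "is OK."]
kept_anywhere = [s for s in examples if anywhere.match(s)]
kept_ending = [s for s in examples if ending.match(s)]

print(kept_anywhere)  # []
print(kept_ending)    # ['is OK. hello world']
```

When the whole log is searched in one call, as in the `re.findall` examples in this answer, `re.MULTILINE` is needed so that `^` and `$` apply to each line.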

Remember to use a non-capturing group '(?:)' instead of a capturing group '()'; otherwise re.findall returns the contents of the group, which is an empty string for every line that is kept:

import re

# Capturing group; `text` holds the contents of the log file
regex = r"^(?!.*(is OK|file could not be opened|is a broken|is a block)).*"
print(re.findall(regex,text,flags=re.MULTILINE))

output:

['', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '', '']

Use the join() function to get the full text:

# Non-capturing group
regex = r"^(?!.*(?:is OK|file could not be opened|is a broken|is a block)).*"
print("\n".join(re.findall(regex,text,flags=re.MULTILINE)))

output:

eth0
10.0.11.196
00:0C:29:AF:6A:A7
parameters passed to uvscan: --DRIVER /opt/McAfee/uvscan/datfiles/current --    ANALYZE --AFC=32 ATIME-PRESERVE --PLAD --RPTALL RPTOBJECTS SUMMARY --UNZIP -- RECURSIVE --SHOWCOMP --MIME --THREADS=4 /tmp
temp XML output is: /tmp/HIQZRq7t2R
McAfee VirusScan Command Line for Linux64 Version: 6.0.5.614
Copyright (C) 2014 McAfee, Inc.
(408) 988-3832 LICENSED COPY - April 03 2016

AV Engine version: 5700.7163 for Linux64.
Dat set version: 8124 created Apr 3 2016
Scanning for 670707 viruses, trojans and variants.


No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/ATIME-PRESERVE

No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/RPTOBJECTS

No file or directory found matching /root/SVN/swd-lhn-build/trunk/utils/SUMMARY
