
I have a log file with multiline events containing elements I need to capture. I then need to recursively search files for those captured strings and write the results to CSV. Currently I am doing this with multiple bash commands; while it works, it is ugly. The error log file can contain tens of thousands of lines with hundreds of CRITICAL errors.

Log file (`error.log`):

    INFO ..some text.. title: (capture this title in capture group - title1)
    INFO ..some text.. path: (capture this url in capture group - url1)
    INFO ..some text..
    INFO ..some text.. version: (capture version in capture group - version1)
    INFO ..some text..
    INFO ..some text..
    CRITICAL ..some text.. file/path (capture path (not file) in capture group - fp1) reason (capture reason in capture group - reason1)

Recursively search files ending in `*.foo123` for any match of capture group fp1. From the path of each matching file, capture elements: `/some/path/(capture this - fp2)/(capture this - fp3)/(capture filename.foo123 - fname)`. If fp1 exists in any `*.foo123` file, print in CSV format: `fp2,fp3,fname,title1,version1,reason1,url1`
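
For reference, the bash capture for the CRITICAL line amounts to roughly this in Python terms (the literal text around the groups is made up here; the real lines vary):

    import re

    # Hypothetical shape only -- the text around the groups stands in
    # for whatever the real log lines actually contain
    critical_re = re.compile(r'^CRITICAL .* (?P<fp1>\S+) reason (?P<reason1>.+)$')

    m = critical_re.match('CRITICAL ..some text.. some/dir/name reason unsupported')
    if m:
        print(m.group('fp1'), m.group('reason1'))   # some/dir/name unsupported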

Complete noob, so please be gentle. My Google-fu at munging things together has been a complete fail.

I wrote the fp1 values to `unsupported.txt` (by grepping `error.log` with a regex), one value per line:

    import os

    rootdir = '/some/path'   # top of the tree I'm searching

    ba = open('unsupported.txt', 'r')
    ba1 = ba.readlines()

    for folder, dirs, files in os.walk(rootdir):
        for file in files:
            if file.endswith('.foo123'):
                fullpath = os.path.join(folder, file)
                with open(fullpath, 'r') as f:
                    for line in f:
                        if any(ext in ba1 for ext in line):
                            print(line)

This returns nothing. It looks like `ba1` is captured as a list, and my condition isn't checking what I think it's checking. If I change `if any(ext in ba1 for ext in line):` to use an actual value, `if any(ext in "bad_value" for ext in line):`, I get a print of the contents of every file that has a match of "bad_value". If I can't get this far, I certainly cannot do anything else I want to accomplish.
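
Printing what the generator expression actually iterates over (with made-up values here) suggests the condition walks the line character by character, and that `readlines()` keeps the trailing newlines:

    line = 'CRITICAL ..some text.. some/dir/name reason whatever\n'
    ba1 = ['some/dir/name\n', 'other/dir/name\n']   # what readlines() gives me

    # iterating a string yields single characters, not words
    print([ext for ext in line][:8])    # ['C', 'R', 'I', 'T', 'I', 'C', 'A', 'L']

    # so this asks: is any single character of line an element of ba1?
    print(any(ext in ba1 for ext in line))    # False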

I have tried various other approaches from examples I found while searching, but I'm just not getting where I need to be.

As a bonus, pointing me to some reading material for the tasks I'm trying to accomplish would be nice.

1 Answer


A great place to start would be to set up a debugger if you haven't already.

Your problem statement is a little unclear, but I would break it down into steps. Are you looping through the files the way you intend to? What is it you're trying to do to each line? You mentioned wanting to do some checking with regex; an online regex tester is a useful place to experiment with those. You also mentioned wanting to write some data to a CSV; NumPy is a useful library with functionality to write to CSV.
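
To get you unstuck on the search itself, here is a rough sketch of the walk-and-match step. It is untested against your data: `rootdir`, the output filename, and the `fp2`/`fp3` positions are assumptions based on your `/some/path/fp2/fp3/fname` description, and I have used the standard library's csv module rather than NumPy since nothing here is numeric:

    import csv
    import os

    rootdir = '/some/path'    # assumed top of the *.foo123 tree

    # Read the fp1 values, stripping the trailing newlines that
    # readlines() keeps -- those newlines are one reason a plain
    # membership test against the raw lines never matches.
    with open('unsupported.txt') as fh:
        needles = {ln.strip() for ln in fh if ln.strip()}

    with open('matches.csv', 'w', newline='') as out:
        writer = csv.writer(out)
        for folder, dirs, files in os.walk(rootdir):
            for name in files:
                if not name.endswith('.foo123'):
                    continue
                fullpath = os.path.join(folder, name)
                with open(fullpath) as f:
                    for line in f:
                        # Each needle is tested against the whole line --
                        # the reverse of your version, which tested each
                        # *character* of the line against the list.
                        if any(needle in line for needle in needles):
                            # fp2/fp3/fname come from the matching file's own
                            # path, assuming the /some/path/fp2/fp3/fname layout
                            parts = fullpath.split(os.sep)
                            fp2, fp3, fname = parts[-3], parts[-2], parts[-1]
                            writer.writerow([fp2, fp3, fname])
                            break    # one row per matching file

To also emit title1, version1, reason1, and url1, you would parse error.log into a dict keyed by fp1 instead of (or alongside) unsupported.txt, replace `any()` with a loop that records which needle matched, and look that needle up in the dict before the `writerow` call.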

justin
  • Thanks Justin. The flow would be like this: read `error.log` across a multiline error, capturing the elements title, path, version, file path, and reason (currently accomplished via regex in bash). Once captured (an array?), recursively search all files with extension `*.foo123` for the string file/path (the last word after "/"), also capturing the directory path of the file containing the string. Once captured, output the captured elements to CSV in the specified order: results from the first (array?), results from the second (array?), then the remaining results of the first (array?). Hope this helps with clarity. – myCodeIsJanky Dec 09 '22 at 22:28