
Hi, I have a somewhat vague question.

I want to build a tool that searches through log files, with the following functionality:

1) Search through the log files until a given log line is found.
2) After finding 1), jump forward an unknown number of lines until a condition is met. At that point, the data is used to do some computation.
3) After completing 2), return to the line found in 1) and continue through the file.

I can already do 1) and 2) fairly easily by looping over each line:

for line in file:

For 3), I was going to use something like file.seek(linenum) and continue iterating over the lines. But is there a more efficient way to do any of the above steps?

Thanks

emza0114

3 Answers


For files, this is easy enough to solve with tell and seek:

o = open(myfile)
# read some lines
last_position = o.tell()
# read more lines
o.seek(last_position)
# read more lines again
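A minimal runnable sketch of the three steps from the question, built on this tell/seek pattern. The helper name `process_log` and the `is_start`, `is_stop` and `compute` callbacks are hypothetical. Lines are read with readline() because, in Python 3, calling tell() on a text file while iterating over it with `for line in f` raises OSError ("telling position disabled by next() call").

```python
import io


def process_log(f, is_start, is_stop, compute):
    """Hypothetical helper: for every line matching is_start, read ahead until
    is_stop matches (or EOF), pass the collected lines to compute, then rewind
    to just after the start line and keep scanning."""
    results = []
    while True:
        line = f.readline()  # readline(), not `for line in f`, so tell() keeps working
        if not line:
            break  # end of file
        if not is_start(line):
            continue
        resume_at = f.tell()  # step 1: remember where the start line ends
        block = [line]
        while True:  # step 2: read forward until the stop condition holds
            ahead = f.readline()
            if not ahead:
                break
            block.append(ahead)
            if is_stop(ahead):
                break
        results.append(compute(block))
        f.seek(resume_at)  # step 3: go back and continue after the start line
    return results


if __name__ == "__main__":
    # An in-memory file stands in for a real log opened with open(path).
    log = io.StringIO("START a\nx\nSTOP\nSTART b\nSTOP\n")
    print(process_log(log, lambda l: l.startswith("START"),
                      lambda l: l.startswith("STOP"), len))  # prints [3, 2]
```

The same function works unchanged on a real file object returned by open(), since only readline(), tell() and seek() are used.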

Note that, contrary to what your question suggests, seek does not take a line number. It takes a byte offset (in Python 3 text mode, an opaque position previously returned by tell). For ASCII files, a byte offset is also the character offset, but that doesn't hold for variable-width encodings such as UTF-8.
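If you do want to jump by line number, one option is to record the byte offset at which each line starts during a first pass, then seek() to the stored offset. A sketch, assuming the file is opened in binary mode ('rb') so the offsets are true byte offsets; `build_line_index` and `read_line` are hypothetical names. The index costs one integer per line.

```python
import io


def build_line_index(f):
    """Return a list whose n-th entry is the byte offset where line n starts.
    Assumes f is a seekable file opened in binary mode ('rb')."""
    offsets = []
    f.seek(0)
    while True:
        pos = f.tell()
        if not f.readline():
            break  # end of file
        offsets.append(pos)
    return offsets


def read_line(f, offsets, n):
    """Jump straight to line n (0-based) using the precomputed index."""
    f.seek(offsets[n])
    return f.readline()


if __name__ == "__main__":
    # "héllo" is 5 characters but 6 bytes in UTF-8, so line 1 starts at byte 7.
    data = io.BytesIO("héllo\nworld\n".encode("utf-8"))
    index = build_line_index(data)
    print(index)                      # prints [0, 7]
    print(read_line(data, index, 1))  # prints b'world\n'
```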

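There's no "more efficient" way of doing this, AFAIK. This is extremely efficient from the OS, memory, CPU and disk perspectives, since the only state you keep is a single position. It's a bit clumsy from a programming standpoint. Python's itertools.tee can clone an iterator, but it keeps in memory every line that one clone has consumed and the other has not, so for files tell and seek are usually the cheaper option.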
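For comparison, the standard library's itertools.tee can split one iterator into independent copies, which avoids tell()/seek() and also works on non-seekable input such as a pipe. The trade-off is memory: tee buffers every item that one copy has consumed and another has not, so a long look-ahead holds those lines in RAM. A sketch, where `process_with_tee` and its callbacks are hypothetical names:

```python
import io
import itertools


def process_with_tee(lines, is_start, is_stop, compute):
    """Hypothetical helper: same three steps as the question, but using
    itertools.tee instead of tell()/seek(). Works on any iterable of lines."""
    results = []
    it = iter(lines)
    while True:
        line = next(it, None)
        if line is None:
            break  # iterator exhausted
        if not is_start(line):
            continue
        # Split the iterator: `it` will resume right after the start line,
        # while `ahead` is consumed for the look-ahead in step 2.
        it, ahead = itertools.tee(it)
        block = [line]
        for nxt in ahead:
            block.append(nxt)
            if is_stop(nxt):
                break
        results.append(compute(block))  # lines read by `ahead` stay buffered for `it`
    return results


if __name__ == "__main__":
    log = io.StringIO("START a\nx\nSTOP\nSTART b\nSTOP\n")
    print(process_with_tee(log, lambda l: l.startswith("START"),
                           lambda l: l.startswith("STOP"), len))  # prints [3, 2]
```

Note that the original iterator is never used after the split; `it` is rebound to one of the tee copies, as the itertools documentation recommends.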

loopbackbee
def read_until_condition(fd, condition, reset, apply=None):
    """
    Read lines from fd until condition(line) is true or the file ends.

    :fd: an open file object (not an OS-level file descriptor)
    :condition (function): a function that accepts a line and returns a bool
    :reset (bool): if True, fd is returned to its initial position afterwards
    :apply (function): optional function applied to each line read,
                       including the line that satisfies the condition

    Returns:
    int: the position (from fd.tell()) of the start of the line for which
    the condition is True, or the end-of-file position if no line matched
    """
    pos = None
    current_position = fd.tell()  # remember where we started

    while True:
        pos = fd.tell()  # start of the line about to be read
        line = fd.readline()

        if line and apply is not None:
            apply(line)

        if not line or condition(line):  # stop at EOF or on a match
            break

    if reset:
        fd.seek(current_position)

    return pos


if __name__ == '__main__':

    f = open('access_log', 'r')
    cf = lambda l: l.startswith('64.242.88.10 - - [07/Mar/2004:16:54:55 -0800]')
    # Step 1: advance f to just past the first matching line
    pos = read_until_condition(f, cf, False)
    condition = lambda l: l.startswith('lj1090.inktomisearch.com - - [07/Mar/2004:17:18:41 -0800]')

    def apply(line):
        print(line, end='')  # line already ends with a newline

    # Step 2 and 3: print lines up to the second match, then rewind
    read_until_condition(f, condition, True, apply)

    f.close()

I don't know exactly what you need, but something like the above, adapted to your needs, should work.

I tested it with some Apache log samples downloaded from here.

gosom

This answer implements an efficient line-based reader for huge files: https://stackoverflow.com/a/23646049/34088

Aaron Digulla