1

I want to tail multiple files concurrently and push the logs to scribe. I am reading the files from a Config file then I want to tail each file and send the logs to scribe. What I have tried is sends log for only the first file and doesn't for the others.

I want to run the tailing concurrently for each file and send the logs for each one of them at same time.

for l in Config.items('files'):
  print l[0]
  print l[1]
  filename = l[1]
  file = open(filename,'r')
  st_results = os.stat(l[1])
  st_size = st_results[6]
  file.seek(st_size)
  while 1:
    where = file.tell()
    line = file.readline()
    if not line:
      time.sleep(1)
      file.seek(where)
    else:
      print line, # already has newline
      category=l[0]
      message=line
      log_entry = scribe.LogEntry(category, message)
      socket = TSocket.TSocket(host='localhost', port=1463)
      transport = TTransport.TFramedTransport(socket)
      protocol = TBinaryProtocol.TBinaryProtocol(trans=transport, strictRead=False, strictWrite=False)
      client = scribe.Client(iprot=protocol, oprot=protocol)
      transport.open()
      result = client.Log(messages=[log_entry])
      transport.close()
Shawn Chin
  • 84,080
  • 19
  • 162
  • 191
Rishabh
  • 3,752
  • 4
  • 47
  • 74
  • Your problem seems very similar to an example given towards the end of [this presentation about using python generators for system programming](http://www.dabeaz.com/generators-uk/GeneratorsUK.pdf). The whole presentation is really quite interesting, but in Part 7 the author gives an example of using threading and queues to tail multiple logs. This isn't a complete answer (sorry!) but you may want to take a look at it. – Peter Downs Dec 21 '11 at 12:55

2 Answers2

2

Try something like this (Inspired by this)

import threading

def monitor_file(l): 

    print l[0]
    print l[1]
    filename = l[1]
    file = open(filename,'r')
    st_results = os.stat(l[1])
    st_size = st_results[6]
    file.seek(st_size)
    while 1:
      where = file.tell()
      line = file.readline()
      if not line:
        time.sleep(1)
        file.seek(where)
      else:
        print line, # already has newline
        category=l[0]
        message=line
        log_entry = scribe.LogEntry(category, message)
        socket = TSocket.TSocket(host='localhost', port=1463)
        transport = TTransport.TFramedTransport(socket)
        protocol = TBinaryProtocol.TBinaryProtocol(trans=transport, strictRead=False,       strictWrite=False)
        client = scribe.Client(iprot=protocol, oprot=protocol)
        transport.open()
        result = client.Log(messages=[log_entry])
        transport.close()


for l in Config.items('files'):
    thread = threading.Thread(target=monitor_file, args=(l))
Pengman
  • 708
  • 6
  • 12
1

A different implementation of @Pengman's idea:

#!/usr/bin/env python
import os
import time
from threading import Thread

def follow(filename):
    with open(filename) as file:
        file.seek(0, os.SEEK_END) # goto EOF
        while True:
            for line in iter(file.readline, ''):
                yield line
            time.sleep(1)

def logtail(category, filename):
    print category
    print filename
    for line in follow(filename):
        print line,
        log_entry(category, line)

for args in Config.items('files'):
    Thread(target=logtail, args=args).start()

Where log_entry() is a copy of the code from the question:

def log_entry(category, message):
    entry = scribe.LogEntry(category, message)
    socket = TSocket.TSocket(host='localhost', port=1463)
    transport = TTransport.TFramedTransport(socket)
    protocol = TBinaryProtocol.TBinaryProtocol(trans=transport,strictRead=False,
                                               strictWrite=False)
    client = scribe.Client(iprot=protocol, oprot=protocol)
    transport.open()
    result = client.Log(messages=[entry])
    transport.close()

follow() could be implemented using FS monitoring tools, see tail -f in python with no time.sleep.

Community
  • 1
  • 1
jfs
  • 399,953
  • 195
  • 994
  • 1,670
  • using this it is sending only the first log entry to scribe and the subsequent log entry is shown but not sent to scribe – Rishabh Dec 21 '11 at 13:41
  • 1
    @Rishabh: Is `log_entry()` called concurrently for multiple files (add debug output if you are not sure)? Does `log_entry()` work if you call it in a loop from a single thread (`for i in range(10): log_entry('test', str(i))`)? multiple threads (replace `follow()` with a fake data to test it)? – jfs Dec 21 '11 at 22:52
  • when I tailed the logs which the scribe made they were all working fine – Rishabh Dec 22 '11 at 07:29