
I am working on a Python script that reads data by tailing a file and writes it to a different file. The script is time-bound: when ENDTIME is reached, it flushes the remaining data out of the buffer. However, the source and target files end up with different sizes.

Following is a snippet:

    self.read_size = 2048
    self.tail_buffer = 2048

    # start the file tail
    cmd = '%s -f -o %s %s' % (self.jtailcmd, offset, self.source_journal)
    self.logger.debug('[%s] starting FILE Tail' % self.getName())
    try:
        self.jtail = popen2.Popen3(cmd, bufsize = self.tail_buffer)
        self.jtail.tochild.close()
        out = self.jtail.fromchild
        outfd = self.jtail.fromchild.fileno()
        flags = fcntl.fcntl(outfd, fcntl.F_GETFL)
        fcntl.fcntl(outfd, fcntl.F_SETFL, flags | os.O_NONBLOCK)
    except:
        message = '[%s] error reading file' % self.getName()
        self.logger.error(message)
        self.logger.error('[%s] %s: %s' % \
            (self.getName(), sys.exc_info()[0], sys.exc_info()[1]))
        send_alert('AE', message)
        self.sleep(60)
        self.close_tail()
        self.close_ssh()

And then eventually it flushes out the data:

        try:
            [i, o, e] = select.select([outfd], [], [], 1)
            if i:
                data = out.read(self.read_size)
            else:
                data = None
        except:
            message = '[%s] error reading file' % self.getName()
            self.logger.error(message)
            self.logger.error('[%s] %s: %s' % \
                (self.getName(), sys.exc_info()[0], sys.exc_info()[1]))
            send_alert('AE', message)
            self.close_tail()
            self.close_ssh()
            self.sleep(60)
            break
        if data:
            if self.sshcat.poll() != -1:
                self.logger.error('[%s] connection error' % self.getName())
                self.close_tail()
                self.close_ssh()
                break
            try:
                self.sshcat.tochild.writelines(data)
                self.sshcat.tochild.flush()
            except:
                message = '[%s] error writing remote file' % self.getName()

While troubleshooting, I narrowed the problem down to the tail_buffer size: with a smaller tail_buffer, the script works fine.

I do not want to rely on the tail_buffer size. Ideally the script should be independent of it.

Is there a way to flush data from the Popen buffer?

Please help!

user2475677
  • If you reduced `tail_buffer` and the code started to work fine, what is the remaining problem? – 9000 Jan 17 '14 at 15:14
  • I do not want to rely on tail_buffer size. Essentially the script should be independent of it! – user2475677 Jan 17 '14 at 15:19
  • Is there a more efficient way of reading the data from the file? – user2475677 Jan 21 '14 at 19:19
  • Have you tried setting the bufsize to 0 in this line: `self.jtail = popen2.Popen3(cmd, bufsize = self.tail_buffer)` ? The bufsize defaults to 0 if you do not specify it. It sounds like the command may be waiting for the buffer to be filled before it returns. That would explain why it returns faster when you use a smaller buffer. – Kevin Jan 23 '14 at 22:27
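
One possible explanation for the buffer-size dependence, offered as a guess rather than a diagnosis: `select` watches the raw descriptor `outfd`, while `out.read(self.read_size)` goes through a buffered file object. Data already pulled into that buffer is invisible to `select`, so it can sit there unread. Below is a minimal sketch of reading the descriptor directly with `os.read` instead. It is written for Python 3, and `read_available` is a hypothetical helper name, not part of the original script.

```python
import os
import select


def read_available(fd, max_bytes=2048, timeout=1.0):
    """Return up to max_bytes from fd, b'' at EOF, or None on timeout."""
    ready, _, _ = select.select([fd], [], [], timeout)
    if not ready:
        # Nothing arrived within the timeout.
        return None
    # os.read talks to the descriptor directly, with no Python-level
    # buffer in between, so select() and the read see the same data.
    return os.read(fd, max_bytes)
```

In the loop from the question, this would stand in for the `select.select` / `out.read` pair, with `outfd` passed as `fd`. Because nothing is held in a user-space buffer, the result should not depend on the `bufsize` given to the pipe.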

1 Answer


If I understand your problem correctly, you might consider an approach similar to the one in this question: "Reading from a frequently updated file".

Using tail is a bit of a hack.
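
As a rough illustration of that approach, here is a minimal sketch that follows the file from Python itself, with no `tail` subprocess. It assumes Python 3 and a file that is only ever appended to (no rotation or truncation). The function name `follow` and its parameters are hypothetical, chosen to mirror the offset, read size, and end time in the question.

```python
import time


def follow(path, offset=0, chunk_size=2048, poll_interval=0.5, end_time=None):
    """Yield chunks appended to path, starting at offset, until end_time."""
    with open(path, "rb") as f:
        f.seek(offset)
        while True:
            chunk = f.read(chunk_size)
            if chunk:
                yield chunk
                continue
            # At end of file: stop if the deadline has passed,
            # otherwise wait briefly and try again.
            if end_time is not None and time.time() >= end_time:
                return
            time.sleep(poll_interval)
```

Each chunk could then be written to the target and flushed, for example `for chunk in follow(source_journal, offset, end_time=deadline): target.write(chunk)`, where `source_journal`, `deadline`, and `target` are placeholders. The deadline is only checked once the file has been drained, so data already on disk when the end time arrives is still copied.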

cylus