I wrote a Python script that runs every 5 minutes via Task Scheduler, reads constantly growing log files (text files), and inserts the data into a DB. A new log file is generated each day.

I need to modify it to keep a pointer at the end of the last line read, so that when the scheduler runs again, it starts after the last inserted line. Once a new day begins, the pointer should go back to the first line of the new file. The seek function should do it, but I couldn't figure out how yet. Here is my attempt:

import datetime
import os
import time

month = time.strftime("%m")
filename = time.strftime("%Y%m%d")

# Check for a new day
currTime = datetime.datetime.now()
lastDay = 0  # reset on every run, so the comparison below is always true

# Open today's log file: <logs_dir>\<MM>\<YYYYMMDD>.log
logs_dir = r'C:\Python27\Logs'
abs_file_path = os.path.join(logs_dir, month, filename + '.log')
file = open(abs_file_path, 'r')

# first_byte_to_read and last_read_byte are placeholders: they are not
# defined anywhere yet, and nothing persists them between runs.
if currTime.day != lastDay:
    lastDay = currTime.day
    file.seek(first_byte_to_read)  # <<-- to reset the pointer ??
else:
    file.seek(last_read_byte)
Shad
    Easier: remember the current length of file somewhere else, then resume from there next time. Inserting into the log is messy. – Amadan Dec 24 '14 at 04:43
    Your example doesn't show any log file processing or a mechanism for storing and retrieving file offsets over multiple runs. But the general idea is to process the file and then call `file.tell()` to get its current file position. Save that somewhere and later, you can open the file, `file.seek(the_saved_position, 0)` and continue. – tdelaney Dec 24 '14 at 05:49
    You can see if anything needs processing by seeking to the end of the file `file.seek(0, 2)` and checking `if file.tell() > the_saved_position`. – tdelaney Dec 24 '14 at 05:51
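The approach from the comments (save the position reported by `tell()`, then `seek()` back to it on the next run) might look roughly like the sketch below. The function name `read_new_lines`, the JSON state file, and its `path`/`offset` keys are assumptions made for illustration, not part of the original script. The log file name is stored next to the offset so that a new day's file starts from byte 0.

```python
import json
import os


def read_new_lines(log_path, state_path):
    """Return complete lines appended to log_path since the previous call."""
    if not os.path.exists(log_path):
        return []  # today's file has not been created yet

    # Load the offset saved by the previous run, if there was one.
    state = {}
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)

    # A different file name (new day) or a shrunken file means start at byte 0.
    offset = state.get("offset", 0) if state.get("path") == log_path else 0
    if offset > os.path.getsize(log_path):
        offset = 0

    lines = []
    # Binary mode keeps tell()/seek() offsets as plain byte counts.
    with open(log_path, "rb") as f:
        f.seek(offset)
        while True:
            line = f.readline()
            if not line.endswith(b"\n"):
                break  # end of file, or a half-written line: retry next run
            lines.append(line.decode("utf-8", errors="replace").rstrip("\r\n"))
            offset = f.tell()

    # Persist the position so the next scheduled run resumes from here.
    with open(state_path, "w") as f:
        json.dump({"path": log_path, "offset": offset}, f)
    return lines
```

Each scheduled run would call `read_new_lines(abs_file_path, some_state_file)` and insert the returned lines into the DB. If the insert can fail, it may be safer to write the state file only after the insert succeeds, which this sketch does not do.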

1 Answer

Instead of repeatedly running the program and remembering where you left off, you can simply run the program once and have it monitor the file for new content. There are two main ways to do this:

  1. Polling. Read until end-of-file, then wait for a few seconds and try again. Simple, reliable, but not a great idea on power-constrained devices.
  2. Async. On Linux you could use PyInotify to be woken up when new content is available in the file. It seems you're on Windows, though; for that, see "How do I watch a file for changes?". A bit more complex, but generally a better solution.
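As a rough illustration of the polling option, here is a minimal sketch of a generator that reads to end-of-file, sleeps, and tries again. The name `follow` and the parameters `poll_seconds` and `max_idle_polls` are hypothetical; `max_idle_polls` exists only so the loop can stop in a test or demo, and a long-running monitor would leave it as `None`.

```python
import time


def follow(path, poll_seconds=2.0, max_idle_polls=None):
    """Yield each complete line appended to path, polling at end-of-file."""
    idle = 0
    pending = b""  # holds a partially written line until its newline arrives
    with open(path, "rb") as f:
        while True:
            chunk = f.readline()
            if chunk:
                idle = 0
                pending += chunk
                if pending.endswith(b"\n"):
                    yield pending.decode("utf-8", errors="replace").rstrip("\r\n")
                    pending = b""
                continue
            # Reached end-of-file: optionally give up, otherwise wait and retry.
            idle += 1
            if max_idle_polls is not None and idle >= max_idle_polls:
                return
            time.sleep(poll_seconds)
```

A caller would loop with `for line in follow(abs_file_path): ...` and insert each line into the DB. This sketch watches a single file, so switching to the next day's log would still need to be handled separately, for example by checking the date inside the loop and reopening.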
John Zwinck