
I have a bunch of CSV files that get updated periodically. Let's say the CSV files are:

file1.csv, file2.csv, file3.csv

During the update process, new data is appended after the last line of each CSV file.

Is it possible to read the data from a CSV file as it is updated and store it in an array or collection (e.g. a deque)?

Is there a way to collect the data from a CSV file as it is updated?

rnish

2 Answers


You can use a Python package called watchdog.

This example shows how to monitor the current directory recursively for file system changes and log them to the console:

import time
from watchdog.observers import Observer
from watchdog.events import LoggingEventHandler

if __name__ == "__main__":
    event_handler = LoggingEventHandler()  # logs every file system event to the console
    observer = Observer()
    observer.schedule(event_handler, path='.', recursive=True)  # watch the current directory
    observer.start()
    try:
        while True:
            time.sleep(1)  # keep the main thread alive; the observer runs in a background thread
    except KeyboardInterrupt:
        observer.stop()
    observer.join()

You could use this in conjunction with Ignacio's answer: use file_pointer.tell() to get the current position in the file, then seek() back to that position next time and read the remainder of the file. For example:

# First time: read the whole file and remember where it ends
with open('current.csv', 'r') as f:
    data = f.readlines()
    last_pos = f.tell()

# Second time: skip to the remembered position and read only the newly appended lines
with open('current.csv', 'r') as f:
    f.seek(last_pos)
    new_data = f.readlines()
    last_pos = f.tell()
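
Putting the two pieces together: below is a rough sketch (not part of the original answer) of a FileSystemEventHandler subclass that remembers the last read offset for each watched CSV file and appends any newly written lines to a deque, as the question asks. The class name CsvTailHandler and the file list are illustrative assumptions.

import os
from collections import deque
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class CsvTailHandler(FileSystemEventHandler):
    """Append newly written CSV lines to a deque whenever a watched file changes."""
    def __init__(self, paths):
        self.offsets = {os.path.abspath(p): 0 for p in paths}  # last read position per file
        self.rows = deque()

    def on_modified(self, event):
        path = os.path.abspath(event.src_path)
        if path not in self.offsets:
            return  # ignore events for files we are not watching
        with open(path, 'r') as f:
            f.seek(self.offsets[path])       # jump past what was already read
            self.rows.extend(f.readlines())  # collect only the appended lines
            self.offsets[path] = f.tell()    # remember where we stopped

handler = CsvTailHandler(['file1.csv', 'file2.csv', 'file3.csv'])
observer = Observer()
observer.schedule(handler, path='.', recursive=False)
observer.start()

After this runs, handler.rows holds every line appended to the watched files since the observer started.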
Alex L
  • If the Python script isn't running while the CSV files are being updated, it would be better to pickle `last_pos` to a file on disk to mark the last position read. – Shawn Zhang Feb 11 '13 at 03:02
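
A minimal sketch of that suggestion, assuming a side file named last_pos.pickle (the name is arbitrary) to persist the offset between runs:

import os
import pickle

POS_FILE = 'last_pos.pickle'  # arbitrary side file holding the last read offset

# Load the saved offset if the script has run before, otherwise start from the beginning
last_pos = 0
if os.path.exists(POS_FILE):
    with open(POS_FILE, 'rb') as f:
        last_pos = pickle.load(f)

with open('current.csv', 'r') as f:
    f.seek(last_pos)
    new_data = f.readlines()
    last_pos = f.tell()

# Persist the offset so the next run only reads lines appended after this point
with open(POS_FILE, 'wb') as f:
    pickle.dump(last_pos, f)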

Compare the current size of the file with the current offset within the file. If the size is greater, read the new data.
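
A rough sketch of that approach, polling the file size with os.path.getsize(); the file name and poll interval are placeholders:

import os
import time

path = 'current.csv'
offset = 0  # how far into the file has been read so far

while True:
    if os.path.getsize(path) > offset:   # the file has grown, so new data was appended
        with open(path, 'r') as f:
            f.seek(offset)
            new_rows = f.readlines()
            offset = f.tell()
        # process new_rows here
    time.sleep(1)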

Ignacio Vazquez-Abrams