3

I have files that are updated every 2 hours. I have to detect the files automatically and insert the extracted information from them into a database.

Our DBMS is Postgresql and programming language is Python. How would you suggest I do that?

I want to make use of DAL (Database Abstraction Layer) to make connection between the files and database and use postgresql LISTEN/NOTIFY techniques to detect the new files. If you agree with me please tell me how I can use LISTEN/NOTIFY functions to detect the files.

Thank you

Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
f.ashouri
  • 5,409
  • 13
  • 44
  • 52
  • This question was also [posted on the pgsql-general mailing list](http://archives.postgresql.org/message-id/1345387783137-5720349.post@n5.nabble.com). @user1043898, please link back to a SO question when re-posting elsewhere, or at least mention you've re-posted. – Craig Ringer Aug 20 '12 at 05:52
  • Seems this question is actually a dup of http://stackoverflow.com/questions/2314892/python-daemon-to-watch-a-folder-and-update-a-database, I only just noticed. – Craig Ringer Aug 20 '12 at 11:52

2 Answers2

3

What you need is to write a script that stays running as a dæmon, using a file system notify API to run a callback function when the files change. When the script is notified that the files change it should connect to PostgreSQL and do the required work, then go back to sleep waiting for the next change.

The only truly cross platform way to watch a directory for changes is to use a delay loop to poll os.listdir and os.stat to check for new files and updated modification times. This is a waste of power and disk I/O; it also gets slow for big sets of files. If your OS reliably changes the directory modification time whenever files within the directory change you can just os.stat the directory in a delay-loop, which helps.

It's much better to use an operating system specific notification API. Were you using Java I'd tell you to use the NIO2 watch service, which handles all the platform specifics for you. It looks like Watchdog may offer something similar for Python, but I haven't needed to do directory change notification in my Python coding so I haven't tested it. If it doesn't work out you can use platform-specific techniques like inotify/dnotify for Linux, and the various watcher APIs for Windows.

See also:

Community
  • 1
  • 1
Craig Ringer
  • 307,061
  • 76
  • 688
  • 778
  • Thank you Craig. It was very informative. If I opt for using platform-specific techniques, could I incorporate them in python environment or I should use command line of the operating system? – f.ashouri Aug 20 '12 at 09:24
  • @user1043898 You still haven't said what OS you're on, makes it harder to answer. **Please update your question with your OS**. You absolutely can use platform-specific directory watching facilities from Python, though it might take some hoop-jumping if they're only available directly as C APIs. A quick Google search found an inotify module for Python, and for Windows there's apparently http://timgolden.me.uk/python/win32_how_do_i/watch_directory_for_changes.html – Craig Ringer Aug 20 '12 at 09:28
  • My OS is windows at the moment but I could shift to Linux if it is necessary. – f.ashouri Aug 20 '12 at 11:41
  • @user1043898 Looks like there's some good info - for either platform, as it happens - in [this question](http://stackoverflow.com/questions/2314892/python-daemon-to-watch-a-folder-and-update-a-database). Between that and the links I've already given you I expect you'll be able to work it out. – Craig Ringer Aug 20 '12 at 11:46
1

You can't use LISTEN/NOTIFY because that can only send messages from within the database and your files obviously aren't in there.

You'll want to have your python script scan the directory the files are in and check their modification time (mtime). If they are updated, you'll need to read in the files, parse the data and insert it to the db. Without knowing the format of the files, there's no way to be more specific.

Richard Huxton
  • 21,516
  • 3
  • 39
  • 51
  • Thank you Richard. I actually know the file format and I know how to convert it to simple text files. The problem is detecting the newly updated files. Is it possible with psycopg or database abstraction layer (DAL) as database connectors to Python? – f.ashouri Aug 19 '12 at 14:44
  • It doesn't matter how you connect to PostgreSQL, that's not going to tell you anything about unrelated files. There might be a library to do it for you, but otherwise just scan the directory for files and check the timestamps. Sleep 60 seconds and repeat... – Richard Huxton Aug 19 '12 at 15:01
  • Thank you. do you have any idea how to scan for new files? – f.ashouri Aug 19 '12 at 15:08
  • Any introductory guide to Python should cover this sort of thing. If you don't already know a good guide will prove useful anyway. – Richard Huxton Aug 19 '12 at 15:42