
I want to be notified when a .txt file changes on the web, for example in a log file such as http://overrustlelogs.net/Cowsep%20chatlog/November%202015/2015-11-18.txt

I'm using urllib2 to retrieve the data and prowlpy to send notifications to my phone, but I'm not sure how to check whether the text file has changed. (I want to be notified when there's a new line, or even just when my name is mentioned.)

Edit: I don't think getting the MD5 hash of the file is a great way to go; I just want to be notified of new lines in the text file. I'll probably have it loop every 10 seconds and send me a notification with the changes since the previous version of the text file.

Cherona
  • Possible duplicate of [Get MD5 hash of big files in Python](http://stackoverflow.com/questions/1131220/get-md5-hash-of-big-files-in-python) – Torxed Nov 18 '15 at 09:21
  • You get the hash sum of the file content, and you store the previous known hash somewhere (database, text file, pickle object file). – Torxed Nov 18 '15 at 09:22
  • Why do you believe using the hash is not a good solution? Each new version will get a different hash compared to the previous one. – DJanssens Nov 18 '15 at 09:28
  • As the others have suggested, do use the hash to check for any changes. If and only if there is some change you can check the contents of the file to send a specific notification. If the notification is generic, then there is no need to check the file. – Resley Rodrigues Nov 18 '15 at 09:30
  • @djanssens That's true, but I want the new lines in the text file to be sent to my phone – Cherona Nov 18 '15 at 09:31
  • You could do it in 2 steps, check hash to see if changed each x minutes, if changed load and get last lines. – DJanssens Nov 18 '15 at 10:21

1 Answer


You can use filecmp or difflib to compare files: filecmp tells you whether two files are equal, while difflib can also produce the differences in various formats. Here is a short example using difflib.context_diff:

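import sys
from difflib import context_diff

# Two versions of the same content, one list entry per line
s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n']
s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n']
for line in context_diff(s1, s2, fromfile='before.py', tofile='after.py'):
    sys.stdout.write(line)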

Output:

*** before.py
--- after.py
***************
*** 1,4 ****
! bacon
! eggs
! ham
  guido
--- 1,4 ----
! python
! eggy
! hamster
  guido
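
For the question's use case (only the new lines, or only lines that mention a name), the same module can be used to pull out just the added lines. The following is a minimal sketch, not a definitive implementation: the helper names `added_lines` and `mentions` are hypothetical, and it assumes both versions of the log are already available as strings.

```python
import difflib


def added_lines(old_text, new_text):
    """Return the lines that appear in new_text but not in old_text."""
    old = old_text.splitlines()
    new = new_text.splitlines()
    # ndiff prefixes added lines with "+ ", removed lines with "- ",
    # unchanged lines with "  " and hint lines with "? ".
    return [line[2:] for line in difflib.ndiff(old, new)
            if line.startswith("+ ")]


def mentions(lines, name):
    """Keep only the lines that contain name (case-insensitive)."""
    needle = name.lower()
    return [line for line in lines if needle in line.lower()]


if __name__ == "__main__":
    # Made-up sample data, for illustration only
    before = "alice: hi\nbob: hello\n"
    after = "alice: hi\nbob: hello\ncarol: hey Cherona\ndave: bye\n"
    fresh = added_lines(before, after)
    print(fresh)
    print(mentions(fresh, "cherona"))
```

The result of `mentions` could then be passed to whatever notification call is in use; that part is left out here.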

I suggest that you do not run a hand-written loop every 10 seconds for this task. Instead, you can use a task scheduler such as Celery to run the check every 10 seconds or so.
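
Whatever runs the job, each run can follow the two-step idea from the comments: compare a hash first, and only look at the content when the hash has changed. Below is a rough sketch of one such run. It uses Python 3's `urllib.request` in place of `urllib2`, the names `fetch`, `check_once` and `state` are hypothetical, and it assumes the log is append-only. Wiring it into Celery (or any other scheduler) and sending the notification are left out.

```python
import hashlib
import urllib.request


def fetch(url):
    """Download the log file and return it as text."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def check_once(text, state):
    """Return the lines appended since the previous call.

    state is a dict that must be kept between runs (in memory, a file,
    a database, ...). The first call only records a baseline.
    """
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    lines = text.splitlines()

    if "digest" not in state:
        # First run: remember where we are, report nothing
        state["digest"] = digest
        state["seen"] = len(lines)
        return []

    if digest == state["digest"]:
        # Hash unchanged, so there is nothing new to inspect
        return []

    # Assumption: the log only grows, so new lines are at the end
    new = lines[state["seen"]:]
    state["digest"] = digest
    state["seen"] = len(lines)
    return new
```

A scheduled task would then call something like `check_once(fetch(url), state)` on each run and send a notification when the returned list is not empty.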

Saeed