
I'm attempting to read and parse a .txt file that is continually being updated throughout the day. I want to parse only lines that have not already been consumed. These are then to be sent to a Telegram group.

At present, every time I run the script it parses everything.

import time
import urllib.parse

# post_to_telegram(msg) is defined elsewhere and sends msg to the Telegram group

selections = []
msgList = []
urr = ""
name = ""
ourLines=len(selections)



while(True): 
    file1 = open(r'C:\\urlt\log.txt', 'r')
    Lines = file1.readlines()
    file1.close()
    try:
        while(True): 
            if(ourLines==len(Lines)): 
                break 
            else:
                txt = Lines[ourLines].strip() 
                tlist = txt.split("&") 
                ourLines=ourLines+1 
                for subtxt in tlist: 
                    if "eventurl=" in subtxt: 
                        a = subtxt[9:len(subtxt) - 3] 
                        url = "www.beefandtuna.com/%23"+a.replace("%23", "/").strip('(')
                        #print(url) 
                        urr = url 
                    elif "bet=" in subtxt: 
                        name = urllib.parse.unquote(subtxt[4:len(subtxt)]) 
                        #print(name)
                        selections.append(url+name) 
                        msg = url +" " '\n' "Name: "+ name
                        if msg not in msgList:
                            post_to_telegram(msg)
                            msgList.append(msg)
        #time.sleep(0.5)
    except:
        pass
Graham

1 Answer


Assuming the new contents are appended to the end of the file: after you finish reading the file, create a copy of the file.

The next time you read the file, seek to the location that is the length of the copy.

import os
from shutil import copyfile

in_file_loc = r'C:\\SmartBet.io Bot\placerlog.txt'
backup_file_loc = in_file_loc + ".bak"

while True:
    try:
        # the size of the backup tells us how far we read on the previous pass
        file_backup_size = os.stat(backup_file_loc).st_size
    except OSError:
        # no backup yet: start reading from the beginning of the file
        file_backup_size = 0
    file1 = open(in_file_loc, 'r')
    
    # move file position to the end of the old file
    file1.seek(file_backup_size)

    # Read all lines in the file after the position we seeked to
    Lines = file1.readlines()
    file1.close()

    # copy current version of file to backup
    copyfile(in_file_loc, backup_file_loc)

    # Then do whatever you want to do with Lines

This is probably not the best way to do this because, as rici said in a comment below:

"make a copy" is not an atomic operation, and as the file grows copying will be successively slower. Any data appended to the log file during the copy will never be reported. Furthermore, the copy might happen to include a partial entry, in which case the next scan will start in the middle of an entry.

An alternative is to save the size of the current file in a different one:

in_file_loc = r'C:\\SmartBet.io Bot\placerlog.txt'
size_file_loc = in_file_loc + ".lastsize"

while True:
    # read old size from file
    try:
        with open(size_file_loc, 'r') as f:
            file_size = int(f.read())
    except (OSError, ValueError):
        # no size file yet (or its contents are not an integer): start from zero
        file_size = 0

    file1 = open(in_file_loc, 'r')
    file1.seek(file_size)
    Lines = file1.readlines()
    new_file_size = file1.tell() # Get the location of the current file marker
    file1.close()

    # write new size to file
    with open(size_file_loc, 'w') as f:
        f.write(str(new_file_size))

    # Then do whatever you want to do with Lines
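
To make the "do whatever you want" step concrete, here is a minimal sketch (untested) of how the question's parsing loop could consume only the newly returned Lines. It assumes the question's post_to_telegram helper, log path, and &-delimited eventurl=/bet= format:

import time
import urllib.parse

in_file_loc = r'C:\urlt\log.txt'          # log path from the question
size_file_loc = in_file_loc + ".lastsize"
msgList = []

while True:
    # read the offset we stopped at last time (zero on the first run)
    try:
        with open(size_file_loc, 'r') as f:
            file_size = int(f.read())
    except (OSError, ValueError):
        file_size = 0

    # read only what was appended since the saved offset
    with open(in_file_loc, 'r') as file1:
        file1.seek(file_size)
        Lines = file1.readlines()
        new_file_size = file1.tell()

    # remember how far we got for the next pass
    with open(size_file_loc, 'w') as f:
        f.write(str(new_file_size))

    # parse only the new lines
    for txt in Lines:
        url = ""
        for subtxt in txt.strip().split("&"):
            if "eventurl=" in subtxt:
                a = subtxt[9:len(subtxt) - 3]
                url = "www.beefandtuna.com/%23" + a.replace("%23", "/").strip('(')
            elif "bet=" in subtxt and url:
                name = urllib.parse.unquote(subtxt[4:])
                msg = url + " " + '\n' + "Name: " + name
                if msg not in msgList:
                    post_to_telegram(msg)   # helper from the question
                    msgList.append(msg)

    time.sleep(0.5)   # poll the log a couple of times per second

To guard against the partial-entry case rici mentions, you could also skip the last element of Lines when it does not end in '\n' and subtract its length from new_file_size, so a half-written line is re-read on the next pass.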
Pranav Hosangadi
  • Thank you Pranav, but I'm not following where, relative to your "# do everything else", I should paste in my original code. I've tried adding both the `while(True): if(ourLines==len(Lines)):` block and the `while(True): file1 = open(r'C:\\urlt\log.txt', 'r')` block, to no avail. If I open the txt file and add a new line, nothing gets parsed in real-time. But if I ctrl+C in the terminal (spyder) I can see that both file sizes are being updated accordingly. – Graham Apr 09 '21 at 20:40
  • I assume the bit of your code after you `file1.close()` is where you "do everything else" that's not related to reading the file? @Graham – Pranav Hosangadi Apr 09 '21 at 20:42
  • In other words, `Lines` now has the contents of the file _after_ the position in the previous iteration. So after the `# Then do whatever you want to do with Lines`, you do whatever you want with `Lines`! – Pranav Hosangadi Apr 09 '21 at 20:45
  • "make a copy" is not an atomic operation, and as the file grows copying will be successively slower. Any data appended to the log file during the copy will never be reported. Furthermore, the copy might happen to include a partial entry, in which case the next scan will start in the middle of an entry. Your second solution is better, but instead of stat-ing the file after closing it, you should get the current read pointer using `tell`. That measures where you actually read to, and not the size of a possibly different file at a possibly later time. – rici Apr 10 '21 at 00:49
  • Please read about [TOCTTOU](https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use) race conditions for more information. – rici Apr 10 '21 at 00:52
  • Thanks @rici I have added your suggestion to the answer – Pranav Hosangadi Apr 12 '21 at 14:43
  • I can process updates, but they have to be done manually, i.e. opening the txt file, adding a new line, and hitting save. This defeats the purpose: I want to track and parse automatic updates to the log file. Also, while the script is running (and not parsing anything), if I kill the script and open the log file I can see new lines that weren't there previously, but they just weren't picked up by the script. – Graham Apr 14 '21 at 23:19