I have a simple Python script that's downloading dated files from a server. The script reads a log file to see if the file has been downloaded or not and then makes a decision to download the file or skip it.

If the file is not in the log file (meaning it has not been downloaded yet), then it downloads the file and writes the file name to the log. This is so when the script runs again, it doesn't download the file again.

I check whether the file is already in the log like this:

f = open('testfile.txt', 'r+')
for line in f:
    if line.rstrip() == mysales + date + file:
        mysalesdownload = "TRUE"
    elif line.rstrip() == myproducts + date + file:
        myproductsdownload = "TRUE"
    else:
        continue

In the log file, mysales + date + file looks like mysales_2014-05-01.txt.

The problem now is that I want to add a delimiter (;) and a downloaded date to each log entry. The downloaded date tells me when the script downloaded the data.

f.write(mysales + date + file + ";" + str(datetime.date.today()) + "\n")

However, this throws a wrench into how I read the log file. The downloaded date changes from run to run, and the script does run overnight. So, bearing in mind that a line will now look like this:

   mysales_2014-05-01.txt;2014-05-02 

How do I read only up to the semicolon, so I don't download the same file twice if the script runs overnight?

Fastidious

2 Answers


Change this line:

if line.rstrip() == mysales + date + file:

To:

if line.rstrip().split(';', 1)[0] == mysales + date + file:

And so on.
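As a minimal sketch of the same idea, the part before the semicolon can be pulled out with str.partition, which also copes with older log lines that have no delimiter at all. The helper names logged_name and already_downloaded are hypothetical, introduced only for illustration, and the log is assumed to hold one entry per line.

```python
def logged_name(line):
    """Return the file-name part of a log line, ignoring any ';date' suffix."""
    # partition splits on the first ';' only; when there is no ';',
    # the whole stripped line comes back as the name.
    name, _, _ = line.rstrip().partition(';')
    return name.strip()


def already_downloaded(log_lines, target):
    """Return True if target is the name field of any log line."""
    return any(logged_name(line) == target for line in log_lines)
```

With the file opened as f, the check would then read something like already_downloaded(f, mysales + date + file).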

nwalsh

If you simply want to check whether your file is in the log file, you can use the in operator:

# Build the names to look for once, before opening the log.
current_sales = '{}{}{}'.format(my_sales, date, file)
current_products = '{}{}{}'.format(my_products, date, file)
# Use a separate name for the log handle so it does not shadow `file`.
with open('test_file.txt', 'r') as log:
    for line in log:
        if current_sales in line:
            my_sales_download = 'TRUE'
        elif current_products in line:
            my_products_download = 'TRUE'

Your final else statement is redundant and can be taken out. Also, I think it would be prudent to check whether both my_sales_download and my_products_download are TRUE. Once that is the case, you can break out of the for loop, since there is nothing left to look for.
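The early exit could be sketched roughly as follows. This is only an illustration: the function name check_log is hypothetical, and it returns booleans rather than the 'TRUE' strings used above, which is a substitution on my part.

```python
def check_log(log_lines, current_sales, current_products):
    """Scan log lines and report which of the two files are already logged."""
    sales_done = False
    products_done = False
    for line in log_lines:
        if current_sales in line:
            sales_done = True
        elif current_products in line:
            products_done = True
        # Stop reading as soon as both files have been found.
        if sales_done and products_done:
            break
    return sales_done, products_done
```

Passing the open file object as log_lines works, because iterating over a file yields its lines one at a time.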

BeetDemGuise