3

I have this piece of Python code that loops through a list of URLs in a text file (urls.txt), follows the redirects of each URL, and, if the final URL contains a specific string, writes it to a file called redirect.txt:

import ssl
import urllib.request

redf = open('redirect.txt', 'w')
with open('urls.txt') as f:
    for row in f:
        context = ssl._create_unverified_context()
        finalurl = ''
        try:
            res = urllib.request.urlopen(row.strip(), context=context, timeout=10)
            finalurl = res.geturl().strip()
        except Exception:
            print("error: " + row.strip())

        if finalurl.strip():
            if "/admin/" in finalurl:
                redf.write(finalurl + "\n")

The problem is that I have to wait for the entire URL list to be processed before the redirect.txt file is written.

How can I write in real time?

Martijn Pieters
Born vs. Me
  • I'm just guessing here, but there may be no efficient way to do this. I only say so because in any OS a file has to be `close`d or saved before changes can be read. – KuboMD Jan 10 '19 at 18:01

2 Answers

7

The file is created, but since your output is small, it's likely all stuck in the write buffer until the file is closed. If you need the file to be filled in more promptly, either open it in line-buffered mode by passing buffering=1:

open('redirect.txt', 'w', buffering=1)

or flush after each write, either by explicitly calling flush:

redf.write(finalurl+"\n")
redf.flush()

or, since you're adding newlines anyway, you may as well let print do that work for you, using print with flush=True:

print(finalurl, file=redf, flush=True)

Side-note: You really want to use with statements for files, particularly files opened for writing; you only used one for the file being read (where it's less critical, since the worst case is a delayed handle close, not lost writes). Otherwise an exception can delay the file being flushed and closed indefinitely. Just combine the two opens into one with, e.g.:

with open('urls.txt') as f, open('redirect.txt', 'w', buffering=1) as redf:
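To see the difference the buffering mode makes, here is a minimal sketch (with hypothetical demo filenames, not the files from the question) comparing a default-buffered file to a line-buffered one. With default buffering, a small write sits in the internal buffer until the file is closed; with buffering=1, each newline triggers a flush to disk:

```python
import os

# Write the same small line to a default-buffered file and a line-buffered
# file, then check each file's on-disk size before either file is closed.
with open('demo_default.txt', 'w') as default_buf, \
     open('demo_line.txt', 'w', buffering=1) as line_buf:
    default_buf.write('hello\n')   # sits in the internal write buffer
    line_buf.write('hello\n')      # flushed to disk at the newline
    size_default = os.path.getsize('demo_default.txt')  # still 0 bytes
    size_line = os.path.getsize('demo_line.txt')        # already on disk
```

Inspecting `size_default` and `size_line` inside the with block shows the default-buffered file is still empty on disk while the line-buffered one already holds the data.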
ShadowRanger
0

You could append to the redirect file, rather than keeping it open for the duration of your program.

import urllib.request
import ssl

def append(line):
    with open('redirect.txt', 'a') as redf:
        redf.write(line)

with open('urls.txt') as f:
    for row in f:

        ...

        if finalurl.strip():
            if "/admin/" in finalurl:
                append(finalurl + "\n")

Depending on any other interaction with the file while it's being processed, you may need to add a try/except mechanism to retry inside the append function.
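For example, a retry could look like the sketch below. This is a hypothetical variant of the append helper above; the attempt count and back-off delay are illustrative choices, not part of the original answer:

```python
import time

def append(line, attempts=3, delay=0.5):
    """Append a line to redirect.txt, retrying on OS-level failures.

    Returns True if the write succeeded, False if all attempts failed.
    The attempt count and delay are arbitrary illustrative values.
    """
    for attempt in range(attempts):
        try:
            with open('redirect.txt', 'a') as redf:
                redf.write(line)
            return True
        except OSError:
            if attempt < attempts - 1:
                time.sleep(delay)  # back off before the next try
    return False
```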

richaux
  • Repeatedly opening, writing, and closing a file is a fairly high overhead operation; I'd discourage this approach unless the writes occurred unpredictably (e.g. driven by user input or the like in many parts of a complex application). When all the writes occur in a defined time frame, don't constantly open and close the file. – ShadowRanger Jan 10 '19 at 22:08
  • I agree: your buffer/flush mechanism looks more appropriate. – richaux Jan 11 '19 at 13:22