I'm writing lots of URLs to a text file like this, inside a loop in my script:

fwrite = open('visited.txt', 'a')
fwrite.write('\n{0}'.format(url))
fwrite.close()

Then, when I re-run the script later, I don't want to process links I have already visited, so I do this (visit is a list of new and old URLs):

for x in visit:
    if x in open('visited.txt').read().lstrip('\r\n'):
        visit.remove(x)
    else:
        continue

But this always skips half of the lines: if there are 1000 URLs, it removes only 500 of them. I tried both lstrip and rstrip with \n and \r\n, but couldn't get it to work.

Nitin Prakash
ggnoredo
"You are modifying the contents of the object `visit` that you are iterating over when you do `visit.remove(x)`" -- don't do that – chickity china chinese chicken Jan 24 '19 at 20:07
Also, you shouldn't open the visited.txt file, write a line, and close it every time you want to add a URL. Either use `with open('visited.txt', 'a') as f: f.write('{0}\n'.format(url))` or collect all the required URLs in a list and write them to the file once. – Shirkan Jan 24 '19 at 20:12

2 Answers

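Read the lines only once into a list, dropping the line endings so that each entry compares equal to the URLs in visit:

with open('visited.txt', 'r') as f:
    # splitlines() strips the trailing '\n' that readlines() would keep
    visited = f.read().splitlines()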

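If you want to keep only the URLs that have not been visited yet, convert both lists to sets, subtract one from the other, and convert the result back to a list (note that this drops duplicates and does not preserve the original order):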

non_visited = list(set(visit) - set(visited))
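
If the order of `visit` matters, a set can still be used for the membership test only. The following is a minimal sketch, not part of the original answer: the helper name `filter_unvisited` and the example.com URLs are made up for illustration, and it assumes the file holds one URL per line (possibly with a leading blank line, as the question's write pattern produces).

```python
def filter_unvisited(visit, visited_lines):
    """Return the URLs from `visit` that are not in `visited_lines`, keeping order."""
    # Strip line endings and skip blank lines before building the lookup set.
    visited = {line.strip() for line in visited_lines if line.strip()}
    # Build a new list instead of removing items from `visit` in place.
    return [url for url in visit if url not in visited]


# Hypothetical data standing in for the contents of visited.txt.
visited_lines = ['\n', 'http://example.com/a\n', 'http://example.com/b\n']
visit = ['http://example.com/c', 'http://example.com/a', 'http://example.com/d']

non_visited = filter_unvisited(visit, visited_lines)
print(non_visited)  # ['http://example.com/c', 'http://example.com/d']
```

Set lookups take roughly constant time, so this should stay fast even with many thousands of URLs.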
Shirkan
This is a duplicate of "Python for loop skipping every other loop?", but for clarity, here's a solution for this case:

with open('visited.txt') as f:
    visited = f.read().splitlines()

visit = [url for url in visit if url not in visited]
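
To see why the original loop removes only half of the entries, here is a small reproduction. It is a sketch with made-up data; the function names `remove_while_iterating` and `filter_with_copy` are hypothetical and only serve the comparison.

```python
def remove_while_iterating(items, visited):
    """Reproduce the bug: remove from the list being iterated over."""
    items = list(items)  # work on a copy so the caller's list is untouched
    for x in items:
        if x in visited:
            # Removing shifts the remaining items left by one, so the
            # loop's internal index skips the element that follows.
            items.remove(x)
    return items


def filter_with_copy(items, visited):
    """The fix: build a new list rather than mutating during iteration."""
    return [x for x in items if x not in visited]


urls = ['a', 'b', 'c', 'd']
visited = {'a', 'b', 'c', 'd'}

print(remove_while_iterating(urls, visited))  # ['b', 'd']
print(filter_with_copy(urls, visited))        # []
```

Every URL here has been visited, yet the in-place version leaves every second one behind, which matches the "1000 in, 500 removed" symptom. Separately, `x in open('visited.txt').read()` tests whether `x` is a substring of the whole file, so a URL that is a prefix of a visited URL would also be treated as visited; comparing against a list or set of lines avoids that.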

By the way, your first snippet is simpler with a context manager. I also moved the \n to the end of the string, since newlines are line terminators, not separators, especially on Unix-like OSes:

with open('visited.txt', 'a') as fwrite:
    fwrite.write('{0}\n'.format(url))
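
If the URLs are collected first, the file only needs to be opened once per batch, as suggested in the comments. A minimal sketch, assuming a temporary file in place of visited.txt; the helpers `append_urls` and `read_urls` are hypothetical names, not from the original post.

```python
import os
import tempfile


def append_urls(path, urls):
    """Append each URL on its own line, opening the file only once."""
    with open(path, 'a') as f:
        f.writelines('{0}\n'.format(url) for url in urls)


def read_urls(path):
    """Read the file back as a list of URLs without line endings."""
    with open(path) as f:
        return f.read().splitlines()


# A temporary file stands in for visited.txt in this example.
path = os.path.join(tempfile.mkdtemp(), 'visited.txt')
append_urls(path, ['http://example.com/a', 'http://example.com/b'])
append_urls(path, ['http://example.com/c'])

print(read_urls(path))
# ['http://example.com/a', 'http://example.com/b', 'http://example.com/c']
```

Because every line ends with a newline, later appends start on a fresh line and the file never begins with an empty one.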
wjandrea