
The full.txt contains:

www.example.com/a.jpg
www.example.com/b.jpg
www.example.com/k.jpg
www.example.com/n.jpg
www.example.com/x.jpg

The partial.txt contains:

a.jpg
k.jpg

Why does the following code not produce the desired result?

with open('full.txt', 'r') as infile:
    lines_full = [line for line in infile]

with open('partial.txt', 'r') as infile:
    lines_partial = [line for line in infile]

with open('remaining.txt', 'w') as outfile:
    for element in lines_full:
        if element[16:21] not in lines_partial:  # element[16:21] is the file name, e.g. a.jpg
            outfile.write(element)

The desired remaining.txt should contain exactly those entries of full.txt that are not in partial.txt:

www.example.com/b.jpg
www.example.com/n.jpg
www.example.com/x.jpg
leanne

2 Answers


This code keeps the newline character at the end of each line, so lines_partial holds strings like "a.jpg\n", which never equal a 5-character slice such as "a.jpg":

with open ('partial.txt', 'r') as infile:
    lines_partial=[line for line in infile]

Change it to

with open ('partial.txt', 'r') as infile:
    lines_partial=[line[:-1] for line in infile]

to strip the newline characters (line[:-1] means "the line without its last character").

David Robinson
    This is rather dangerous: if the last line has no trailing newline, it becomes "k.jp", for example. See this thread for a safer way to read lines from a file - http://stackoverflow.com/questions/544921/best-method-for-reading-newline-delimited-files-in-python-and-discarding-the-new – Roman Pekar Sep 13 '13 at 06:11
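Following up on that comment, here is a minimal sketch of the question's script with the fix applied. It uses str.rstrip("\n") instead of line[:-1], so a last line without a trailing newline keeps all of its characters. The function name remaining_entries is hypothetical, and the sketch keeps the question's fixed slice element[16:21], which assumes every entry starts with the 16-character prefix www.example.com/ and has a 5-character file name.

```python
def remaining_entries(full_lines, partial_lines):
    """Return the lines of full_lines whose file name is not in partial_lines.

    remaining_entries is a hypothetical helper name used for illustration.
    """
    # rstrip("\n") removes the newline only when it is present, so a final
    # line such as "k.jpg" (no trailing newline) is not cut down to "k.jp".
    partial = {line.rstrip("\n") for line in partial_lines}
    # Assumption carried over from the question: the file name sits at
    # element[16:21], right after the 16-character "www.example.com/".
    return [element for element in full_lines if element[16:21] not in partial]


# The question's data; note the last partial line has no trailing newline.
full = [
    "www.example.com/a.jpg\n",
    "www.example.com/b.jpg\n",
    "www.example.com/k.jpg\n",
    "www.example.com/n.jpg\n",
    "www.example.com/x.jpg\n",
]
partial_lines = ["a.jpg\n", "k.jpg"]

result = remaining_entries(full, partial_lines)
print(result)
# ['www.example.com/b.jpg\n', 'www.example.com/n.jpg\n', 'www.example.com/x.jpg\n']
```

With the real files, pass the lines read inside the with blocks and write the result with outfile.writelines(result); each kept line still carries its own newline from full.txt.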

You can use the os.path module:

from os import path

with open ('full.txt', 'r') as f:
    lines_full = f.read().splitlines()

with open ('partial.txt', 'r') as f:
    lines_partial = set(f.read().splitlines())  # create set for faster checking

lines_new = [x + '\n' for x in lines_full if path.split(x)[1] not in lines_partial]

with open('remaining.txt', 'w') as f:
    f.writelines(lines_new)
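To see what path.split(x)[1] compares against, here is a small sketch. It uses posixpath, the POSIX implementation of os.path that the standard library also exposes directly, to pin the separator to "/" (an assumption that suits these URL-like entries); the sample list is the question's data typed in by hand.

```python
import posixpath

# split() breaks a path at the last "/" into (head, tail).
head, tail = posixpath.split("www.example.com/a.jpg")
print(head)  # www.example.com
print(tail)  # a.jpg

lines_full = [
    "www.example.com/a.jpg",
    "www.example.com/b.jpg",
    "www.example.com/k.jpg",
    "www.example.com/n.jpg",
    "www.example.com/x.jpg",
]
# A set gives constant-time membership checks, as in the answer.
lines_partial = {"a.jpg", "k.jpg"}

remaining = [x for x in lines_full if posixpath.split(x)[1] not in lines_partial]
print(remaining)
# ['www.example.com/b.jpg', 'www.example.com/n.jpg', 'www.example.com/x.jpg']
```

Unlike the fixed slice element[16:21], splitting on the last "/" does not depend on the length of the prefix or of the file name.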
Roman Pekar
    Your code is faster than mine, but the results are not separated by newlines. @Roman Pekar – leanne Sep 13 '13 at 06:22
    @leanne Yes, I missed that. "writelines() does not add line separators" - http://docs.python.org/2/library/stdtypes.html#file.writelines, so you have to add them manually – Roman Pekar Sep 13 '13 at 06:28
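A minimal sketch of the writelines() behaviour described in that comment. io.StringIO stands in for open('remaining.txt', 'w') so the written text can be inspected; the list is the desired output from the question.

```python
import io

remaining = ["www.example.com/b.jpg", "www.example.com/n.jpg", "www.example.com/x.jpg"]

# writelines() writes the strings back to back, with no separator.
without_sep = io.StringIO()
without_sep.writelines(remaining)
print(repr(without_sep.getvalue()))
# 'www.example.com/b.jpgwww.example.com/n.jpgwww.example.com/x.jpg'

# Adding "\n" to each item yourself gives one entry per line.
with_sep = io.StringIO()
with_sep.writelines(line + "\n" for line in remaining)
print(repr(with_sep.getvalue()))
# 'www.example.com/b.jpg\nwww.example.com/n.jpg\nwww.example.com/x.jpg\n'

# Equivalent alternative: print() appends "\n" by default.
via_print = io.StringIO()
for line in remaining:
    print(line, file=via_print)
```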