1

Possible Duplicate:
How might I remove duplicate lines from a file?

I have a file with duplicated records that I want to remove. This is what I have tried:

import sys  

for line in sys.stdin:  
    line = line.rstrip()  
    line = line.split()  
    idlist = []   
    if idlist == []:  
        idlist = line[1]  
    else:  
    idlist.append(line[1])  
    print line[0], idlist  

#did not work

and this:

for line in sys.stdin:  
    line = line.rstrip()  
    line = line.split()  
    lines_seen = set()  
    dup = line[1]  
    if dup not in lines_seen:  
        lines_seen = dup  
    else:  
        lines_seen.append(dup)  
    print line[0], lines_seen  
    
sys.stdin.close()

#did not work either!

This is what the input looks like:

BLE 1234
BLE 1223
LLE 3456
ELE 1223
BLE 4444
ELE 5555
BLE 4444

And this is what I want the output to look like:

BLE 1234
BLE 1223
LLE 3456
BLE 4444
ELE 5555

Thanks! edg

edg

3 Answers

3
import sys

elem1_seen = set()                 # set of second elements seen so far
lines_out = []                     # list of "unique" output lines
for line in sys.stdin:             # iterate over input
    elems = line.rstrip().split()  # split line into two elements
    if elems[1] not in elem1_seen: # if second element not seen before...
        lines_out.append(line)     # append the whole line to output
        elem1_seen.add(elems[1])   # remember this second element
sys.stdout.writelines(lines_out)   # write the kept lines (newlines included)
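For readers on Python 3, the same idea might be sketched as a small generator function. The name unique_by_second_field is hypothetical, and the sketch assumes whitespace-separated fields; keeping lines that have fewer than two fields is a choice made here, not something the question specifies.

```python
def unique_by_second_field(lines):
    """Yield each line whose second field has not been seen before.

    Hypothetical helper: assumes whitespace-separated fields.
    """
    seen = set()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            # Assumption: lines without a second field are kept unchanged.
            yield line
            continue
        if fields[1] not in seen:
            seen.add(fields[1])
            yield line


# Sample input from the question, one string per line.
sample = [
    "BLE 1234\n", "BLE 1223\n", "LLE 3456\n", "ELE 1223\n",
    "BLE 4444\n", "ELE 5555\n", "BLE 4444\n",
]
for line in unique_by_second_field(sample):
    print(line, end="")
```

To filter standard input instead of the sample list, one option is to pass sys.stdin to the function and hand the result to sys.stdout.writelines.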
eumiro
0

The main issue is that you change the type of a variable after creating it, which causes the confusion:

import sys  

for line in sys.stdin:  
    line = line.rstrip()   #Line is a string  
    line = line.split()    #Line is a list
    idlist = []            #idlist is a list
    if idlist == []:  
        idlist = line[1]   #id list is a string
    else:  
        idlist.append(line[1])  #and now?
    print line[0], idlist 
Don
  • I thought that if I say idlist = [] the idlist would be an empty *list*? (because a list is identified with square brackets). – edg Nov 25 '11 at 09:57
  • Yes, but when you say "idlist=line[1]" you are creating a new variable (a string in this case) that overrides the original definition – Don Nov 25 '11 at 09:59
  • wait, I thought I had changed the line to a list with line = line.split(), therefore I assumed that idlist = line[1] would be the first element in the *list* I had created...? – edg Nov 25 '11 at 10:10
  • At that point, line is a list but line[1] is the second element (a string) and not the first – Don Nov 25 '11 at 10:12
  • I thought line = line.split() would change the complete line into a list with two elements? – edg Nov 25 '11 at 11:06
  • Yes: line.split() gives a list – Don Nov 25 '11 at 11:20
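The rebinding discussed in these comments can be checked directly. A minimal sketch in Python 3 syntax, using one line from the question's input:

```python
line = "BLE 1234".split()     # line is now the list ['BLE', '1234']

idlist = []                   # idlist starts out as a list
print(type(idlist).__name__)  # list

idlist = line[1]              # rebinds the name to the string '1234'
print(type(idlist).__name__)  # str

# Strings have no append method, so calling idlist.append(...) would fail here.
print(hasattr(idlist, "append"))  # False

# Appending to the list, instead of rebinding the name, keeps it a list.
idlist = []
idlist.append(line[1])
print(idlist)                 # ['1234']
```

Note also that in the original loop idlist = [] runs on every iteration, so the test idlist == [] is always true and the else branch is never reached.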
0
import fileinput

ss = '''BLE 1234
BLE 1223
LLE 3456
ELE 1223
BLE 4444
ELE 5555
BLE 4444 
'''
with open('klmp.txt','w') as f:
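    f.write(ss)

seen = set()
# inplace=1 redirects print output into klmp.txt, replacing its contents
for line in fileinput.input('klmp.txt',inplace=1):
    b = line.split()[1]       # second field is the duplicate key
    if b not in seen:
        seen.add(b)
        print line.strip()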
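A Python 3 version of the same in-place rewrite could look roughly like this. The function name dedupe_in_place is hypothetical, the sketch assumes every line has at least two fields, and it works on a temporary file here so that no real data is touched.

```python
import fileinput
import os
import tempfile


def dedupe_in_place(path):
    """Rewrite the file at path, keeping the first line per second field.

    Hypothetical helper; assumes every line has at least two fields.
    """
    seen = set()
    # With inplace=True, fileinput redirects standard output into the file.
    with fileinput.input(files=path, inplace=True) as lines:
        for line in lines:
            key = line.split()[1]
            if key not in seen:
                seen.add(key)
                print(line.rstrip("\n"))


# Try it on a temporary copy of the sample data from the question.
sample = "BLE 1234\nBLE 1223\nLLE 3456\nELE 1223\nBLE 4444\nELE 5555\nBLE 4444\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)

dedupe_in_place(tmp.name)
with open(tmp.name) as f:
    result = f.read()
os.remove(tmp.name)
```

Because the file is overwritten while it is being read, it may be worth passing a backup extension to fileinput.input (its backup parameter) when running this against data that matters.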

Searching SO for the word 'fileinput', I found:

How to delete all blank lines in the file with the help of python?

eyquem