Filter unique lines from a text file in Python

Question

I want to print the unique lines present within the text file.

For example: if the content of my text file is:

I want my Python program to print:

12474
54675
74564

I'm using Python 2.7.

It looks like you want us to write some code for you. While many users are willing to produce code for a coder in distress, they usually only help when the poster has already tried to solve the problem on their own. A good way to demonstrate this effort is to include the code you've written so far, example input (if there is any), the expected output, and the output you actually get (output, tracebacks, etc.). The more detail you provide, the more answers you are likely to receive. Check the [FAQ](http://stackoverflow.com/tour) and [How to Ask](http://stackoverflow.com/questions/how-to-ask). — TigerhawkT3, Jan 21 '17 at 20:44
@TigerhawkT3 I hope you don't mind I closed the question. I feel you wanted to provide an answer :) — Jean-François Fabre, Jan 21 '17 at 20:44
@Jean-FrançoisFabre - It does deserve closure, but I don't think that's an accurate dupe. This question wants entries with a count greater than one to be removed entirely. — TigerhawkT3, Jan 21 '17 at 20:46
You may also check: [How to return unique words from the text file using Python](http://stackoverflow.com/questions/22978602/how-to-return-unique-words-from-the-text-file-using-python) — Moinuddin Quadri, Jan 21 '17 at 20:46
Right! Do you want me to reopen it so you can close it with the proper original question? — Jean-François Fabre, Jan 21 '17 at 20:48
@Jean-FrançoisFabre : It's certainly not the duplicate of the linked question. — Eric Duminil, Jan 21 '17 at 20:53
okay, reopening. After that I cannot close anymore. Don't complain about duplicate answers :) — Jean-François Fabre, Jan 21 '17 at 20:57
@Jean-FrançoisFabre I think that's justified, since we have fun down there trying to solve the riddle with different approaches.. even though it probably defeats the purpose of giving the fish instead of the fishing rod — hansaplast, Jan 21 '17 at 21:10
Okay, I try to give the fish, but before OP tries to eat it, the fish explains the solution :) — Jean-François Fabre, Jan 21 '17 at 21:12

hansaplast · Answer 1 · 2017-01-21T21:05:06.687

2

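Count how often each line occurs while keeping the lines in their original order, then print only the lines that occur exactly once: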

from collections import OrderedDict

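seen = OrderedDict()  # maps each line to its count, in first-seen order
with open('file.txt') as f:  # the with block closes the file afterwards
    for line in f:
        line = line.strip()
        seen[line] = seen.get(line, 0) + 1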

print("\n".join([k for k,v in seen.items() if v == 1]))

This prints:

12474
54675
74564

Update: thanks to the comments below, this is even nicer:

from collections import Counter, OrderedDict

class OrderedCounter(Counter, OrderedDict):
    pass

with open('file.txt') as f:
    seen = OrderedCounter([line.strip() for line in f])
    print("\n".join([k for k,v in seen.items() if v == 1]))
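To see the order-preserving behaviour in isolation, here is a minimal sketch that runs the same OrderedCounter on an in-memory list instead of a file. The sample values are made up for illustration (the 22222 entry is a hypothetical repeated line), not taken from the question's file.

```python
from collections import Counter, OrderedDict


class OrderedCounter(Counter, OrderedDict):
    """Counter that remembers the order in which keys were first seen."""


# Hypothetical sample data: "22222" appears twice, the rest once.
sample = ["54675", "22222", "12474", "22222", "74564"]

seen = OrderedCounter(sample)

# Keep only the keys counted exactly once, in first-seen order.
unique = [k for k, v in seen.items() if v == 1]
print("\n".join(unique))
```

On Python 3.7 and later a plain Counter also remembers insertion order, so the subclass mainly matters on older versions such as 2.7.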

edited Jan 21 '17 at 21:05

answered Jan 21 '17 at 20:44


indeed! didn't catch that, hold on – hansaplast Jan 21 '17 at 20:49
Yes Eric is right. This code isn't what i want – Altay Karakalpaklı Jan 21 '17 at 20:49
@AltayKarakalpaklı I updated the code so it now does what it should – hansaplast Jan 21 '17 at 20:56
@AltayKarakalpaklı the comments above are of course right: did you actually try anything before you posted the question? – hansaplast Jan 21 '17 at 20:56
@hansaplast : Your code works now, but you shouldn't write any code if the OP didn't bother to. Some text explaining what your method would do would have been better IMHO. – Eric Duminil Jan 21 '17 at 20:57
@EricDuminil: I will try to hold back next time. At least this question was formulated clearly which is somehow an exception on SO these days :) – hansaplast Jan 21 '17 at 20:58
Better way would have been to use `Counter` along with `OrderedDict` together to get order count of each word/line. No need of set here – Moinuddin Quadri Jan 21 '17 at 20:59
For reference, here's what I wrote before you updated the answer : You didn't provide any code, so you'll only get pointers. You could use a dictionary, with strings as keys and count as values. The default value would be 0. You could iterate over your file, and for every line, you increase the value of the corresponding string by 1. Once you read all the file, you can iterate over the values in your dictionary : if it's 1, you can output the key. – Eric Duminil Jan 21 '17 at 21:00
thank you it worked – Altay Karakalpaklı Jan 21 '17 at 21:01
@MoinuddinQuadri: I thought of combining `Counter` and `OrderedDict` but thought it would not be possible, turns out it is, this is a lot nicer of course, thanks for the tip – hansaplast Jan 21 '17 at 21:05

score 2 · Answer 2 · answered Jan 21 '17 at 21:06

You can combine OrderedDict and Counter to count each line while preserving the original order, then keep only the lines that occur once:

from collections import OrderedDict, Counter

class OrderedCounter(Counter, OrderedDict):
    pass

with open('/tmp/hello.txt') as f:
    ordered_counter = OrderedCounter(f.readlines())

new_list = [k.strip() for k, v in ordered_counter.items() if v==1]
# ['12474', '54675', '74564']
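One detail worth checking with readlines(): each line keeps its trailing newline, so a final line without one compares unequal to an otherwise identical earlier line. A small sketch, using io.StringIO as a stand-in for the file and made-up content, shows the difference between counting raw and stripped lines:

```python
import io
from collections import Counter

# Hypothetical file content: the last line has no trailing newline.
text = "12474\n54675\n12474"

# Counting raw lines treats "12474\n" and "12474" as different keys.
raw_counts = Counter(io.StringIO(text).readlines())

# Stripping the newline first makes both occurrences count as one key.
clean_counts = Counter(line.rstrip("\n") for line in io.StringIO(text))

print(raw_counts["12474\n"], raw_counts["12474"])
print(clean_counts["12474"])
```

Stripping before counting, as the first answer does, avoids this edge case.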

Moinuddin Quadri


I was 1 minute faster :-P – hansaplast Jan 21 '17 at 21:07
@hansaplast BTW I wrote it just to share with you on how to do it. I guess you already figured that out – Moinuddin Quadri Jan 21 '17 at 21:08
yeah, thanks for the pointers in the comments, it was a fun exercise – hansaplast Jan 21 '17 at 21:12
Interesting. What's the purpose of `pass` here? – Eric Duminil Jan 21 '17 at 21:13
@EricDuminil Since body of this class is empty, you need `pass` to complete the scope of class. Check : [How To Use The Pass Statement In Python](http://stackoverflow.com/questions/13886168/how-to-use-the-pass-statement-in-python) – Moinuddin Quadri Jan 21 '17 at 21:16

Trelzevir · Answer 3 · 2017-01-21T21:20:21.763

1

Use count() to find how often each line occurs and, for every line that occurs more than once, remove all of its occurrences with index() and pop(). Note that this rewrites file.txt in place instead of printing the result:

with open("file.txt", "r") as f:
    data = f.readlines()

# Loop over a copy: removing items from the list being looped over skips elements.
for x in data[:]:
    if data.count(x) > 1:  # the line is duplicated
        for i in range(data.count(x)):
            data.pop(data.index(x))  # remove every occurrence, one at a time

with open("file.txt", "w") as f:
    f.write("".join(data))  # write the remaining lines back to the file

file.txt:

12474
54675
74564
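A caveat with in-place removal like this: deleting items from a list while a for loop is walking that same list shifts the remaining items, so the loop can skip some of them. A rough sketch with made-up data (the function name and the snapshot flag are hypothetical) compares a loop over the list itself with a loop over a copy:

```python
def drop_repeated_in_place(lines, snapshot):
    """Remove every value that occurs more than once, mutating `lines`."""
    # With snapshot=True the loop walks a copy, so removals cannot shift it.
    source = list(lines) if snapshot else lines
    for x in source:
        if lines.count(x) > 1:
            for _ in range(lines.count(x)):
                lines.remove(x)
    return lines


# Hypothetical data: "a", "b" and "c" are repeated, "d" is unique.
sample = ["a", "a", "b", "c", "c", "b", "d"]

naive = drop_repeated_in_place(list(sample), snapshot=False)
safe = drop_repeated_in_place(list(sample), snapshot=True)

print(naive)  # ['b', 'b', 'd'] -> the repeated "b" survived
print(safe)   # ['d']
```

Whether the skipped element matters depends on the input order, which is why the problem can go unnoticed on small test files.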

edited Jan 21 '17 at 21:20

answered Jan 21 '17 at 20:58


You could use `readlines()` directly, couldn't you? – Eric Duminil Jan 21 '17 at 21:11
Yes, thanks for suggestion – Trelzevir Jan 21 '17 at 21:21

score 0 · Answer 4 · answered Jan 21 '17 at 21:03

Not the most efficient, since count() rescans the whole list for every line (quadratic time overall), but simple:

with open("input.txt") as f:
    orig = list(f)
    filtered = [x for x in orig if orig.count(x)==1]

print("".join(filtered))

Convert the file to a list of lines.
Use a list comprehension to keep only the lines that occur exactly once.
Print the list, joining with an empty string because the linefeeds are still in the lines.
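If the quadratic cost of count() becomes a problem on larger files, one possible variant is to count all lines once with collections.Counter and then filter in a second pass. This is only a sketch: the function name is hypothetical, and the sample list stands in for the lines of a file (the 11111 entry is a made-up repeated line).

```python
from collections import Counter


def lines_occurring_once(lines):
    """Return the lines that occur exactly once, in their original order."""
    # Strip newlines first so a final line without one still matches.
    stripped = [line.rstrip("\n") for line in lines]
    counts = Counter(stripped)  # one pass to count every line
    return [line for line in stripped if counts[line] == 1]  # one pass to filter


# Hypothetical sample standing in for list(open("input.txt")).
sample = ["12474\n", "11111\n", "54675\n", "11111\n", "74564"]

result = lines_occurring_once(sample)
print("\n".join(result))
```

Each line is looked up in the Counter in roughly constant time, so the work grows linearly with the number of lines.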
