2

I have a file like this:

q12j4
q12j4
fj45j
q12j4
fjmep
fj45j

Now all I want to do is:

  • find out whether any entries are repeated,
  • if so, print each repeated entry only once, and print the entries that are not repeated normally.
    The output should look like this:

    q12j4  
    fj45j  
    fjmep  
    

    [repetition is omitted]

I was trying to do it with defaultdict, but I think it will not work for strings.
Please help.

sehe
diffracteD

4 Answers

3
def unique(seq):
    seen = set()
    for val in seq:
        if val not in seen:
            seen.add(val)
            yield val

with open('file.txt') as f:
    # Lines keep their trailing newlines, so join them without a separator.
    print(''.join(unique(f)))

As you can see, I've chosen to write a separate generator for removing duplicates from an iterable. This generator, unique(), can be used in lots of other contexts too.
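To illustrate the "other contexts" point, here is a minimal sketch that runs the same generator over an in-memory list instead of a file. The `entries` list is just the sample data from the question, and the `result` name is only for illustration.

```python
def unique(seq):
    """Yield each item of seq the first time it appears, keeping input order."""
    seen = set()
    for val in seq:
        if val not in seen:
            seen.add(val)
            yield val

# Sample data from the question; any iterable of hashable items should work.
entries = ["q12j4", "q12j4", "fj45j", "q12j4", "fjmep", "fj45j"]
result = list(unique(entries))
print(result)  # ['q12j4', 'fj45j', 'fjmep']
```

Because unique() is a generator, it only keeps the set of values seen so far in memory, not the whole input.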

NPE
  • @sehe: If you're subtly referring to your own answer, that answer doesn't produce the result that the OP is asking for. Besides, we are not playing code golf here. – NPE May 15 '12 at 12:52
  • @sehe I thought you were referring to `[x for x in L if x not in s and not s.add(x)]` but this code is more readable imo. – jamylak May 15 '12 at 12:54
  • @jamylak: That is very interesting, but I don't play codegolf :) I just meant like [in my answer](http://stackoverflow.com/a/10601038/85371). Readability first. Optimize --later-- never. – sehe May 15 '12 at 15:53
3

This should be roughly enough:

with open('file.txt', 'r') as f:
    for line in set(f):
        # Each line already ends with a newline, so strip it before printing.
        print(line.rstrip('\n'))
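A set does not keep the entries in their original order. If order matters, one possible variation (a sketch, assuming Python 3.7 or later, where dicts keep insertion order) is to use dict.fromkeys instead of set. The io.StringIO object below is only a stand-in for the opened file so that the example is self-contained.

```python
import io

# Stand-in for the opened file, holding the sample data from the question.
f = io.StringIO("q12j4\nq12j4\nfj45j\nq12j4\nfjmep\nfj45j\n")

# dict keys are unique and, from Python 3.7 on, keep insertion order.
ordered = list(dict.fromkeys(line.rstrip("\n") for line in f))
for entry in ordered:
    print(entry)
```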
sehe
  • 1
    This doesn't preserve the order of entries. – NPE May 15 '12 at 12:50
  • 1
    @aix - The OP does not explicitly mention preservation of order as one of his requirements. His question does mention `sort out` but that can easily be interpreted as `take out`. – dj18 May 15 '12 at 13:01
2
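filename = 'file.txt'  # assumed name; replace with the path to your file
seen = set()
with open(filename, 'r') as f:
    for line in f:
        if line not in seen:
            # Strip the newline that print would otherwise duplicate.
            print(line.rstrip('\n'))
            seen.add(line)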
eumiro
0

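You can use the itertools.groupby function. For a usage example, look at the standard library documentation or this related question: How do I use Python's itertools.groupby()? Note that groupby only merges adjacent equal items, so the input has to be sorted first, which means the original order of the entries is not kept.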

Assume that toorder is your list with repeated entries:

import itertools
toorder = ["a", "a", "b", "a", "b", "c"]

for key, group in itertools.groupby(sorted(toorder)):
    print(key)

Should output:

a
b
c
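To show why the sorted() call matters, here is a small sketch comparing groupby on the raw list with groupby on the sorted list. The names `unsorted_keys` and `sorted_keys` are only for illustration.

```python
import itertools

toorder = ["a", "a", "b", "a", "b", "c"]

# Without sorting, groupby only merges duplicates that sit next to each other.
unsorted_keys = [key for key, group in itertools.groupby(toorder)]

# After sorting, equal items are adjacent, but the original order is lost.
sorted_keys = [key for key, group in itertools.groupby(sorted(toorder))]

print(unsorted_keys)  # ['a', 'b', 'a', 'b', 'c']
print(sorted_keys)    # ['a', 'b', 'c']
```

For the file in the question, this approach would print fj45j, fjmep, q12j4 rather than the first-seen order.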
pygabriel