How to get all unique characters in a textfile? unix/python

Question

Is there another way to get a list of all the unique characters in a text file?

I've tried the following, but is there a more Pythonic way? Can the same output be produced with a Unix command?

>>> import codecs
>>> from collections import Counter
>>> with codecs.open('my.txt','r','utf8') as fin:
...     x = Counter(fin.read())
...     print [i for i in x if len(i) == 1]
... 
[u'\u2014', u'\u2018', u'\u201c', u' ', u'\xa3', u'$', u'(', u',', u'0', u'4', u'8', u'@', u'D', u'H', u'L', u'P', u'T', u'\xd7', u'X', u'`', u'd', u'h', u'l', u'p', u't', u'x', u'\ufffd', u'\ufeff', u'\u2013', u'#', u"'", u'+', u'/', u'3', u'7', u';', u'?', u'C', u'G', u'K', u'O', u'S', u'W', u'_', u'\xe0', u'c', u'g', u'k', u'\u2026', u'o', u's', u'w', u'\n', u'"', u'&', u'*', u'\xad', u'.', u'2', u'6', u':', u'>', u'B', u'F', u'J', u'N', u'R', u'V', u'Z', u'b', u'f', u'\xe9', u'j', u'n', u'r', u'v', u'z', u'\t', u'\u2019', u'\u201d', u'!', u'%', u')', u'-', u'1', u'5', u'9', u'=', u'A', u'\xc2', u'E', u'I', u'M', u'Q', u'U', u'Y', u'a', u'\xe2', u'e', u'i', u'm', u'q', u'u', u'y']

If you are not going to use `x` after this, you can do `[i for i in Counter(fin.read()) if len(i) == 1]` – thefourtheye Feb 03 '14 at 08:06
But this doesn't make sense to me. `len(i)` will always be `1`. What are you trying to do? – thefourtheye Feb 03 '14 at 08:07
If you don't need to count the number of characters, you can just call `set(fin.read())`. – michaelmeyer Feb 03 '14 at 08:08
@doukremt Looks like he wants to gather all the characters which occurred only once. `set` approach will not work. – thefourtheye Feb 03 '14 at 08:11
No, I don't need the characters that occurred only once. I need every distinct character in the file. `len(x.keys()[0])` refers to the length of a key (i.e. the character itself), not to its count. – alvas Feb 03 '14 at 08:54
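To make the two readings discussed in these comments concrete, here is a minimal sketch. The sample string and the names `distinct` and `singletons` are made up for illustration and are not from the original post.

```python
from collections import Counter

text = "hello world"  # hypothetical sample standing in for the file contents
counts = Counter(text)

# Every character that appears at least once (what the question asks for).
distinct = set(counts)

# Only the characters that appear exactly once (the other reading).
singletons = {ch for ch, n in counts.items() if n == 1}

print(sorted(distinct))
print(sorted(singletons))
```

Here `l` and `o` are in `distinct` but not in `singletons`, since both occur more than once. Filtering on `len(i) == 1` does neither job: each key of the `Counter` is already a single character.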

score 4 · Accepted Answer · edited Aug 03 '17 at 01:36


One way is to use sets:

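with open('my.txt', 'r') as fh:  # the with block closes the file afterwards
    text = fh.read()             # read the whole file into memory
unique_chars = set(text)         # a set keeps one copy of each character
len(unique_chars)                # number of distinct characters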
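If the file is large, or you are on Python 3, something along these lines might work without reading everything at once. This is only a sketch: the helper name `unique_chars` and the in-memory sample are assumptions for illustration, and the UTF-8 encoding is taken from the question's `codecs.open` call.

```python
import io

def unique_chars(lines):
    """Collect the distinct characters from an iterable of text lines."""
    chars = set()
    for line in lines:
        chars.update(line)  # adds each character of the line to the set
    return chars

# In-memory stand-in for a text file; with a real file one would pass the
# handle from open('my.txt', encoding='utf-8') instead.
sample = io.StringIO("hello\nworld\n")
print(sorted(unique_chars(sample)))
```

Iterating over the file handle reads one line at a time, so memory use is bounded by the longest line plus the set of characters seen so far.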

edited Aug 03 '17 at 01:36 by cs95

answered Feb 03 '14 at 08:10 by Back2Basics

Looks like he wants to gather all the characters which occurred only once. `set` approach will not work. – thefourtheye Feb 03 '14 at 08:11
That's what his code says. His question is different. – Back2Basics Feb 03 '14 at 08:15
