Replacing python list elements with key

Question

I have a list of non-unique strings:

list = ["a", "b", "c", "a", "a", "d", "b"]

I would like to replace each element with an integer key which uniquely identifies each string:

list = [0, 1, 2, 0, 0, 3, 1]

The number does not matter, as long as it is a unique identifier.

So far all I can think to do is copy the list to a set, and use the index of the set to reference the list. I'm sure there's a better way though.

Are all of the "strings" single characters as you have here? If so, you could consider using the [ord](https://docs.python.org/2/library/functions.html#ord) function. [Sets](https://docs.python.org/2/library/sets.html) do not support indexing. — rkersh, Jun 02 '16 at 22:36
BTW, don't use `list` as a variable name, as that shadows the built-in `list` type. It won't hurt anything here, but it can lead to mysterious bugs if your script later tries to use the `list` type to construct a list. — PM 2Ring, Jun 02 '16 at 22:55

user2390182 · Accepted Answer · 2016-06-02T23:10:35.987

10

This guarantees that the ids are unique and contiguous, starting from 0:

id_s = {c: i for i, c in enumerate(set(list))}
li = [id_s[c] for c in list]

On a different note, you should not use 'list' as a variable name because it will shadow the built-in type list.
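A minimal sketch of the same approach with the input renamed to avoid shadowing the built-in. The names `items`, `ids_by_value` and `encoded` are assumptions made for this illustration. Because a set has no defined iteration order, which string receives which id can differ between runs, so the checks below look only at the properties that always hold:

```python
items = ["a", "b", "c", "a", "a", "d", "b"]

# Map each distinct string to an id. Set iteration order is arbitrary,
# so the particular id given to each string may vary between runs.
ids_by_value = {value: i for i, value in enumerate(set(items))}
encoded = [ids_by_value[value] for value in items]

# The ids are contiguous from 0, and equal strings share an id.
print(sorted(set(encoded)))                    # [0, 1, 2, 3]
print(encoded[0] == encoded[3] == encoded[4])  # True
```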

edited Jun 02 '16 at 23:10

answered Jun 02 '16 at 22:42

user2390182


score 5 · Answer 2 · edited May 23 '17 at 11:58

5

Here's a single-pass solution with defaultdict:

from collections import defaultdict
seen = defaultdict()
seen.default_factory = lambda: len(seen)  # you could instead bind to seen.__len__

In [11]: [seen[c] for c in list]
Out[11]: [0, 1, 2, 0, 0, 3, 1]

It's kind of a trick but worth mentioning!

An alternative, suggested by @user2357112 in a related question/answer, is to draw the ids from itertools.count. This lets you set everything up in the constructor:

from itertools import count
seen = defaultdict(count().__next__)  # .next in python 2

This may be preferable, since the default_factory no longer has to look up seen in the global scope.
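Following the suggestions in the comments on this answer, here is a sketch that wraps the counter version in a function so that neither the counter nor the dict leaks into the enclosing scope. The function name `encode` is an assumption made for this illustration:

```python
from collections import defaultdict
from itertools import count


def encode(items):
    """Return integer ids for items, assigned in order of first appearance."""
    counter = count()
    # next(counter) works on both Python 2 and 3, unlike .next / .__next__.
    seen = defaultdict(lambda: next(counter))
    return [seen[item] for item in items]


print(encode(["a", "b", "c", "a", "a", "d", "b"]))  # [0, 1, 2, 0, 0, 3, 1]
```

Each call builds a fresh counter and dict, so ids always start at 0 again for a new input.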

edited May 23 '17 at 11:58

Community


answered Jun 02 '16 at 23:03

Andy Hayden


1

Very clever, I like it! I had never thought about using that kind of reflexive power in the `default_factory`. – user2390182 Jun 02 '16 at 23:07
@schwobaseggl I *guess* that's what the attribute is there for (rather than being private), still I had hoped there'd be a single-constructor way to do it (and reference self)... it feels a little dirty/old school. :/ – Andy Hayden Jun 02 '16 at 23:11
3

[`itertools.count().next` also works](http://stackoverflow.com/questions/18605500/assign-strings-to-ids-in-python/18605520#18605520) for the `default_factory`, or you could use `seen = defaultdict(lambda: len(seen))`, since `seen` doesn't need to exist yet to create the lambda. I prefer `itertools.count().next` to `lambda: len(seen)`, since it doesn't require inspecting the dict's state in the middle of a mutative operation, but either version feels like there's too much magic going on in the `default_factory`. – user2357112 Jun 02 '16 at 23:30
@user2357112 I don't think it's too much magic, that's what it is there for! It's annoying that the itertools.count API is different in Python 3 (you need to use `__next__`), but I agree that itertools.count is much nicer than len (though both are O(1)). – Andy Hayden Jun 02 '16 at 23:37
@user2357112 I missed the lambda part... the worst part is that it looks up the `seen` variable in scope, which could dirtily be avoided by binding to `seen.__len__` (if only [len were a proper oo method](http://stackoverflow.com/questions/237128/is-there-a-reason-python-strings-dont-have-a-string-length-method#comment9848314_237150)). This really needs to be created in a function to avoid that. Your solution is better! – Andy Hayden Jun 03 '16 at 01:08
To avoid the dichotomy between Python 2 and 3, consider `defaultdict(lambda: next(count))` where `count = itertools.count()`. – Adam Jun 05 '16 at 01:00
@codesparkle and if you do that, consider defining it in a function (so that the count variable doesn't leak, like the seen variable above). – Andy Hayden Jun 05 '16 at 01:17

score 4 · Answer 3 · edited Jun 03 '16 at 00:25

4

>>> lst = ["a", "b", "c", "a", "a", "d", "b"]
>>> nums = [ord(x) for x in lst]
>>> print(nums)
[97, 98, 99, 97, 97, 100, 98]
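The snippet above depends on `ord`, which returns the code point of a single character. A short sketch of what happens once the list contains a longer string (the sample list here is an assumption chosen to trigger the failure):

```python
lst = ["a", "b", "abc"]

try:
    nums = [ord(x) for x in lst]
except TypeError as exc:
    # ord() accepts exactly one character, so "abc" is rejected.
    nums = None
    print("ord failed:", exc)
```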

edited Jun 03 '16 at 00:25

Tonechas


answered Jun 02 '16 at 22:37

Chris


4

This works only if each item in the list is a single character, which the OP has said (in a comment) may not be so. – Rory Daulton Jun 02 '16 at 22:49
1

This could also do with a little bit of explanation IMO. – Andy Hayden Jun 02 '16 at 23:26

score 2 · Answer 4 · answered Jun 02 '16 at 22:38

2

If you are not picky, then use the hash function: it returns an integer. For strings that are the same, it returns the same hash:

li = ["a", "b", "c", "a", "a", "d", "b"]
ids = list(map(hash, li))          # Turn list of strings into list of ints
ids = [hash(item) for item in li]  # Same result, as a list comprehension

answered Jun 02 '16 at 22:38

Hai Vu


This does work, assuming dynamic results are acceptable. Good one. – Chris Jun 02 '16 at 22:40
3

This does not work. Hashes are not guaranteed to be unique. – user2357112 Jun 02 '16 at 22:43

Padraic Cunningham · Answer 5 · 2016-06-02T23:53:46.800

1

A functional approach:

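from itertools import count
from operator import itemgetter

l = ["a", "b", "c", "a", "a", "d", "b", "abc", "def", "abc"]

# dict(zip(...)) keeps the last index seen for each string, so the ids are
# unique per string but not contiguous. The result is a tuple.
mapped = itemgetter(*l)(dict(zip(l, count())))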

You could also use a simple generator function:

from itertools import count

def uniq_ident(l):
    cn, d = count(), {}
    for ele in l:
        if ele not in d:
            c = next(cn)
            d[ele] = c
            yield c
        else:
            yield d[ele]


In [35]: l = ["a", "b", "c", "a", "a", "d", "b"]

In [36]: list(uniq_ident(l))
Out[36]: [0, 1, 2, 0, 0, 3, 1]

edited Jun 02 '16 at 23:53

answered Jun 02 '16 at 23:25

Padraic Cunningham


try with `l = ["\t\t", "c"]` – Andy Hayden Jun 02 '16 at 23:30
