How to map a list of strings to a list of integers

Question

I have a list with n elements:

['pea', 'rpai', 'rpai', 'schiai', 'pea', 'rpe', 'zoi', 'zoi', 'briai', 'rpe']

I have to assign a number to each string, starting at zero: each new, distinct element gets the next integer, while a repeated element gets the same number it was given the first time. Example:

['pea', 'rpai', 'rpai', 'schiai', 'pea', 'rpe', 'zoi', 'zoi', 'briai', 'rpe']
[ 0,    1,      1,      2,        0,     3,     4,     4,     5,       3    ]

How can I do it?

That code is not indented correctly. It is unclear what `count` and `count2` should be and why they have different types. Can you tell us in normal language? There is no condition that makes a distinction between a repeating and non-repeating element, so it's expected that this code doesn't work. Further, please provide a [mcve], with all code and example data inline. Lastly, as a new user here, take the [tour] and read [ask]. — Ulrich Eckhardt, Nov 14 '20 at 20:51
My advice would be to sit down with your teacher or a tutor or classmate who can guide you in the right direction. Us giving you the answer would help with your immediate problem, but it wouldn't teach you how to think through and break down problems, which is a fundamental part of programming. You're probably going to run into similar issues with the next homework problem as well. The coursework should also build on earlier concepts as the course progresses, so the later problems would be much more difficult than the earlier ones if you didn't solve the earlier ones yourself. — Bernhard Barker, Nov 15 '20 at 07:56
Also, see [Python Map List of Strings to Integer List](https://stackoverflow.com/q/9206609/7851470), [Python: how to convert a string array to a factor list](https://stackoverflow.com/q/34682420/7851470). — Georgy, Nov 15 '20 at 11:23

score 15 · Answer 1 · answered Nov 14 '20 at 21:08


With a helper dict:

>>> final = ['pea', 'rpai', 'rpai', 'schiai', 'pea', 'rpe', 'zoi', 'zoi', 'briai', 'rpe']
>>> [*map({k: v for v, k in enumerate(dict.fromkeys(final))}.get, final)]
[0, 1, 1, 2, 0, 3, 4, 4, 5, 3]

Another way:

>>> d = {}
>>> [d.setdefault(x, len(d)) for x in final]
[0, 1, 1, 2, 0, 3, 4, 4, 5, 3]
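
For readers who find the one-liner dense, here is a sketch of the `setdefault` version unrolled into an explicit loop. It assumes the same `final` list as the question; the name `codes` is only introduced for illustration.

```python
final = ['pea', 'rpai', 'rpai', 'schiai', 'pea', 'rpe', 'zoi', 'zoi', 'briai', 'rpe']

d = {}       # string -> number assigned on first appearance
codes = []
for x in final:
    # Arguments are evaluated before the call, so len(d) is computed
    # while x may still be absent from d. For a new x it is therefore the
    # next free number; for a known x the default is simply ignored.
    code = d.setdefault(x, len(d))
    codes.append(code)

print(codes)
print(d)
```

The dictionary ends up holding one entry per distinct string, so it can also be reused afterwards to look up the number of any string.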

answered Nov 14 '20 at 21:08

superb rain


@superb rain, thanks for the second option. This is so awesome and it spits out the values directly into the list while also assigning to the dictionary. – Joe Ferndz Nov 14 '20 at 22:12

If someone is new enough to programming to not know how to do what's being asked in the question, I very much doubt they'll be able to understand these complex one-liners. – Bernhard Barker Nov 15 '20 at 07:50

score 11 · Answer 2 · answered Nov 14 '20 at 20:56


Using a dictionary would achieve this.

def counts(a):
    dis = {}   # maps each distinct string to its assigned number
    count = 0  # next number to hand out
    for i in range(len(a)):
        if a[i] not in dis.keys():
            dis[a[i]] = count
            count += 1

    return [dis[x] for x in a]

answered Nov 14 '20 at 20:56

algorythms


Hey! An answer that actually gives the requested result! – CryptoFool Nov 14 '20 at 21:11

I believe `for i, _ in enumerate(a)` is more pythonic than `for i in range(len(a))`. But you're only ever using `i` in `a[i]`, in which case it makes more sense to just `for x in a` and use `x` instead of `a[i]`. – Bernhard Barker Nov 15 '20 at 07:48
@BernhardBarker agreed – algorythms Nov 18 '20 at 15:47
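
The simplification suggested in the comment above might look like the sketch below. The name `counts_simplified` is made up here to keep it apart from the answer's `counts`; the logic is otherwise the same.

```python
def counts_simplified(a):
    dis = {}
    count = 0
    for x in a:              # iterate over the elements directly, no index needed
        if x not in dis:     # membership test on the dict itself
            dis[x] = count
            count += 1
    return [dis[x] for x in a]


a = ['pea', 'rpai', 'rpai', 'schiai', 'pea', 'rpe', 'zoi', 'zoi', 'briai', 'rpe']
print(counts_simplified(a))  # [0, 1, 1, 2, 0, 3, 4, 4, 5, 3]
```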

JL Peyret · Answer 3 · 2020-11-15T15:50:28.157

Use a defaultdict with a counter as its default-value function.

Whenever the key exists, the defaultdict returns the stored "first encountered position"; otherwise it calls the default factory (Incr.__call__ in the hand-rolled version further down), which increments its count to provide a new first encountered position.

Following superb rain's suggestion, use an existing counter class:

from collections import defaultdict 
from itertools import count

li = ['pea', 'rpai', 'rpai', 'schiai', 'pea', 'rpe', 'zoi', 'zoi', 'briai', 'rpe']
seen = defaultdict(count().__next__)
print( [seen[val] for val in li] )

Rolling my own Incr, as before, which does give you the advantage that you could return anything (such as a GUID):

from collections import defaultdict 

class Incr:
    def __init__(self):
        self.count = -1

    def __call__(self):
        self.count +=1 
        return self.count

li = ['pea', 'rpai', 'rpai', 'schiai', 'pea', 'rpe', 'zoi', 'zoi', 'briai', 'rpe']

seen = defaultdict(Incr())

print( [seen[val] for val in li] )

Both provide the same output:

[0, 1, 1, 2, 0, 3, 4, 4, 5, 3]

Could also use `itertools.count().__next__` or `seen.__len__` or `lambda: len(seen)` as the default factory. — superb rain, Nov 15 '20 at 00:49
@superbrain itertools.count().__next__ might be a good one. truth be told, I find your len(dict) trick impressive. but it's a bit *too* clever, the kind of thing where it's not obvious enough what is going on, 6 months later. but it certainly is good thinking. — JL Peyret, Nov 15 '20 at 01:39
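
The other two default factories mentioned in the comments can be sketched as follows. The variable names `seen_lambda`, `seen_len`, `via_lambda` and `via_len` are illustrative only.

```python
from collections import defaultdict

li = ['pea', 'rpai', 'rpai', 'schiai', 'pea', 'rpe', 'zoi', 'zoi', 'briai', 'rpe']

# Variant 1: the lambda looks up `seen_lambda` only when it is called,
# by which point the name is already bound to the defaultdict.
seen_lambda = defaultdict(lambda: len(seen_lambda))
via_lambda = [seen_lambda[val] for val in li]

# Variant 2: a bound method cannot be passed to the constructor of the
# object it belongs to, so the factory is attached after creation.
seen_len = defaultdict()
seen_len.default_factory = seen_len.__len__
via_len = [seen_len[val] for val in li]

print(via_lambda)
print(via_len)
```

Both rely on the factory being called before the missing key is inserted, so the current length is the next unused number.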

score 3 · Answer 4 · answered Nov 14 '20 at 20:54


Try this:

a = ['pea', 'rpai', 'rpai', 'schiai', 'pea', 'rpe', 'zoi', 'zoi', 'briai', 'rpe']
dct = {}
counter = 0
for i in range(len(a)):
    if a[i] not in dct.keys():
        dct[a[i]] = counter 
        counter += 1
print([(i, dct[i]) for i in a])

answered Nov 14 '20 at 20:54

dimay


Why the +1? This doesn't produce what the OP asked for. – CryptoFool Nov 14 '20 at 21:09

Somethink · Answer 5 · 2020-11-14T21:13:29.560

2

You just need to check whether you have already seen the element:

def counts(final):
    count3 = [] # contains all objects that were already found
    count2=[]
    count=0
    for x in final:
        if x not in count3: # test if it's not already in count3
            count+=1
            count2.append(count)
            count3.append(x)
        else:
            count2.append(count)

    return count2

edited Nov 14 '20 at 21:13

answered Nov 14 '20 at 20:54

Somethink


Your solution returns `[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]` for the example data, not the expected output. You never add something to `count3` to test if you already saw the element. – Michael Szczesny Nov 14 '20 at 21:02
Now it returns `[1, 2, 3, 4, 5, 6]`. – Michael Szczesny Nov 14 '20 at 21:09
Yeah. Why the +1 on this. People just look at any code that gets thrown in as an answer and says "sure, that's great!" without reading it or trying it? – CryptoFool Nov 14 '20 at 21:10
`[1, 2, 2, 3, 3, 4, 5, 5, 6, 6]` I'm not testing your code further. There are correct solutions already. – Michael Szczesny Nov 14 '20 at 21:17
I can't believe how many people post code without bothering to try it, especially when the OP has given the exact result they expect. – CryptoFool Nov 14 '20 at 21:21
Yeah, now I know. It doesn't give the result I wanted, but I didn't know the other approaches. – Somethink Nov 14 '20 at 21:22

But then why did you post it as an answer? How is that helping anyone? – CryptoFool Nov 14 '20 at 21:24
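
For reference, one possible way to make a list-only approach return the expected numbers is to use the position of each element in the list of already-seen elements. This is a sketch, not the answer author's code, and the names `counts_with_lists`, `seen` and `result` are made up for it.

```python
def counts_with_lists(final):
    seen = []     # distinct elements, in order of first appearance
    result = []
    for x in final:
        if x not in seen:
            seen.append(x)
        # The position of x in `seen` is the number it was first given.
        result.append(seen.index(x))
    return result


final = ['pea', 'rpai', 'rpai', 'schiai', 'pea', 'rpe', 'zoi', 'zoi', 'briai', 'rpe']
print(counts_with_lists(final))  # [0, 1, 1, 2, 0, 3, 4, 4, 5, 3]
```

Both `x not in seen` and `seen.index(x)` scan the list, so this takes quadratic time in the worst case; the dictionary-based answers avoid that.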

score 1 · Answer 6 · answered Nov 15 '20 at 00:38

Cleanest way might be to use pandas:

import pandas as pd
lst =  ['pea', 'rpai', 'rpai', 'schiai', 'pea', 'rpe', 'zoi', 'zoi', 'briai', 'rpe']
pd.factorize(lst)

Which outputs:

(array([0, 1, 1, 2, 0, 3, 4, 4, 5, 3], dtype=int64),
 array(['pea', 'rpai', 'schiai', 'rpe', 'zoi', 'briai'], dtype=object))
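
To show what the two returned arrays mean, here is a rough pure-Python sketch that produces the same pair of codes and unique values for this input. It is not how pandas implements `factorize`, and the name `factorize_sketch` is hypothetical.

```python
def factorize_sketch(values):
    # dict.fromkeys keeps only the first occurrence of each value, in order.
    uniques = list(dict.fromkeys(values))
    position = {value: i for i, value in enumerate(uniques)}
    codes = [position[value] for value in values]
    return codes, uniques


lst = ['pea', 'rpai', 'rpai', 'schiai', 'pea', 'rpe', 'zoi', 'zoi', 'briai', 'rpe']
codes, uniques = factorize_sketch(lst)
print(codes)    # [0, 1, 1, 2, 0, 3, 4, 4, 5, 3]
print(uniques)  # ['pea', 'rpai', 'schiai', 'rpe', 'zoi', 'briai']
```

The first element of the pair is the list the question asks for; `uniques[code]` maps a number back to its string.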

Joe Ferndz · Answer 7 · 2020-11-14T22:17:34.467

0

I was proven wrong and I have to use a dictionary (thanks @Steve). Here's the updated version with dictionary included:

a = ['pea', 'rpai', 'rpai', 'schiai', 'pea', 'rpe', 'zoi', 'zoi', 'briai', 'rpe']
b = [None]*len(a)
d = {}
for i, x in enumerate(a):
    # or use d.setdefault(x, len(d)) instead of the if statement (using the algo from @superb rain's answer)
    if x not in d:
        d[x] = len(d)
    b[i] = d[x]

print(a)
print(b)

The output of this will be:

['pea', 'rpai', 'rpai', 'schiai', 'pea', 'rpe', 'zoi', 'zoi', 'briai', 'rpe']
[0, 1, 1, 2, 0, 3, 4, 4, 5, 3]

edited Nov 14 '20 at 22:17

answered Nov 14 '20 at 21:16

Joe Ferndz


Well, first, the answer is wrong. Second, the reason to use a dictionary is so you don't have to search through the list over and over again, which is what your code is doing. So your code is inefficient...but it does avoid the use of a dictionary. – CryptoFool Nov 14 '20 at 21:18
Thanks for reviewing my code. I have updated it with code using dictionary – Joe Ferndz Nov 14 '20 at 21:52

Much better. There's one big fix you should make though. When you do `if x not in d.keys()` vs `if x not in d`, you wipe out the whole reason to use a dictionary. You're extracting the whole list of keys from the dictionary, which takes time. Then you're doing a linear search through that list. All this instead of just looking for the value in the dictionary directly, which is what dictionaries are good at. – CryptoFool Nov 14 '20 at 22:06
Thanks for the explanation. I get it now. Didn't realize the importance of d vs d.keys() – Joe Ferndz Nov 14 '20 at 22:09
@Steve So you think we're all still using Python 2? Even though it's officially dead? – superb rain Nov 15 '20 at 00:56
@superbrain - sorry. Not following you. - did I use a print statement without parentheses somewhere?, lol – CryptoFool Nov 15 '20 at 00:59
@Steve In Python 3, `d.keys()` doesn't return a list but a *view*, and that takes O(1) space and time. – superb rain Nov 15 '20 at 01:00
ha! oh really? ok, I was wrong then. I was, in fact, one of the last hold outs in switching from Python 2. To this day, I am (obviously) still learning what has changed about P3. Thanks for correcting me. Sorry @JoeFerndz, I guess I steered you wrong. Is there any disadvantage to `x in d` vs `x in d.keys()`. Just wondering how much damage I was set to inflict had Superb not caught this. – CryptoFool Nov 15 '20 at 01:08

@Steve For a membership test, `x in d` is the right way, `x in d.keys()` is pointless and slower (just not as bad as you thought :-). The view it gives you can be beneficial if you have use for its set-like behavior. – superb rain Nov 15 '20 at 01:17
I just read over [What's new in P 3.0](https://docs.python.org/3.0/whatsnew/3.0.html), not wanting to make a similar mistake again. The `d.keys()` issue is the #2 issue listed under "Gotchas", right after the change to `print`. I'm surprised I hadn't picked this one up yet. Thanks again Superbrain! – CryptoFool Nov 15 '20 at 01:45
I am glad I made the mistake. I learned a great deal from that mistake and this conversation. Thank you both, Steve & superb rain. – Joe Ferndz Nov 15 '20 at 02:34
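
The points made in this thread about Python 3 key views can be checked with a small sketch. The dictionary contents and variable names below are illustrative only.

```python
d = {'pea': 0, 'rpai': 1}

keys = d.keys()      # a view onto d, not a copied list
d['zoi'] = 2         # the existing view reflects later changes to d
view_sees_new_key = 'zoi' in keys

# Views support set-like operations, which is where they are useful.
common = keys & {'pea', 'zoi', 'briai'}

# For a plain membership test, asking the dict directly is enough.
direct = 'pea' in d

print(view_sees_new_key, sorted(common), direct)
```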
