20

I have a list with n elements:

['pea', 'rpai', 'rpai', 'schiai', 'pea', 'rpe', 'zoi', 'zoi', 'briai', 'rpe']

I have to assign a number to each string, starting at zero. Each time a new (not yet seen) element appears, the number increases by one; if an element repeats, it gets the same number it was given the first time. Example:

['pea', 'rpai', 'rpai', 'schiai', 'pea', 'rpe', 'zoi', 'zoi', 'briai', 'rpe']
[ 0,    1,      1,      2,        0,     3,     4,     4,     5,       3    ]

How can I do it?

Georgy
lola
  • Please update your question with the code you have tried. – quamrana Nov 14 '20 at 20:39
  • That code is not indented correctly. It is unclear what `count` and `count2` should be and why they have different types. Can you tell us in normal language? There is no condition that makes a distinction between a repeating and non-repeating element, so it's expected that this code doesn't work. Further, please provide a [mcve], with all code and example data inline. Lastly, as a new user here, take the [tour] and read [ask]. – Ulrich Eckhardt Nov 14 '20 at 20:51
  • You never check that the element repeats. – bereal Nov 14 '20 at 20:52
  • My advice would be to sit down with your teacher or a tutor or classmate who can guide you in the right direction. Us giving you the answer would help with your immediate problem, but it wouldn't teach you how to think through and break down problems, which is a fundamental part of programming. You're probably going to run into similar issues with the next homework problem as well. The coursework should also build on earlier concepts as the course progresses, so the later problems would be much more difficult than the earlier ones if you didn't solve the earlier ones yourself. – Bernhard Barker Nov 15 '20 at 07:56
  • Also, see [Python Map List of Strings to Integer List](https://stackoverflow.com/q/9206609/7851470), [Python: how to convert a string array to a factor list](https://stackoverflow.com/q/34682420/7851470). – Georgy Nov 15 '20 at 11:23

7 Answers

15

With a helper dict (here `final` is the list from the question):

>>> [*map({k: v for v, k in enumerate(dict.fromkeys(final))}.get, final)]
[0, 1, 1, 2, 0, 3, 4, 4, 5, 3]

Another way:

>>> d = {}
>>> [d.setdefault(x, len(d)) for x in final]
[0, 1, 1, 2, 0, 3, 4, 4, 5, 3]
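
For readers newer to Python, the second one-liner can be unrolled into an explicit loop. This is only a sketch of the same idea, not a different algorithm; the function name `label_first_seen` is made up for illustration, and `final` is assumed to hold the list from the question.

```python
def label_first_seen(items):
    """Number each item by the order in which its value first appeared."""
    numbers = {}  # item -> number assigned when it was first seen
    result = []
    for item in items:
        if item not in numbers:
            # len(numbers) is the count of distinct items seen so far,
            # so it is also the next unused number
            numbers[item] = len(numbers)
        result.append(numbers[item])
    return result


# `final` is assumed to be the list from the question
final = ['pea', 'rpai', 'rpai', 'schiai', 'pea', 'rpe', 'zoi', 'zoi', 'briai', 'rpe']
print(label_first_seen(final))  # [0, 1, 1, 2, 0, 3, 4, 4, 5, 3]
```

The `d.setdefault(x, len(d))` form does the same thing in one call: `len(d)` is evaluated before the key is inserted, and the stored value is returned whether or not the key was new.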
superb rain
  • @superb rain, thanks for the second option. This is so awesome and it spits out the values directly into the list while also assigning to the dictionary. – Joe Ferndz Nov 14 '20 at 22:12
  • If someone is new enough to programming to not know how to do what's being asked in the question, I very much doubt they'll be able to understand these complex one-liners. – Bernhard Barker Nov 15 '20 at 07:50
11

Using a dictionary would achieve this.

def counts(a):
    dis = {}   # maps each distinct element to its assigned number
    count = 0  # next number to hand out
    for x in a:
        if x not in dis:
            dis[x] = count
            count += 1

    return [dis[x] for x in a]
algorythms
6

Use a defaultdict with a counter as its default factory.

When the key already exists, the defaultdict returns the number stored for it the first time it was encountered. Otherwise it calls the default factory (`count().__next__` or `Incr.__call__` in the code below), which increments its count to provide a new number.

Following superb rain's suggestion, use an existing counter class:

from collections import defaultdict 
from itertools import count

li = ['pea', 'rpai', 'rpai', 'schiai', 'pea', 'rpe', 'zoi', 'zoi', 'briai', 'rpe']
seen = defaultdict(count().__next__)
print( [seen[val] for val in li] )

Alternatively, roll your own Incr class, which gives you the advantage that you could return anything (such as a GUID):

from collections import defaultdict 

class Incr:
    def __init__(self):
        self.count = -1

    def __call__(self):
        self.count +=1 
        return self.count

li = ['pea', 'rpai', 'rpai', 'schiai', 'pea', 'rpe', 'zoi', 'zoi', 'briai', 'rpe']

seen = defaultdict(Incr())

print( [seen[val] for val in li] )

Both produce the same output:

[0, 1, 1, 2, 0, 3, 4, 4, 5, 3]
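
The remark about returning "anything (such as a GUID)" could look like the sketch below. It assumes that random UUIDs from the standard library's `uuid.uuid4` are an acceptable kind of GUID; the names `ids` and `labels` are made up for illustration.

```python
from collections import defaultdict
import uuid

li = ['pea', 'rpai', 'rpai', 'schiai', 'pea', 'rpe', 'zoi', 'zoi', 'briai', 'rpe']

# defaultdict calls uuid.uuid4() with no arguments the first time a key is
# looked up, stores the result, and returns the stored value afterwards
ids = defaultdict(uuid.uuid4)
labels = [ids[val] for val in li]

# Repeated strings share one UUID; distinct strings get different ones
print(labels[0] == labels[4])  # both are 'pea'
print(len(set(labels)))        # number of distinct strings in li
```

Unlike the integer counters, these labels carry no ordering, so this only makes sense when the label just has to be unique per distinct element.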
JL Peyret
  • Could also use `itertools.count().__next__` or `seen.__len__` or `lambda: len(seen)` as the default factory. – superb rain Nov 15 '20 at 00:49
  • @superbrain itertools.count().__next__ might be a good one. truth be told, I find your len(dict) trick impressive. but it's a bit *too* clever, the kind of thing where it's not obvious enough what is going on, 6 months later. but it certainly is good thinking. – JL Peyret Nov 15 '20 at 01:39
3

Try this:

a = ['pea', 'rpai', 'rpai', 'schiai', 'pea', 'rpe', 'zoi', 'zoi', 'briai', 'rpe']
dct = {}     # maps each distinct element to its assigned number
counter = 0  # next number to hand out
for x in a:
    if x not in dct:
        dct[x] = counter
        counter += 1
# prints (element, number) pairs
print([(x, dct[x]) for x in a])
dimay
2

You just need to check whether you have already seen the element:

def counts(final):
    count3 = []  # distinct elements, in order of first appearance
    count2 = []  # number assigned to each element of final
    for x in final:
        if x not in count3:  # first time we see x
            count3.append(x)
        # the position of x in count3 is its number (0 for the first distinct element)
        count2.append(count3.index(x))

    return count2
Somethink
1

The cleanest way might be to use pandas:

import pandas as pd
lst =  ['pea', 'rpai', 'rpai', 'schiai', 'pea', 'rpe', 'zoi', 'zoi', 'briai', 'rpe']
pd.factorize(lst)

Which outputs:

(array([0, 1, 1, 2, 0, 3, 4, 4, 5, 3], dtype=int64),
 array(['pea', 'rpai', 'schiai', 'rpe', 'zoi', 'briai'], dtype=object))
Hamza
0

I was proven wrong and I have to use a dictionary (thanks @Steve). Here's the updated version with dictionary included:

a = ['pea', 'rpai', 'rpai', 'schiai', 'pea', 'rpe', 'zoi', 'zoi', 'briai', 'rpe']
b = [None]*len(a)
d = {}
for i, x in enumerate(a):
    # d.setdefault(x, len(d)) could replace the if statement (the approach from @superb rain's answer)
    if x not in d:
        d[x] = len(d)
    b[i] = d[x]

print (a)
print (b)

The output of this will be:

['pea', 'rpai', 'rpai', 'schiai', 'pea', 'rpe', 'zoi', 'zoi', 'briai', 'rpe']
[0, 1, 1, 2, 0, 3, 4, 4, 5, 3]
Joe Ferndz
  • Well, first, the answer is wrong. Second, the reason to use a dictionary is so you don't have to search through the list over and over again, which is what your code is doing. So your code is inefficient...but it does avoid the use of a dictionary. – CryptoFool Nov 14 '20 at 21:18
  • Thanks for reviewing my code. I have updated it with code using dictionary – Joe Ferndz Nov 14 '20 at 21:52
  • Much better. There's one big fix you should make though. When you do `if x not in d.keys()` vs `if x not in d`, you wipe out the whole reason to use a dictionary. You're extracting the whole list of keys from the dictionary, which takes time. Then you're doing a linear search through that list. All this instead of just looking for the value in the dictionary directly, which is what dictionaries are good at. – CryptoFool Nov 14 '20 at 22:06
  • Thanks for the explanation. I get it now. Didn't realize the importance of d vs d.keys() – Joe Ferndz Nov 14 '20 at 22:09
  • @Steve So you think we're all still using Python 2? Even though it's officially dead? – superb rain Nov 15 '20 at 00:56
  • @superbrain - sorry. Not following you. - did I use a print statement without parentheses somewhere?, lol – CryptoFool Nov 15 '20 at 00:59
  • @Steve In Python 3, `d.keys()` doesn't return a list but a *view*, and that takes O(1) space and time. – superb rain Nov 15 '20 at 01:00
  • ha! oh really? ok, I was wrong then. I was, in fact, one of the last hold outs in switching from Python 2. To this day, I am (obviously) still learning what has changed about P3. Thanks for correcting me. Sorry @JoeFerndz, I guess I steered you wrong. Is there any disadvantage to `x in d` vs `x in d.keys()`. Just wondering how much damage I was set to inflict had Superb not caught this. – CryptoFool Nov 15 '20 at 01:08
  • @Steve For a membership test, `x in d` is the right way, `x in d.keys()` is pointless and slower (just not as bad as you thought :-). The view it gives you can be beneficial if you have use for its set-like behavior. – superb rain Nov 15 '20 at 01:17
  • I just read over [What's new in P 3.0](https://docs.python.org/3.0/whatsnew/3.0.html), not wanting to make a similar mistake again. The `d.keys()` issue is the #2 issue listed under "Gotchas", right after the change to `print`. I'm surprised I hadn't picked this one up yet. Thanks again Superbrain! – CryptoFool Nov 15 '20 at 01:45
  • I am glad I made the mistake. I learned a great deal from that mistake and this conversation. Thank you both Steve & Superb rain. – Joe Ferndz Nov 15 '20 at 02:34
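
The point settled in the comments above, that in Python 3 `d.keys()` is a view rather than a list, can be checked with a short sketch. The dictionary contents here are made up for illustration.

```python
d = {'pea': 0, 'rpai': 1, 'schiai': 2}

# Both membership tests work; `x in d` is the idiomatic form
print('pea' in d)         # True
print('pea' in d.keys())  # True

keys = d.keys()
print(isinstance(keys, list))  # False: it is a view object, not a list

# The view reflects later changes to the dictionary
d['rpe'] = 3
print('rpe' in keys)  # True

# Set-like behavior: intersect the keys with another set
print(keys & {'pea', 'zoi'})  # {'pea'}
```

So `x in d.keys()` is not the linear scan it was in Python 2, but it still adds nothing over `x in d` for a plain membership test.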