Label elements in a duplicate list

Question

Is there any good way to do this:

    input  = ['hi you', 'hello', 'hi you', 'hello', 'good bye']
    output = [1, 2, 1, 2, 3]

Many thanks!!

(I just edited the input list. My actual case is a list of strings like the one above, not a list of letters.)

Does `output = [ord(x) for x in input]` give what you want? What are 1, 2, and 3 supposed to mean? Alphabet index, index of first occurrence, or an arbitrary identifier? — Ali Tou, Apr 02 '21 at 09:20
What if the input isn't a list of letters but a list of strings? 1, 2, 3 are just the order in which each string first appears. — Le Vu Minh Huy, Apr 02 '21 at 09:27
In that case, I would go with the solution from the answer that was posted and then deleted here a few minutes ago. I hope its author comes back and rewrites it. — Ali Tou, Apr 02 '21 at 09:29
Also, I recommend stating your question clearly and avoiding generalizations, which lead to unsuitable answers like the ones we have here. — Ali Tou, Apr 02 '21 at 09:31
@AliTou That answer did not match the given input and output since he wishes to replace `c` with 3 despite it only appearing at the end of the list — mousetail, Apr 02 '21 at 09:34
Hi @mousetail, what about this list: ['hi you', 'hello', 'hi you', 'hello', 'good bye']? — Le Vu Minh Huy, Apr 02 '21 at 09:36
@LeVuMinhHuy Can you please edit your question to include the expected results for both lists? — mousetail, Apr 02 '21 at 09:37

score 1 · Accepted Answer · answered Apr 02 '21 at 09:19

You could solve it like this:

    output = [input.index(i) for i in input]

Every value in output is the index at which the corresponding value of input first occurs. If you want the indices to start at one, use:

    output = [input.index(i) + 1 for i in input]

(Though you probably want to avoid shadowing built-in functions such as input with variable names.)
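To make concrete what the index-based approach returns for the list in the question, here is a minimal sketch. The name `words` is my own choice, used instead of `input` so that the built-in is not shadowed.

```python
# `words` is a hypothetical name standing in for the question's `input` list.
words = ['hi you', 'hello', 'hi you', 'hello', 'good bye']

# list.index returns the position of the first occurrence of each value.
zero_based = [words.index(w) for w in words]
one_based = [words.index(w) + 1 for w in words]

print(zero_based)  # [0, 1, 0, 1, 4]
print(one_based)   # [1, 2, 1, 2, 5]
```

Note that the last value is 5 rather than 3: the results are positions in the list, not consecutive labels. Each `index` call also scans the list from the start, so the total work grows quadratically with the list length.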

score 1 · Answer 2 · answered Apr 02 '21 at 09:28

The ord() function gives the Unicode code point of a character. For example, ord('a') == 97.

In Unicode, as in most other character encodings, the lowercase letters a to z are stored consecutively and in order. Thus you can get the alphabet index of any lowercase letter by subtracting ord('a'): for example, ord('b') - ord('a') == 1 and ord('z') - ord('a') == 25. Add one to get a 1-based index.

Using this knowledge, we can build a comprehension that does what you want:

    output = [ord(i) - ord('a') + 1 for i in input]

This gives the desired results for the original example input of single lowercase letters. However, if the input contains capital letters or symbols, the results may be surprising. For example, ord('A') == 65, so a capital A would be replaced by -31. If you want to treat capital letters the same as lowercase ones, use:

    output = [ord(i.lower()) - ord('a') + 1 for i in input]
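As a rough illustration of where the ord() approach applies, the sketch below runs it on a list of single letters and then on the multi-character strings from the edited question. The variable names (`letters`, `words`, `letter_labels`, `ord_rejected`) are assumptions made for this example.

```python
# Single-character input: ord() works and gives alphabet positions.
letters = ['a', 'b', 'a', 'b', 'c']
letter_labels = [ord(ch.lower()) - ord('a') + 1 for ch in letters]
print(letter_labels)  # [1, 2, 1, 2, 3]

# Multi-character input: ord() only accepts a string of length 1.
words = ['hi you', 'hello', 'hi you', 'hello', 'good bye']
ord_rejected = False
try:
    [ord(w) for w in words]
except TypeError as exc:
    ord_rejected = True
    print("ord() rejects multi-character strings:", exc)
```

So this approach fits a list of letters, but not the list of strings in the edited question.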

score 1 · Answer 3 · answered Apr 02 '21 at 09:40

The most time-efficient way is to build a mapping from each value to a label assigned in order of first appearance:

    >>> data = ['a', 'b', 'a', 'b', 'c']
    >>> index = {}
    >>> for x in data:
    ...     if x not in index:
    ...         index[x] = len(index) + 1
    ...
    >>> index
    {'a': 1, 'b': 2, 'c': 3}

Then simply map the original data:

    >>> [index[x] for x in data]
    [1, 2, 1, 2, 3]
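The two steps above can be folded into one reusable function. This is only a sketch: the function name `label_by_first_appearance` is hypothetical, and it assumes the items are hashable so they can be used as dictionary keys.

```python
def label_by_first_appearance(items):
    """Return 1-based labels assigned in order of first appearance.

    Assumes every item is hashable.
    """
    labels = {}
    result = []
    for item in items:
        # setdefault stores len(labels) + 1 only when the item is new,
        # and returns the stored label either way.
        result.append(labels.setdefault(item, len(labels) + 1))
    return result


words = ['hi you', 'hello', 'hi you', 'hello', 'good bye']
print(label_by_first_appearance(words))  # [1, 2, 1, 2, 3]
```

Because each dictionary lookup takes constant time on average, the whole list is labeled in time proportional to its length.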

Seyi Daniel · Answer 4 · answered Apr 02 '21 at 10:16

You can get it done this way:

    input1 = ['hi you', 'hello', 'hi you', 'hello', 'good bye']
    idx_dict, result, counter = {}, [], 1  # idx_dict maps each unique value to its label
    for i in input1:
        if i not in idx_dict:  # first time this value is seen: assign the next label
            idx_dict[i] = counter
            counter += 1
        result.append(idx_dict[i])  # look up the value's label and append it to the result

This solves the problem in a single pass, i.e. n iterations for a list of n elements.
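A more compact variant of the same idea, offered as an alternative rather than as this answer's method, uses `dict.fromkeys` to collect the distinct values. It relies on dictionaries preserving insertion order, which is guaranteed from Python 3.7 onward. The names `words`, `labels`, and `result` are my own.

```python
words = ['hi you', 'hello', 'hi you', 'hello', 'good bye']

# dict.fromkeys keeps one key per distinct value, in first-appearance order
# (assumes Python 3.7+, where dict insertion order is guaranteed).
labels = {value: n for n, value in enumerate(dict.fromkeys(words), start=1)}
result = [labels[w] for w in words]

print(labels)  # {'hi you': 1, 'hello': 2, 'good bye': 3}
print(result)  # [1, 2, 1, 2, 3]
```

This walks the list twice instead of once, but the total work still grows linearly with the number of elements.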
