2

Is there any good way to do this:

    input  = ['hi you', 'hello', 'hi you', 'hello', 'good bye']
    output = [1, 2, 1, 2, 3] 

Many thanks!!

(I just edited the input list. My actual case is the new list of strings, not a list of letters of the alphabet.)

  • 2
    `output = [ord(x) for x in input]` Does this give what you want? What are 1, 2, and 3 supposed to mean? An alphabet index, the index of first occurrence, or an arbitrary identifier? – Ali Tou Apr 02 '21 at 09:20
  • What if this isn't a list of letters but a list of strings, and 1, 2, 3 are just the order of appearance of each string? – Le Vu Minh Huy Apr 02 '21 at 09:27
  • In this case, I would go with the solution from the deleted answer posted here minutes ago. I hope the writer comes back and rewrites the answer. – Ali Tou Apr 02 '21 at 09:29
  • Also, I recommend explaining your question clearly and avoiding generalizations, which lead to unsuitable answers like the ones we have here. – Ali Tou Apr 02 '21 at 09:31
  • @AliTou That answer did not match the given input and output since he wishes to replace `c` with 3 despite it only appearing at the end of the list – mousetail Apr 02 '21 at 09:34
  • Hi @mousetail, what about this list: ['hi you', 'hello', 'hi you', 'hello', 'good bye']? – Le Vu Minh Huy Apr 02 '21 at 09:36
  • 1
    @LeVuMinhHuy Can you please edit your answer with the expected results from both strings? – mousetail Apr 02 '21 at 09:37

4 Answers

1

You could solve it like this:

    output = [input.index(i) for i in input]

Every value in output will be the index of the first occurrence of the corresponding element of input. If you want one-based indices, use:

    output = [input.index(i) + 1 for i in input]

(Though you probably want to avoid shadowing built-in names like input with your own variables.)
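
As a rough sketch of how this behaves on the edited input from the question (the name `words` is an illustrative choice, used so that the built-in `input` is not shadowed), note that the values are first-occurrence positions rather than consecutive labels:

```python
# `words` is an illustrative name chosen to avoid shadowing the built-in input().
words = ['hi you', 'hello', 'hi you', 'hello', 'good bye']

# list.index returns the position of the first element equal to its argument.
first_positions = [words.index(w) for w in words]
one_based = [words.index(w) + 1 for w in words]

print(first_positions)  # [0, 1, 0, 1, 4]
print(one_based)        # [1, 2, 1, 2, 5]
```

So 'good bye' is labelled 5 here, not 3. Each `index` call also scans the list from the start, so the comprehension takes quadratic time in the worst case.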

mousetail
1

The ord() function gives the Unicode code point of a character. For example, ord('a') == 97.

In Unicode, as well as in most other character encodings, the basic letters are stored in alphabetical order. Thus, you can get the index of any other letter by simply subtracting ord('a'), for example: ord('b') - ord('a') == 1 and ord('z') - ord('a') == 25. Of course, you can add one to get a 1-based index.

Using this knowledge, we can build a comprehension that does what you want:

    output = [ord(i) - ord('a') + 1 for i in input]

This will give the desired results for the original single-letter example input. However, if your list contains any capital letters or symbols, the results might be strange. For example, ord('A') == 65, so a capital A will be replaced by -31. If you want to treat capital letters the same as lowercase ones, use:

    output = [ord(i.lower()) - ord('a') + 1 for i in input]
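
As a sketch, the same idea can be wrapped in a small helper (the name `letter_positions` is made up for illustration). Keep in mind that `ord()` accepts only a single character, so this applies to the original single-letter input and raises `TypeError` for strings such as 'hi you':

```python
def letter_positions(letters):
    """Map single ASCII letters to 1-based alphabet positions (a/A -> 1)."""
    return [ord(ch.lower()) - ord('a') + 1 for ch in letters]

print(letter_positions(['a', 'b', 'a', 'b', 'c']))  # [1, 2, 1, 2, 3]
print(letter_positions(['A', 'z']))                 # [1, 26]

try:
    letter_positions(['hi you'])
except TypeError as exc:
    # ord() only accepts a string of length 1.
    print('TypeError:', exc)
```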
mousetail
1

The most time-efficient way would be to build a mapping from each value to the order in which it was first encountered:

    >>> data = ['a', 'b', 'a', 'b', 'c']
    >>> index = {}
    >>> for x in data:
    ...     if x not in index:
    ...         index[x] = len(index) + 1
    ...
    >>> index
    {'a': 1, 'b': 2, 'c': 3}

Then simply map the original data:

    >>> [index[x] for x in data]
    [1, 2, 1, 2, 3]
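
The two steps above could be folded into one helper, sketched here with the edited input from the question. The function name `first_seen_labels` is hypothetical, and using `dict.setdefault` is just one possible way to write it; the items are assumed to be hashable:

```python
def first_seen_labels(items):
    """Label each item with the 1-based order in which its value first appeared."""
    labels = {}
    result = []
    for item in items:
        # setdefault stores the new label only when the item is not yet a key;
        # len(labels) + 1 is evaluated before any insertion happens.
        result.append(labels.setdefault(item, len(labels) + 1))
    return result

print(first_seen_labels(['hi you', 'hello', 'hi you', 'hello', 'good bye']))
# [1, 2, 1, 2, 3]
```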
juanpa.arrivillaga
0

You can get it done this way:

    input1 = ['hi you', 'hello', 'hi you', 'hello', 'good bye']  # example input from the question
    idx_dict, result, counter = {}, [], 1  # idx_dict maps each unique value to its order of first appearance
    for i in input1:
        if i not in idx_dict:  # first time this value is seen: assign it the next label
            idx_dict[i] = counter
            counter += 1
        result.append(idx_dict[i])  # look up the value's label and append it to the result list

This solves the problem in a single pass over the n elements, since each dictionary lookup takes constant time on average.
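
The same single-pass idea can also be sketched with a different technique from the standard library: a `defaultdict` whose factory hands out the next number from `itertools.count`. This is a variant rather than the code above, and the variable names are illustrative:

```python
from collections import defaultdict
from itertools import count

words = ['hi you', 'hello', 'hi you', 'hello', 'good bye']

counter = count(1)  # yields 1, 2, 3, ...
# The factory runs only when a key is missing, so each new value
# receives the next unused number.
labels = defaultdict(lambda: next(counter))

result = [labels[w] for w in words]
print(result)        # [1, 2, 1, 2, 3]
print(dict(labels))  # {'hi you': 1, 'hello': 2, 'good bye': 3}
```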

Seyi Daniel