14

I have a list, let's say L = ['apple','bat','apple','car','pet','bat'].

I want to convert it into Lnew = [1, 2, 1, 3, 4, 2].

Every unique string is associated with a number.

I have a Java solution using a HashMap, but I don't know how to use a hash map in Python. Please help.

ᴀʀᴍᴀɴ
BuggerNot
  • What have you tried? – Harsha Biyani Apr 04 '17 at 09:27
  • Dict in python works like hashmap – RaminNietzsche Apr 04 '17 at 09:31
  • @RaminNietzsche, I can't speak for Java's hashmap, but Python's dicts don't give the integer indexes the questioner wants, especially alphabetically sorted (which was not specifically requested, but was evident in their desired output). – prooffreader Apr 04 '17 at 09:34
  • How do you work out the number to associate with a string? – Douglas Leeder Apr 04 '17 at 09:35
  • @RaminNietzsche, still, you've got the right idea, you can use a dict to create a mapping this way: ``d = {k: v for v, k in enumerate(sorted(set(L)))}`` and then ``Lnew = [d[x] for x in L]``. – prooffreader Apr 04 '17 at 09:40
  • Just a meta comment, it's fun watching the comments and answers pile up from users of various skill levels (of which I'm definitely not at the top of the stack) on these sorts of questions second by second, knowing it's got to be a duplicate question. StackOverflow is now so rich in such questions, it's the more complex questions that are its main concern now, so it's like everyone (myself included) is excited to find one they can answer competently. – prooffreader Apr 04 '17 at 10:05

6 Answers

20

Here's a quick solution:

l = ['apple','bat','apple','car','pet','bat']

Create a dict that maps all unique strings to integers:

d = dict([(y,x+1) for x,y in enumerate(sorted(set(l)))])

Map each string in the original list to its respective integer:

print([d[x] for x in l])
# [1, 2, 1, 3, 4, 2]
acidtobi
  • I would just add ``enumerate(set(sorted(l)))`` since questioner didn't specify an alphabetical sort, but their desired output has it. – prooffreader Apr 04 '17 at 09:37
  • Also, you could use a dict comprehension: ``d = {k: v for v, k in enumerate(sorted(set(l)))}`` – prooffreader Apr 04 '17 at 09:39
  • Whether this works depends on whether the OP wants just "a number" as described or in fact the first index+1 as shown in their output; also use a `dict` comprehension – Chris_Rands Apr 04 '17 at 09:44
  • `[3, 2, 3, 1, 0, 2]` is not the result OP wanted, am I missing something here? – timgeb Apr 04 '17 at 09:46
  • The answerer didn't sort the list or 1-index the mapping. The following will use the same approach and give the same output: ``d = {k: v+1 for v, k in enumerate(sorted(set(L)))}``, then ``Lnew = [d[x] for x in L]``. – prooffreader Apr 04 '17 at 09:48
  • No, sorting is not required. – BuggerNot Apr 04 '17 at 09:59
  • I think you should remove sorting from your answer. You've basically added a GPS to a skateboard. It's much more expensive with little added benefit to the user. The cost of running this is O(N*log(N)), but originally was O(N). – ldmtwo Nov 20 '19 at 17:46
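For reference, the sorted-set approach from this answer could be written for Python 3 roughly as follows. This is only a sketch: the function name `encode_sorted` is hypothetical and does not appear in the answer. As the last comment points out, sorting costs O(N log N) and is only needed if the numbers should follow alphabetical order.

```python
def encode_sorted(words):
    """Map each unique string to its 1-based rank in sorted order."""
    # enumerate(..., start=1) numbers the sorted unique strings from 1
    mapping = {word: i for i, word in enumerate(sorted(set(words)), start=1)}
    return [mapping[word] for word in words]


L = ['apple', 'bat', 'apple', 'car', 'pet', 'bat']
print(encode_sorted(L))  # [1, 2, 1, 3, 4, 2]
```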
3

You can use a map dictionary:

d = {'apple':1, 'bat':2, 'car':3, 'pet':4}
L = ['apple','bat','apple','car','pet','bat']
[d[x] for x in L] # [1, 2, 1, 3, 4, 2]

To build the mapping dictionary automatically, you can use defaultdict(int) with a counter:

from collections import defaultdict
d = defaultdict(int)
co = 1
for x in L:
    if not d[x]:
        d[x] = co
        co+=1
d # defaultdict(<class 'int'>, {'pet': 4, 'bat': 2, 'apple': 1, 'car': 3})

Or, as @Stuart mentioned, you can create the dictionary with d = dict(zip(set(L), range(len(L)))). Note that this numbers the strings from 0, in whatever order the set happens to iterate.

ᴀʀᴍᴀɴ
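The "defaultdict with a counter" idea in the answer above can also be expressed without a manual counter variable. The sketch below is one possible variant, not the answerer's code: it passes `itertools.count(1).__next__` as the default factory, so each new key receives the next integer. The name `ids` is chosen here for illustration.

```python
from collections import defaultdict
from itertools import count

L = ['apple', 'bat', 'apple', 'car', 'pet', 'bat']

# Each missing key calls the factory once, which yields the next integer: 1, 2, 3, ...
ids = defaultdict(count(1).__next__)
Lnew = [ids[word] for word in L]

print(Lnew)       # [1, 2, 1, 3, 4, 2]
print(dict(ids))  # {'apple': 1, 'bat': 2, 'car': 3, 'pet': 4}
```

Because the factory is only called for keys not yet present, strings are numbered in order of first appearance and no sorting is involved.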
3
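x = list(set(L))  # unique strings; set iteration order is arbitrary
dic = dict(zip(x, range(1, len(x) + 1)))  # number the unique strings from 1

>>> [dic[v] for v in L]  # the exact numbers depend on the set's order
[1, 2, 1, 3, 4, 2]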
Colonel Beauvel
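Since a set does not keep the order in which strings were added, the numbers produced by the answer above can differ from one run to the next. If a reproducible numbering is wanted, one option (an assumption about what the asker needs, not something the answer states) is to deduplicate with `dict.fromkeys`, which keeps first-appearance order on Python 3.7 and later. The function name `encode_first_seen` is hypothetical.

```python
def encode_first_seen(words):
    """Number unique strings from 1 in the order they first appear."""
    # dict.fromkeys keeps one key per unique string, in first-appearance order
    # (dict insertion order is guaranteed from Python 3.7 on).
    unique = dict.fromkeys(words)
    mapping = {word: i for i, word in enumerate(unique, start=1)}
    return [mapping[word] for word in words]


print(encode_first_seen(['pet', 'apple', 'pet', 'bat']))  # [1, 2, 1, 3]
```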
2

You'd use a hashmap in Python, too, but we call it a dict.

>>> L = ['apple','bat','apple','car','pet','bat']
>>> idx = 1
>>> seen_first = {}
>>>
>>> for word in L:
...     if word not in seen_first:
...         seen_first[word] = idx
...         idx += 1
... 
>>> [seen_first[word] for word in L]
[1, 2, 1, 3, 4, 2]
timgeb
  • +1 for the most obvious and sensible answer; but how about `{x:len(L)-i for i,x in enumerate(L[::-1])}` to build the dict – Chris_Rands Apr 04 '17 at 09:40
  • @Chris_Rands I just realized OP does not want to go by index + 1, but give the first unique word the number 1, the second unique word the number 2, and so on. (I edited my answer accordingly.) – timgeb Apr 04 '17 at 09:49
  • I now think what they actually want (based on the top answer) is this http://stackoverflow.com/questions/42350029/assign-a-number-to-each-unique-value-in-a-list but frankly the question is not clear and should be closed IMO – Chris_Rands Apr 04 '17 at 09:51
  • @Chris_Rands yeah I'm confused now. – timgeb Apr 04 '17 at 09:51
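A more compact form of the loop in the answer above is possible with `dict.setdefault`. This is a sketch rather than the answerer's code, and the inverse dictionary `number_to_word` is added here only to illustrate that the mapping can be reversed.

```python
L = ['apple', 'bat', 'apple', 'car', 'pet', 'bat']

seen_first = {}
# setdefault returns the stored number if word is already a key; otherwise it
# stores len(seen_first) + 1 (computed before the insert) and returns that.
Lnew = [seen_first.setdefault(word, len(seen_first) + 1) for word in L]
print(Lnew)  # [1, 2, 1, 3, 4, 2]

# Invert the mapping to turn the numbers back into strings.
number_to_word = {number: word for word, number in seen_first.items()}
print([number_to_word[n] for n in Lnew] == L)  # True
```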
0

You can try:

>>> L = ['apple','bat','apple','car','pet','bat']
>>> l_dict = dict(zip(set(L), range(len(L))))
>>> print(l_dict)
{'pet': 0, 'car': 1, 'bat': 2, 'apple': 3}
>>> [l_dict[x] for x in L]
[3, 2, 3, 1, 0, 2]
Harsha Biyani
-2
Lnew = []
for s in L:
    Lnew.append(hash(s))  # hash(s) returns an int derived from the string; collisions are possible
Ash Ketchum
  • From the question, I think they're looking for 1-based integers, not the very long integers ``hash()`` gives. – prooffreader Apr 04 '17 at 09:45
  • Consider providing an explanation of your code. – arghtype Apr 04 '17 at 18:43
  • `hash` does *not* return a unique `int` for each string. Hash collisions [are possible](https://stackoverflow.com/q/37127946/1959808). – 0 _ Jan 18 '18 at 07:26
  • The general approach here is fine if you explain that this is a lossy encoding (the mapping is not guaranteed to be 1:1 and may not be fully reversible). The bigger issue is that the built-in hash function is not consistent between any two runs. hashlib with blake2s and reducing to int would be better. – ldmtwo Nov 20 '19 at 17:52
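The last comment's suggestion could look roughly like the sketch below, which uses `hashlib.blake2s` and converts the digest to an int. The function name `stable_hash` and the 8-byte digest size are assumptions made for illustration. As the comments note, this is still a lossy encoding: collisions are unlikely but possible, and the numbers are large rather than 1, 2, 3.

```python
import hashlib


def stable_hash(text, digest_size=8):
    """Return an int derived from text that is the same on every run."""
    # blake2s does not depend on PYTHONHASHSEED, unlike the built-in hash() for str
    digest = hashlib.blake2s(text.encode('utf-8'), digest_size=digest_size).digest()
    return int.from_bytes(digest, 'big')


L = ['apple', 'bat', 'apple', 'car', 'pet', 'bat']
Lnew = [stable_hash(s) for s in L]
print(Lnew[0] == Lnew[2])  # True: equal strings always get the same number
```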