Fastest way to convert a dict's keys & values from str to Unicode?

Question

I'm working with a `Counter` (`from collections import Counter`) and I want to plot its values using `matplotlib.pyplot`.

When I try to do it using:

plt.bar(range(len(cnt)), cnt.values(), align='center')
plt.xticks(range(len(cnt)), cnt.keys())
plt.show()

I get the following error:

ValueError: matplotlib display text must have all code points < 128 or use Unicode strings

That's why I'm trying to convert the Counter dictionary keys to Unicode.

Doesn't sound like this is the bottleneck of your application. I would go for the cleanest / most obvious rather than the fastest. — tripleee, May 23 '13 at 04:59

Cairnarvon · Accepted Answer · 2014-10-02T16:36:06.093

13

If you're using Python 2.7, you can use a dict comprehension:

unidict = {k.decode('utf8'): v.decode('utf8') for k, v in strdict.items()}

For older versions:

unidict = dict((k.decode('utf8'), v.decode('utf8')) for k, v in strdict.items())

(This assumes your strings are in UTF-8, of course.)
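Note that the `Counter` in the question maps keys to integer counts, and integers have no `decode` method, so in that case only the keys need decoding. A minimal sketch, assuming UTF-8 encoded byte-string keys; the sample data and the names `cnt`, `unicnt`, and `labels` are made up for illustration:

```python
from collections import Counter

# Hypothetical data: UTF-8 byte-string keys (Python 2's `str`) with integer counts.
cnt = Counter([b'caf\xc3\xa9', b'th\xc3\xa9', b'caf\xc3\xa9'])

# Decode only the keys; the counts are ints and are copied unchanged.
unicnt = {k.decode('utf8'): v for k, v in cnt.items()}

# Unicode tick labels, in the same order as cnt.values().
labels = [k.decode('utf8') for k in cnt.keys()]
```

The `labels` list could then be passed to `plt.xticks` in place of `cnt.keys()`.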

edited Oct 02 '14 at 16:36

answered May 23 '13 at 03:45

Cairnarvon


Could you tell me more about what you did? The `for` inside the dict confuses me a little. – AAlvz May 23 '13 at 03:57

The first uses a [dict comprehension](http://www.python.org/dev/peps/pep-0274/), the second a [generator expression](http://www.python.org/dev/peps/pep-0289/) to create an iterator of tuples from which the [`dict` constructor](http://docs.python.org/2/library/stdtypes.html#dict) can build a new dictionary. They're analogous to [list comprehensions](http://docs.python.org/2/tutorial/datastructures.html#list-comprehensions). – Cairnarvon May 23 '13 at 04:08
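As a rough illustration of that comment, both forms can be read as shorthand for an explicit loop. The sketch below builds the same dictionary three ways; the sample `strdict` contents are hypothetical and assumed to be UTF-8 encoded:

```python
# Hypothetical input: byte-string keys and values encoded as UTF-8.
strdict = {b'key': b'value', b'na\xc3\xafve': b'caf\xc3\xa9'}

# 1. Explicit loop: decode each pair and store it in a new dict.
unidict_loop = {}
for k, v in strdict.items():
    unidict_loop[k.decode('utf8')] = v.decode('utf8')

# 2. Dict comprehension (Python 2.7+): the same loop written inline.
unidict_comp = {k.decode('utf8'): v.decode('utf8') for k, v in strdict.items()}

# 3. Generator expression yielding (key, value) tuples for the dict constructor.
unidict_gen = dict((k.decode('utf8'), v.decode('utf8')) for k, v in strdict.items())
```

All three produce equal dictionaries; the comprehension is usually the most readable of them.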

score 1 · Answer 2 · answered Oct 11 '18 at 17:26


I thought the OP asked for Unicode, not for UTF-8. Unicode is not an encoding; it's just the actual text. So wouldn't this be more accurate and/or readable?

unidict = {unicode(k): unicode(v) for k, v in strdict.items()}

answered Oct 11 '18 at 17:26

zgypa


If your default string encoding is UTF-8 this is exactly equivalent to the other answer. The `unicode` constructor behaves identically to the `decode` method on `str` objects. I just think it's more helpful to be explicit about the fact that `str` objects are *encoded* in a specific encoding (explicitly assumed to be UTF-8 in my answer), and converting them to `unicode` objects requires *decoding* them, which is something that trips up a lot of users conceptually. – Cairnarvon Nov 28 '19 at 19:27
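To illustrate the caveat about the default encoding: on a stock Python 2 install the default is usually ASCII rather than UTF-8, in which case `unicode(k)` would fail on non-ASCII bytes while an explicit `decode('utf8')` succeeds. A small sketch that mimics this with explicitly named codecs (the sample bytes are hypothetical):

```python
# Hypothetical UTF-8 encoded byte string containing a non-ASCII character.
raw = b'caf\xc3\xa9'

# Decoding as ASCII stands in for unicode(raw) under an ASCII default encoding.
try:
    raw.decode('ascii')
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Naming the encoding explicitly works regardless of the default.
text = raw.decode('utf8')
```

Here `ascii_ok` ends up `False`, which is why spelling out the encoding is the safer choice when the keys may contain non-ASCII text.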
