
I have a dictionary and I want to convert every value to utf-8. This works, but is there a "more pythonic" way?

    for key in row.keys():
        row[key] = unicode(row[key]).encode("utf-8")

For a list I could do

[unicode(s).encode("utf-8") for s in row]

but I'm not sure how to do the equivalent thing for dictionaries.

This is different from Python Dictionary Comprehension because I'm not trying to create a dictionary from scratch but from an existing one. The solutions to the linked question don't show how to loop through the key/value pairs of an existing dictionary to turn them into new key/value pairs for a new dictionary. The accepted answer below shows how to do that, and for someone with a task similar to mine it is much clearer than the answers to the linked question, which are more complex.

PurpleVermont
  • I'm impressed you managed to come up with the right phrase (dictionary comprehension) but not to search for "python dictionary comprehension"! – DSM Nov 13 '15 at 18:21
  • why are you converting your keys to utf-8? this sounds like an XY problem – Joran Beasley Nov 13 '15 at 18:23
  • Possible duplicate of [Python Dictionary Comprehension](http://stackoverflow.com/questions/14507591/python-dictionary-comprehension) – GingerPlusPlus Nov 13 '15 at 18:26
  • Danger Will Robinson! `unicode(row[key])` will use the system charset to decode `row[key]`. This implies that `row[key]` is already encoded to a specific character set. Further, you shouldn't encode until you need to output it somewhere, in which case, allow print to convert or write using an encoding text wrapper, like `io.open()` – Alastair McCormack Nov 13 '15 at 18:28
  • @DSM I found list comprehension, and extrapolated to dictionary comprehension, but didn't find a good example when I googled for it. – PurpleVermont Nov 13 '15 at 22:31
  • @JoranBeasley I am converting not my keys but my values to utf-8 because when I try to write them out with a CSV DictWriter it breaks on unicode strings that are outside of the ascii range. Using Python 2.7 – PurpleVermont Nov 13 '15 at 22:32
  • @PurpleVermont I would recommend using https://github.com/jdunck/python-unicodecsv instead of trying to write your own encoder/decoder – Joran Beasley Nov 13 '15 at 22:59
  • @JoranBeasley why is that better? Having to install extra packages makes it a hassle for sharing the code. – PurpleVermont Nov 14 '15 at 06:55

6 Answers


Use a dictionary comprehension. It looks like you're starting with a dictionary, so:

 mydict = {k: unicode(v).encode("utf-8") for k,v in mydict.iteritems()}

The example for dictionary comprehensions is near the end of the block in the link.

That1Guy

A Python 3 version, building on That1Guy's answer:

{k: str(v).encode("utf-8") for k,v in mydict.items()}
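In Python 3, `str.encode` returns a `bytes` object, so every value in the new dict becomes `bytes` rather than text. A minimal sketch illustrating this, using a made-up sample dict (the names `mydict`, `encoded` and `decoded` are just for illustration):

```python
# Hypothetical sample row; in Python 3 every str value is already Unicode text.
mydict = {"name": "café", "count": 3}

# Same comprehension as above: str() each value, then encode it to UTF-8 bytes.
encoded = {k: str(v).encode("utf-8") for k, v in mydict.items()}
print(encoded)  # {'name': b'caf\xc3\xa9', 'count': b'3'}

# Decoding reverses the conversion (note the int comes back as the string "3").
decoded = {k: v.decode("utf-8") for k, v in encoded.items()}
print(decoded)  # {'name': 'café', 'count': '3'}
```

Since Python 3's file and `csv` APIs work with `str` directly, this encoding step is often unnecessary there; you would typically only encode when you genuinely need `bytes`.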
kjmerf

Since I had this problem as well, I wrote a very simple function that encodes the strings in any dict to UTF-8, including dicts and lists nested inside it (the problem with the current answer is that it only works for flat dicts).

In case it helps anyone, here is the function:

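def utfy_dict(dic):
    # Python 2: recursively encode every unicode value to a UTF-8 byte string.
    if isinstance(dic, unicode):
        return dic.encode("utf-8")
    elif isinstance(dic, dict):
        # Modifies the dict in place (values only; keys are left as-is) and returns it.
        for key in dic:
            dic[key] = utfy_dict(dic[key])
        return dic
    elif isinstance(dic, list):
        return [utfy_dict(e) for e in dic]
    else:
        # Anything else (int, None, byte str, ...) is returned unchanged.
        return dic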
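The function above depends on Python 2's `unicode` type. A rough Python 3 port might look like the following; the name `utfy` and the sample `data` are hypothetical, and unlike the original this sketch builds new containers instead of modifying the dict in place, so the input is left untouched.

```python
def utfy(obj):
    """Recursively encode every str inside nested dicts/lists to UTF-8 bytes.

    Python 3 sketch: returns new containers rather than mutating the input.
    Dict keys are left unchanged; only values are converted.
    """
    if isinstance(obj, str):
        return obj.encode("utf-8")
    if isinstance(obj, dict):
        return {key: utfy(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [utfy(item) for item in obj]
    # Anything else (int, None, bytes, bool, ...) is passed through unchanged.
    return obj


data = {"city": "München", "tags": ["naïve", 1], "meta": {"ok": True}}
result = utfy(data)
print(result)
```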
Henri Chabert

It depends on why you're encoding to UTF-8. If it's because you're writing to a file, the pythonic way is to leave your strings as Unicode and encode on output:

import io

# Keep the text as Unicode; the io.open() wrapper encodes it to UTF-8 on write.
with io.open("myfile.txt", "w", encoding="UTF-8") as my_file:
    for key, value in row.items():
        my_string = u"{key}: {value}\n".format(key=key, value=value)
        my_file.write(my_string)
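In Python 3 the built-in `open()` is the same function as `io.open()`, so the encode-on-output pattern can be sketched as below. The sample `row` and the temporary file path are assumptions for illustration; the strings stay as text throughout and the file object does the UTF-8 encoding.

```python
import os
import tempfile

# Hypothetical row with non-ASCII text; values stay as str (Unicode) in memory.
row = {"name": "Zoë", "city": "Kraków"}

path = os.path.join(tempfile.mkdtemp(), "myfile.txt")

# The text wrapper encodes to UTF-8 as it writes; no manual .encode() is needed.
with open(path, "w", encoding="utf-8") as my_file:
    for key, value in row.items():
        my_file.write("{key}: {value}\n".format(key=key, value=value))

# Reading the raw bytes back shows the file really is UTF-8 encoded.
with open(path, "rb") as f:
    raw = f.read()
print(raw.decode("utf-8"))
```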
Alastair McCormack
  • I am, but I'm writing it with a csv DictWriter, and I'm not sure how to do the encoding on output in that case. – PurpleVermont Nov 13 '15 at 22:17
  • Python 2.x's CSV module is broken with Unicode, so the other contributors' answers are the easiest approach. If you want to do it properly, like Python3 does it, use a fixed CSV module: https://github.com/jdunck/python-unicodecsv – Alastair McCormack Nov 13 '15 at 22:27
  • I can't use Python 3 for other reasons, but thanks for showing the "right way" to do it if possible. – PurpleVermont Nov 13 '15 at 22:30
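For reference on the `csv.DictWriter` case discussed in these comments: in Python 3 the standard `csv` module handles Unicode natively, so you open the file with an explicit encoding and write `str` values directly. A minimal sketch, assuming made-up field names and a temporary output file (this does not apply to Python 2.7, where the comments above point to `unicodecsv`):

```python
import csv
import os
import tempfile

# Hypothetical rows containing non-ASCII characters.
rows = [
    {"name": "José", "city": "São Paulo"},
    {"name": "Zoë", "city": "Kraków"},
]

path = os.path.join(tempfile.mkdtemp(), "out.csv")

# newline="" is what the csv docs recommend for files handed to csv writers;
# encoding="utf-8" makes the file object encode each str as it is written.
with open(path, "w", encoding="utf-8", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "city"])
    writer.writeheader()
    writer.writerows(rows)

# Read the file back with csv.DictReader to confirm the round trip.
with open(path, encoding="utf-8", newline="") as f:
    read_back = list(csv.DictReader(f))

print(read_back)
```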

You could also just iterate through the keys:

{x:unicode(a[x]).encode("utf-8") for x in a.keys()}
ergonaut

To convert non-ASCII dictionary values to ASCII, silently dropping any characters that can't be represented:

mydict = {k: unicode(v, errors='ignore').encode('ascii','ignore') for k,v in mydict.iteritems()} 

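To convert dictionary values to UTF-8, again dropping anything that can't be decoded (in Python 2, `unicode(v, errors='ignore')` decodes with the default ASCII codec, so non-ASCII bytes are dropped here too):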

mydict = {k: unicode(v, errors='ignore').encode('utf-8','ignore') for k,v in mydict.iteritems()}

For more background, read the Python Unicode documentation.
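Both snippets above are lossy: with `errors='ignore'`, characters that can't be handled are silently discarded rather than converted. A Python 3 sketch of the same idea makes that data loss visible; it assumes the values are already `str`, so only the encode step applies, and the sample dict is hypothetical.

```python
# Hypothetical input with non-ASCII characters.
mydict = {"name": "Zoë", "city": "Kraków"}

# Encoding to ASCII with errors="ignore" silently drops every non-ASCII character.
ascii_dict = {k: v.encode("ascii", "ignore") for k, v in mydict.items()}
print(ascii_dict)  # {'name': b'Zo', 'city': b'Krakw'}

# UTF-8 can encode ordinary text completely, so no "ignore" is needed and nothing is lost.
utf8_dict = {k: v.encode("utf-8") for k, v in mydict.items()}
print(utf8_dict["name"])  # b'Zo\xc3\xab'
```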

Anurag Misra