1

I am working on a school project using Google App Engine and Python 2.7. I am trying to output a nested dictionary like so: {city:[{song1:artist1},{song2:artist2}], city2:[{song1:artist1},{song2:artist2}]}. However, the city names and the songs are from around the world, with special foreign characters. When I print out the dictionary, I get this string:

{'uOsaka'[{'u\u3086\u3081\u3044\u3089\u3093\u304b\u306d': u'Takajin Yashiki}, etc... (where Osaka is the city, the unicode is the song, and Takajin is the artist)

Does anyone know how to get the name of the cities/songs to appear correctly?

mgoya
  • Printing it out should work fine. Also, what terminal are you on? – iz_ Nov 28 '18 at 23:36
  • I am using this dictionary to pass to a Jinja template where it will be outputted in a html file. Printing individual values would work fine yes, but I am concerned with how they appear in the dictionary because that's ultimately what I will be passing. And I am using powershell on Windows. – Katherine Waller Nov 28 '18 at 23:41
  • Are you *sure* that's the output you get? You seem to be missing a `:`, and `'u\u3086\u3081\u3044\u3089\u3093\u304b\u306d'` is far more likely to be `u'\u3086\u3081\u3044\u3089\u3093\u304b\u306d'` – donkopotamus Nov 28 '18 at 23:42
  • I assumed that was just a typo, but worth confirming. – iz_ Nov 28 '18 at 23:43
  • Yes, I am sorry for the typos! I was missing a colon after Osaka, and I did mean u'. – Katherine Waller Nov 28 '18 at 23:45
  • Just tested Unicode characters with Jinja, works like expected. – iz_ Nov 28 '18 at 23:46
  • If I pass my dictionary into Jinja, with this code in the html template: `Hello {{citiesAndSongs}}`, then when running Google App Engine it opens localhost and outputs the dictionary. However, some city names aren't displayed properly. u'Bras\xedlia' and u'Bogot\xe1' are a couple of examples of special-character cities that don't output as they should. – Katherine Waller Nov 28 '18 at 23:56
  • Change to Python 3 and all these problems will quietly disappear.. – wim Nov 28 '18 at 23:58
  • I actually wrote it in python 3 originally, and it works perfectly. Google app engine doesn't work with python 3 however. – Katherine Waller Nov 28 '18 at 23:58
  • @KatherineWaller ugh, I feel for you. For all the complaining people did, Python 2 - > 3 was worth it for the unicode changes alone. Anyway, you should add your situation with Jinja /Google App engine, I'm going to go ahead and tag those as well. – juanpa.arrivillaga Nov 29 '18 at 00:06
  • Possible duplicate of [Writing and reading JSON with Python, how to decode/encode special characters?](https://stackoverflow.com/questions/19138046/writing-and-reading-json-with-python-how-to-decode-encode-special-characters) – ivan_pozdeev Nov 29 '18 at 00:27
  • Google App Engine does actually work with [Python 3](https://cloud.google.com/appengine/docs/standard/python3/), right now it's in Beta but you can use it. – Rubén C. Nov 29 '18 at 10:22

2 Answers

1

The underlying issue in Python 2.7 is that printing a dictionary converts it to a str rather than a unicode, and that conversion uses the repr of every key and value. The repr of a unicode object escapes non-ASCII characters as \xNN or \uNNNN. Hence your output.

However, when you render the individual items you will find they are fine:

>>> d = {u'Osaka': [{u'\u3086\u3081\u3044\u3089\u3093\u304b\u306d': u'Takajin Yashiki'}]} 
>>> for k, v in d.viewitems():
...   for pair in v:
...     for song, artist in pair.viewitems():
...         print k, song, artist
... 
Osaka ゆめいらんかね Takajin Yashiki

Note that this is Python 2 behavior. In Python 3, where str is text, the repr keeps printable non-ASCII characters, so this data should render naturally in the console, assuming the console encoding can represent the characters and you have the necessary fonts installed for Japanese glyphs:

(3.7) >>> print(d)
{'Osaka': [{'ゆめいらんかね': 'Takajin Yashiki'}]}
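If the whole structure has to be shown as one string under Python 2.7 (for example, because it is passed to a template in one piece), one possible workaround is to serialize it with the standard json module and `ensure_ascii=False`, which keeps non-ASCII characters instead of escaping them. This is only a sketch: the variable names are illustrative, and it assumes every key and value is a unicode string.

```python
import json

# Illustrative data in the shape from the question; the name is an assumption.
cities_and_songs = {
    u'Osaka': [{u'\u3086\u3081\u3044\u3089\u3093\u304b\u306d': u'Takajin Yashiki'}],
}

# ensure_ascii=False makes json.dumps emit the characters themselves
# rather than \uXXXX escapes. With unicode input the result is text
# (unicode on Python 2, str on Python 3).
readable = json.dumps(cities_and_songs, ensure_ascii=False)
```

Here `readable` holds the text {"Osaka": [{"ゆめいらんかね": "Takajin Yashiki"}]}, which can then be printed or handed to the template instead of the dictionary itself.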
wim
donkopotamus
  • Apparently, OP is encountering a problem with Jinja/ Google app engine, though. wouldn't surprise me if there was a call to `__repr__` somewhere... – juanpa.arrivillaga Nov 29 '18 at 00:06
0

As in "How to print national characters in list representation?", you need a custom procedure to print your data, one that prints the strings themselves instead of their repr:

def nrepr(data):
    # data maps city -> {song: artist}; all strings are unicode (Python 2)
    city_items = []
    for city, jukebox in data.iteritems():
        jukebox_items = []
        for song, artist in jukebox.iteritems():
            # format the strings themselves, not their repr
            jukebox_items.append(u'"%s":"%s"' % (song, artist))
        city_items.append(u'"%s":{%s}' % (city, u",".join(jukebox_items)))
    return u'{%s}' % u",".join(city_items)

>>> data={u'Osaka':{u'\u3086\u3081\u3044\u3089\u3093\u304b\u306d':u'Takajin Yashiki'}}

>>> print nrepr(data)
{"Osaka":{"ゆめいらんかね":"Takajin Yashiki"}}

(use from __future__ import unicode_literals at the start of the file to avoid putting u before every literal)

You are not constrained to mimicking Python's default output format; you can print the data any way you like.
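The same idea can be extended to the list-of-dictionaries shape from the question. The sketch below is a hypothetical recursive helper (`render` is not part of any library) that walks dicts and lists and leaves text unescaped. It assumes all strings are unicode and is written to run on both Python 2.7 and Python 3.

```python
try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str      # Python 3


def render(value):
    """Hypothetical helper: format nested dicts/lists without escaping text."""
    if isinstance(value, dict):
        pairs = [u'%s: %s' % (render(k), render(v)) for k, v in value.items()]
        return u'{%s}' % u', '.join(pairs)
    if isinstance(value, (list, tuple)):
        return u'[%s]' % u', '.join(render(item) for item in value)
    if isinstance(value, text_type):
        # Insert the characters themselves, not their repr.
        return u'"%s"' % value
    # Anything else (numbers, None, ...) falls back to its text form.
    return text_type(value)


data = {u'Bogot\xe1': [{u'song1': u'artist1'}, {u'song2': u'artist2'}]}
rendered = render(data)
```

With this input, `rendered` is the text {"Bogotá": [{"song1": "artist1"}, {"song2": "artist2"}]}. Quotes inside the strings are not escaped, so this is meant for display only, not as a serialization format.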


Alternatively, you can use a unicode subclass for your strings whose repr keeps the national characters:

class nu(unicode):
    def __repr__(self):
        return self.encode('utf-8')    #must return str

>>> data={nu(u'Osaka'):{nu(u'\u3086\u3081\u3044\u3089\u3093\u304b\u306d'):nu(u'Takajin Yashiki')}}
>>> data
{Osaka: {ゆめいらんかね: Takajin Yashiki}}

This is problematic because repr output is presumed to contain only ASCII characters, and various code relies on this. You are extremely likely to get UnicodeErrors in random places. It will also print mojibake if a specific output channel's encoding is different from UTF-8 or if further transcoding is involved.
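To illustrate the mojibake case: it appears when UTF-8 bytes are decoded with a different encoding. The sketch below uses Latin-1 as an assumed mismatched encoding (the actual encoding of a given console or output channel may differ) and one of the city names from the comments.

```python
city = u'Bogot\xe1'                      # "Bogota" with an accented final a

utf8_bytes = city.encode('utf-8')        # what a UTF-8 encoding repr would emit

# Decoding those bytes with the wrong encoding splits the two-byte
# sequence for the accented letter into two unrelated characters.
garbled = utf8_bytes.decode('latin-1')

# Decoding with the matching encoding recovers the original text.
restored = utf8_bytes.decode('utf-8')
```

Here `garbled` is the seven-character string BogotÃ¡, while `restored` equals the original six-character name.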

ivan_pozdeev