I am fetching data from an API as JSON in Flask and rendering it through a Jinja template as RSS, but characters from the JSON such as apostrophes and em dashes show up garbled in the output (â€™ and â€” instead of ’ and —).

How do I ensure the template renders the proper characters?

This is my code in Flask:

json_resp = resp.json()
posts = json_resp['list']
template = render_template('recs.rss',posts=posts)
response = make_response(template)
response.headers.set('Content-Type', 'application/rss+xml')
return response

and this is the recs.rss template:

<?xml version="1.0"?>
<rss version="2.0">
   <channel>
      ...
      {% for post in posts %}
      <item>
        {% if post.title %}<title><![CDATA[{{ post.title }}]]></title>{% endif %}
        ...
      </item>
      {% endfor %}
   </channel>
</rss>

If I encode it as utf-8:

post_item['title'] = post_item['title'].encode('utf-8')

I get utf-8 encoding in the RSS feed, and it seems to be prefaced with a b instead of a u:

b'The Dirty Secret of \xe2\x80\x98Secret Family Recipes\xe2\x80\x99'

If I attempt to then decode using utf-8 as suggested in the comment below and this post:

Python Selenium ().text returns "â€™" instead of apostrophe (')

I get an error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x91 in position 20: invalid start byte
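
One possible reading of that error, as a minimal sketch using only the standard library: the title held by Python may already be correct Unicode, and the garbling may happen only when a client reads the UTF-8 bytes as Windows-1252. The sketch assumes the decode attempt first re-encoded the title as cp1252, which the question does not show. The sample title is taken from the bytes output above.

```python
# Sample title from the question, with curly quotes written as escapes.
title = "The Dirty Secret of \u2018Secret Family Recipes\u2019"

# What a client sees if it reads the UTF-8 bytes as Windows-1252:
# each curly quote becomes a three-character garbled sequence.
utf8_bytes = title.encode("utf-8")
garbled = utf8_bytes.decode("cp1252")

# Assumption: the failed "fix" encoded the already-correct string as
# cp1252 and then decoded it as UTF-8. The left curly quote becomes the
# single byte 0x91, which is not a valid UTF-8 start byte.
try:
    title.encode("cp1252").decode("utf-8")
    error_message = None
except UnicodeDecodeError as exc:
    error_message = str(exc)

print(repr(garbled))
print(error_message)
```

If this reading is right, no encode or decode call is needed on the Python side; the client just has to be told that the bytes are UTF-8.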

I'm wondering if there is some way to fix this with headers sent along with the rendered template, or with something placed in the Jinja template itself, so that characters like curly quotes come through intact.

mheavers
  • Does this answer your question? [Python Selenium ().text returns "â€™" instead of apostrophe (')](https://stackoverflow.com/questions/55737316/python-selenium-text-returns-%c3%a2%e2%82%ac-instead-of-apostrophe) – Michael Ruth Apr 02 '21 at 23:52
  • @MichaelRuth - not really. I know that there is some issue with encoding here - I added more clarification to the question. – mheavers Apr 05 '21 at 20:21
  • Did you try explicitly setting encoding for the XML with `<?xml version="1.0" encoding="UTF-8"?>`? Did you try setting charset in the Content-Type header with `application/rss+xml; charset=utf-8`? – Karl Sutt Apr 06 '21 at 06:42
  • @KarlSutt - the content-type header was the missing piece! (I had the encoding in the xml tag but it wasn't working). Feel free to write this as an answer and I'll accept. – mheavers Apr 06 '21 at 17:50
  • Good stuff — glad you got it working! I've added an answer. – Karl Sutt Apr 06 '21 at 18:09

1 Answer

You should specify the response body encoding in the Content-Type header with application/rss+xml; charset=utf-8.
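
A minimal sketch of that change applied to the view from the question. The route name, the inlined template (a stand-in for recs.rss) and the sample post are assumptions made so the example is self-contained; only the Content-Type value comes from the answer.

```python
from flask import Flask, make_response, render_template_string

app = Flask(__name__)

# Hypothetical stand-in for recs.rss, inlined to keep the sketch self-contained.
RSS_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
   <channel>
      {% for post in posts %}
      <item>
        {% if post.title %}<title><![CDATA[{{ post.title }}]]></title>{% endif %}
      </item>
      {% endfor %}
   </channel>
</rss>"""


@app.route("/recs.rss")  # hypothetical route name
def recs():
    # Sample data in place of the API call from the question.
    posts = [{"title": "The Dirty Secret of \u2018Secret Family Recipes\u2019"}]
    body = render_template_string(RSS_TEMPLATE, posts=posts)
    response = make_response(body)
    # The charset parameter tells the client how to decode the body bytes.
    response.headers.set("Content-Type", "application/rss+xml; charset=utf-8")
    return response
```

Flask already encodes the rendered string as UTF-8 when it builds the response, so the title needs no manual encode call. Without the charset parameter, a client may guess a different encoding for the same bytes, which is one plausible source of the garbled quotes.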

Karl Sutt