There are a lot of questions around this on SO and elsewhere, but nothing has helped me solve my problem so I thought I'd present my specifics.
I have a route in Flask that looks like this
@bp.route("/experiment/<name>", methods=["GET"])
I want to be able to use Unicode characters in the name part. In this example I am using \u2019, i.e. ’. Let's just say name="bob’s house". When I navigate to this URL I get the following error:
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2019' in position ...: ordinal not in range(256)
The final line of the traceback is this:
File "/var/lang/lib/python3.10/site-packages/werkzeug/_internal.py", line 119, in _wsgi_decoding_dance
return s.encode("latin1").decode(charset, errors)
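To make the failing line concrete, here is a minimal sketch of what I believe is going on. It assumes the PEP 3333 convention that WSGI environ strings carry raw bytes mapped onto code points 0-255; `wsgi_decoding_dance` is my own stand-in that copies the expression from the traceback, not Werkzeug's actual function.

```python
# Stand-in for the traceback's last line (hypothetical helper, not Werkzeug's code).
def wsgi_decoding_dance(s: str, charset: str = "utf-8") -> str:
    # Turn the "bytes disguised as latin-1 text" back into bytes, then decode.
    return s.encode("latin1").decode(charset)


name = "bob\u2019s house"

# What a PEP 3333 compliant server is expected to put in PATH_INFO:
# the UTF-8 bytes of the path, read back as latin-1 text.
wsgi_style = name.encode("utf-8").decode("latin1")
roundtrip = wsgi_decoding_dance(wsgi_style)

# What the error message suggests arrives on Lambda: the real character
# U+2019, which has no latin-1 encoding.
try:
    wsgi_decoding_dance(name)
    failed = False
except UnicodeEncodeError:
    failed = True

print(roundtrip == name, failed)
```

If this reading is right, the value reaching Werkzeug already contains the decoded ’ character instead of its latin-1 disguised bytes.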
When I run my Flask server locally I don't get this error, but when I run it on AWS Lambda I do. Locally, I just run the server using poetry run python src/webapp/app.py, whereas on AWS it runs from a Docker container. Because I'm on an M1 Mac, my Docker base image is public.ecr.aws/lambda/python:3.10-x86_64 (is this relevant?). There seem to be no relevant environment variable discrepancies.
Some solutions suggest parsing the URL using urllib.parse.unquote(param) or similar, but the error is thrown before any of my own code runs. In the following, the print statement is never reached.
@bp.route("/experiment/<name>", methods=["GET"])
def render(name):
    print(f"name: {name}")
I have also tried adding information to the blueprint route, for example specifying the type of name and the string format (u""):
@bp.route(u"/experiment/<string:name>", methods=["GET"])
This did not solve the error. Using <path:name> also doesn't fix it.
I have tried setting PYTHONUTF8=1 in my AWS Lambda's environment variables, which did not fix the issue. I also tried setting this same flag through the Dockerfile:
ENTRYPOINT ["python3", "-X utf8", "src/webapp/app.py"]
Adding the following (based on this) does not fix the issue either:
ENV PYTHONIOENCODING=utf-8
Finally, based on the documentation here I added the following to my config.py:
environ.update(HTTP_ACCEPT_CHARSET="ISO-8859-1,utf-8;q=0.7,*;q=0.7")
and still no luck.
I understand that ASCII is the standard HTTP character set, and I understand the concept of percent-encoding. If name = bob's house, I have no problem. I do not understand why the space character succeeds while the ’ character fails.
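For reference, this is how I understand the two characters to differ; a small sketch using only the standard library. `fits_latin1` is a hypothetical helper I wrote for illustration. My guess is that the space survives because U+0020 is below 256 and so passes the latin-1 step, while U+2019 does not.

```python
from urllib.parse import quote, unquote

name = "bob\u2019s house"

# Percent-encoding works on the UTF-8 bytes of each non-ASCII character.
encoded = quote(name)
decoded = unquote(encoded)


def fits_latin1(text: str) -> bool:
    """Hypothetical helper: can this text pass through .encode('latin1')?"""
    try:
        text.encode("latin1")
        return True
    except UnicodeEncodeError:
        return False


print(encoded)              # bob%E2%80%99s%20house
print(fits_latin1(" "))     # True
print(fits_latin1("\u2019"))  # False
```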
I don't want to use dead-quotes ('), but want the freedom to use properly directed quotes (’).
I understand the issue in Python, but I do not understand how I can resolve it in my WSGI/Werkzeug/Flask/Lambda flow.
I hope you can see I have tried to look for the solution, and even though there are many questions around it I have still not been able to find a solution. Could someone explain how to resolve this issue?
Edit: Further attempt
from urllib.parse import quote

from flask import request
from lambdarado import start


def get_app():
    # This function must return a WSGI app, e.g. Flask
    from webapp import create_app

    app = create_app()

    @app.before_request
    def encode_path_info():
        print("got call to encode_path_info")
        print(request.path)
        print(quote(request.path))
        # Properly encode the path_info using UTF-8
        path = quote(request.path)
        request.path = path

    return app


my_app = get_app
start(my_app)
I tried the above, but the error gets thrown before this function gets called. The function gets called fine on non-breaking URLs.
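Since before_request runs too late, one direction that might get in front of the error is a plain WSGI middleware that rewrites PATH_INFO before Werkzeug reads it. The sketch below is untested in my Lambda setup; `latin1_path_middleware` and `echo_app` are hypothetical names, and `echo_app` only stands in for Flask by decoding PATH_INFO the same way the traceback line does.

```python
def latin1_path_middleware(wsgi_app):
    """Hypothetical helper: re-encode PATH_INFO the way PEP 3333 expects."""

    def wrapper(environ, start_response):
        path = environ.get("PATH_INFO", "")
        try:
            # Already representable as latin-1: leave it untouched.
            path.encode("latin1")
        except UnicodeEncodeError:
            # Real Unicode text slipped in: disguise its UTF-8 bytes as latin-1.
            environ["PATH_INFO"] = path.encode("utf-8").decode("latin1")
        return wsgi_app(environ, start_response)

    return wrapper


def echo_app(environ, start_response):
    # Stand-in for Flask: decode PATH_INFO like the failing Werkzeug line.
    path = environ["PATH_INFO"].encode("latin1").decode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [path.encode("utf-8")]


wrapped = latin1_path_middleware(echo_app)
environ = {"PATH_INFO": "/experiment/bob\u2019s house"}
body = b"".join(wrapped(environ, lambda status, headers: None))
print(body.decode("utf-8"))
```

In my flow this would presumably be wired in as app.wsgi_app = latin1_path_middleware(app.wsgi_app) inside get_app(), just before return app. One caveat I can see: a path containing only characters in the 128-255 range (such as é) passes the latin-1 check and is left alone, so this check is not a complete fix.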