
There are a lot of questions around this on SO and elsewhere, but nothing has helped me solve my problem so I thought I'd present my specifics.

I have a route in Flask that looks like this

@bp.route("/experiment/<name>", methods=["GET"])

I want to be able to use Unicode characters in the name part. In this example I am using \u2019, i.e. ’. Let's just say name="bob’s house". When I navigate to this URL I get the following error:

UnicodeEncodeError: 'latin-1' codec can't encode character '\u2019' in position ...: ordinal not in range(256)

The final line of the traceback is this:

File "/var/lang/lib/python3.10/site-packages/werkzeug/_internal.py", line 119, in _wsgi_decoding_dance
    return s.encode("latin1").decode(charset, errors)
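For context, here is a minimal sketch of what that line appears to rely on. It assumes the PEP 3333 convention that PATH_INFO arrives as a native string whose characters all stand for raw bytes in the latin-1 range. The two helper names are illustrative stand-ins, not Werkzeug's API; only the encode/decode expression is taken from the traceback.

```python
def wsgi_encoding_dance(s: str, charset: str = "utf-8") -> str:
    """Hypothetical helper: what a PEP 3333 server is expected to put in the
    environ, i.e. the UTF-8 bytes of the path viewed as latin-1 characters."""
    return s.encode(charset).decode("latin1")


def wsgi_decoding_dance(s: str, charset: str = "utf-8", errors: str = "replace") -> str:
    """Mirrors the expression on the last line of the traceback."""
    return s.encode("latin1").decode(charset, errors)


name = "bob\u2019s house"

# Well-formed environ value: every character is below U+0100, so the
# latin-1 encode succeeds and the original text comes back.
well_formed = wsgi_encoding_dance(name)
round_tripped = wsgi_decoding_dance(well_formed)

# Already-decoded value: U+2019 has no latin-1 encoding, so the same
# expression raises UnicodeEncodeError.
try:
    wsgi_decoding_dance(name)
    raw_error = None
except UnicodeEncodeError as exc:
    raw_error = exc

# A plain ASCII name (including the space) survives either way.
ascii_only = wsgi_decoding_dance("bob's house")
```

If this reading is right, the error would mean that whatever builds the WSGI environ on Lambda puts the already-decoded character into PATH_INFO. That is an assumption, not something confirmed here.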

When I run my Flask server locally I don't get this error, but when I run it on AWS Lambda I do. Locally, I run the server using poetry run python src/webapp/app.py; on AWS it runs from a Docker container. Because I'm on an M1 Mac, my Docker Python image is public.ecr.aws/lambda/python:3.10-x86_64 (is this relevant?). There seem to be no relevant environment variable discrepancies.

Some solutions suggest decoding the URL parameter using urllib.parse.unquote(param) or similar, but the error is raised before any of my own code runs. In the following, the print statement is never reached:

@bp.route("/experiment/<name>", methods=["GET"])
def render(name):
    print(f"name: {name}")

I have also tried adding information to the blueprint route, for example specifying the type of name and the string format (u""):

@bp.route(u"/experiment/<string:name>", methods=["GET"])

This did not solve the error, and neither did <path:name>.

I have tried setting PYTHONUTF8=1 in my AWS Lambda's environment variables, which did not fix the issue. I also tried setting this same flag through the Dockerfile:

ENTRYPOINT ["python3", "-X utf8", "src/webapp/app.py"]

I also tried adding the following (based on this):

ENV PYTHONIOENCODING=utf-8

That did not fix the issue either.

Finally, based on the documentation here I added the following to my config.py:

environ.update(HTTP_ACCEPT_CHARSET="ISO-8859-1,utf-8;q=0.7,*;q=0.7")

and still no luck.

I understand ASCII is the standard HTTP character set, and the concept of percent-encoding. If name = bob's house, I have no problem. I do not understand why the space character succeeds while the ’ character fails.
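To make the percent-encoding part concrete, here is a small sketch using only the standard library. It shows how the example path would look on the wire if the client encodes it as UTF-8, which is an assumption about the client, not something taken from the Lambda logs.

```python
from urllib.parse import quote, unquote

path = "/experiment/bob\u2019s house"

# quote() encodes as UTF-8 by default and leaves "/" unescaped.
# U+2019 becomes the three bytes E2 80 99; the space becomes %20.
encoded = quote(path)

# unquote() reverses the process, again assuming UTF-8.
decoded = unquote(encoded)

print(encoded)  # /experiment/bob%E2%80%99s%20house
```

Both the space and the quote are percent-encoded in the URL, so the difference between them only shows up after something has decoded the path again.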

I don't want to use dead-quotes ('), but want freedom to use properly directed quotes (’).

I understand the issue in Python, but I do not understand how I can resolve it in my WSGI/Werkzeug/Flask/Lambda flow.

I hope you can see I have tried to look for the solution, and even though there are many questions around it I have still not been able to find a solution. Could someone explain how to resolve this issue?

Edit: Further attempt

from lambdarado import start
from urllib.parse import quote
from flask import request


def get_app():
    # This function must return a WSGI app, e.g. Flask
    from webapp import create_app
    app = create_app()

    @app.before_request
    def encode_path_info():
        print("got call to encode_path_info")
        print(request.path)
        print(quote(request.path))
        # Properly encode the path_info using UTF-8
        path = quote(request.path)
        request.path = path

    return app


my_app = get_app

start(my_app)

I tried the above, but the error gets thrown before this function gets called. The function gets called fine on non-breaking URLs.
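Since before_request runs after routing has already read the path, one thing that might be worth trying is to fix the environ one layer lower, in a WSGI middleware. The sketch below is untested against lambdarado and assumes that the adapter hands over PATH_INFO as already-decoded text. PathInfoFixer and echo_path_app are hypothetical names made up for this illustration; echo_path_app is only a stand-in for Flask so the block runs on its own.

```python
class PathInfoFixer:
    """Hypothetical WSGI middleware: re-encodes PATH_INFO / SCRIPT_NAME into the
    latin-1 "bytes as text" form that PEP 3333 expects, but only when the value
    contains characters that cannot be latin-1 encoded."""

    def __init__(self, wsgi_app, charset="utf-8"):
        self.wsgi_app = wsgi_app
        self.charset = charset

    def __call__(self, environ, start_response):
        for key in ("PATH_INFO", "SCRIPT_NAME"):
            value = environ.get(key, "")
            try:
                value.encode("latin1")
            except UnicodeEncodeError:
                # Assumption: the value is real Unicode text, so turn it back
                # into UTF-8 bytes presented as latin-1 characters.
                environ[key] = value.encode(self.charset).decode("latin1")
        return self.wsgi_app(environ, start_response)


def echo_path_app(environ, start_response):
    # Stand-in for Flask: decodes PATH_INFO with the same expression as the
    # traceback line and echoes the result back.
    path = environ["PATH_INFO"].encode("latin1").decode("utf-8", "replace")
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [path.encode("utf-8")]


wrapped = PathInfoFixer(echo_path_app)
body = b"".join(
    wrapped({"PATH_INFO": "/experiment/bob\u2019s house"}, lambda status, headers: None)
)
```

With Flask, the usual way to attach a middleware like this would be app.wsgi_app = PathInfoFixer(app.wsgi_app) inside get_app(), before returning app. One known limitation of this sketch: a path containing only characters below U+0100 (for example é) is left untouched, because it cannot be told apart from a correctly formed value.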

louisdeb
  • https://github.com/pallets/werkzeug/issues/378 suggests this problem should not occur? – louisdeb Jul 19 '23 at 15:07
  • https://lucumr.pocoo.org/2013/7/2/the-updated-guide-to-unicode/ nice article but no solution – louisdeb Jul 19 '23 at 15:30
  • The issue seems to be with the environment. Best way for me would be to create the same docker container locally (using the image that is used in the AWS Lambda (some kind of Amazon Linux)) and test/debug there. – v100ev Jul 19 '23 at 16:12
  • Thanks for the reply. I'm currently trying to set `binary_media_types` to `["*/*"]` for my apigw – louisdeb Jul 19 '23 at 16:16
  • Yep also tried `self.http_api.node.default_child.binary_media_types = ["*/*"]` to no avail – louisdeb Jul 19 '23 at 16:37
  • Possible solutions using an Authorizer for the APIGW or an Edge Lambda? – louisdeb Jul 19 '23 at 18:27
