
I am building a Flask API that lets users send an XML document together with a transformation name, and returns the XML after that transformation has been applied, using Saxon/C's Python API (https://www.saxonica.com/saxon-c/doc/html/saxonc.html).

The incoming endpoint looks like this (logging and irrelevant details removed):

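    from flask import Flask, request

    app = Flask(__name__)


    @app.route("/v1/transform/", methods=["POST"])
    def transform():
        xml = request.data  # raw request body as bytes
        transformation = request.args.get("transformation")  # name of the XSLT to apply

        result = transform_xml(xml, transformation)

        return result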
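For reference, a client calls this endpoint by POSTing the raw XML as the body and passing the transformation name in the query string. The sketch below only builds such a request with the standard library; the helper name `build_transform_request` and the transformation name `identity` are hypothetical, and the `http://localhost:8080` base URL is an assumption based on the uWSGI `http = 0.0.0.0:8080` setting mentioned in the comments.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_transform_request(base_url: str, xml: bytes, transformation: str) -> Request:
    """Build (but do not send) a POST request for /v1/transform/.

    The XML goes in the body, where the server reads it via request.data,
    and the transformation name goes in the query string (request.args).
    """
    query = urlencode({"transformation": transformation})
    return Request(
        f"{base_url}/v1/transform/?{query}",
        data=xml,
        method="POST",
        # Without an explicit Content-Type, urllib sends
        # application/x-www-form-urlencoded, which Flask parses as form data,
        # leaving request.data empty.
        headers={"Content-Type": "application/xml"},
    )
```

Sending it is then a matter of `urllib.request.urlopen(req)`. The explicit `Content-Type` is the design choice worth noting: `request.data` only holds the raw body when the mimetype is not one Werkzeug parses as a form.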

The transform function looks like this:

    import os

    import saxonc


    def transform_xml(xml: bytes, transformation: str) -> str:
        # Leaving the with-block calls release() on the processor
        with saxonc.PySaxonProcessor(license=False) as proc:
            base_dir = os.getcwd()
            xslt_path = os.path.join(base_dir, "resources", transformation, "main.xslt")

            xslt_proc = proc.new_xslt30_processor()

            # Parse the request body into an XDM node
            node = proc.parse_xml(xml_text=xml.decode("utf-8"))

            result = xslt_proc.transform_to_string(stylesheet_file=xslt_path, xdm_node=node)

            return result

The XSLT files are stored locally, and the user picks one of them by passing the corresponding transformation name.
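Because the transformation name comes straight from the query string and is joined into a file path, it may help to accept only names that actually exist under `resources/`. A minimal sketch, assuming the `resources/<name>/main.xslt` layout from the code above; the helper `resolve_xslt_path` is hypothetical and not part of Saxon/C or Flask:

```python
import os


def resolve_xslt_path(transformation: str, base_dir: str) -> str:
    """Map a user-supplied name to <base_dir>/resources/<name>/main.xslt.

    Only names of existing directories that contain a main.xslt are accepted,
    so values such as "../secret", an absolute path, or a missing (None)
    parameter are rejected instead of being joined into a path.
    """
    resources = os.path.join(base_dir, "resources")
    available = {
        name
        for name in os.listdir(resources)
        if os.path.isfile(os.path.join(resources, name, "main.xslt"))
    }
    if transformation not in available:
        raise ValueError(f"unknown transformation: {transformation!r}")
    return os.path.join(resources, transformation, "main.xslt")
```

Checking membership against a directory listing, rather than normalizing the path, keeps the rule simple: anything that is not one of the available transformation names is refused.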

The problem is that this works (and fast) for the first request, but the second one crashes:

    JNI_CreateJavaVM() failed with result: -5
    DAMN ! worker 1 (pid: 517095) died :( trying respawn ...

What does work is changing the transform_xml function like this:

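    def transform_xml(xml: bytes, transformation: str) -> str:
        # No context manager: the processor is never released
        proc = saxonc.PySaxonProcessor(license=False)

        # Same path logic as in the first version
        xslt_path = os.path.join(os.getcwd(), "resources", transformation, "main.xslt")

        xslt_proc = proc.new_xslt30_processor()

        node = proc.parse_xml(xml_text=xml.decode("utf-8"))

        result = xslt_proc.transform_to_string(stylesheet_file=xslt_path, xdm_node=node)

        return result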

But then the resources are never released, and over time (1k+ requests) memory usage keeps growing.
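To confirm the growth from inside the worker, one option is to record the process's peak resident set size after every call. A minimal sketch using the Unix-only `resource` module (fine on the Ubuntu setup described in the comments); the decorator name `track_peak_rss` and its `peak_rss_kb` attribute are hypothetical:

```python
import functools
import resource


def track_peak_rss(func):
    """Record the process's peak RSS after every call to func.

    ru_maxrss is reported in kilobytes on Linux. It is a high-water mark and
    never decreases, but a value that keeps climbing across requests suggests
    memory is not being released. Unlike tracemalloc, it also covers native
    allocations such as those made by the VM behind Saxon/C.
    """
    readings = []

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        readings.append(resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
        return result

    wrapper.peak_rss_kb = readings  # one reading per completed call
    return wrapper
```

Decorating `transform_xml` with it and logging `transform_xml.peak_rss_kb[-1]` every few hundred requests would show whether the non-context-manager version grows steadily.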

It seems that Saxon tries to create a new VM while the old one is still shutting down.
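A common workaround for this kind of failure (not Saxonica's fix; per the comments, the bug itself was resolved in a later maintenance release) is to create the processor once per worker process and reuse it for every request, so the VM is neither released between requests nor re-created. A minimal sketch of a lazy, lock-protected holder; the class `ProcessSingleton` is hypothetical, and whether one `PySaxonProcessor` may safely be shared by concurrent requests is an assumption (with 1 worker and 1 process, requests are handled one at a time anyway):

```python
import threading
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class ProcessSingleton(Generic[T]):
    """Create an object lazily, at most once per process, then reuse it.

    Creating it on first use (rather than at import time) means the object,
    and any VM it starts, is created inside the worker that handles the
    request instead of in a pre-fork master process.
    """

    def __init__(self, factory: Callable[[], T]) -> None:
        self._factory = factory
        self._instance: Optional[T] = None
        self._lock = threading.Lock()

    def get(self) -> T:
        # Double-checked locking: concurrent first calls create one instance.
        if self._instance is None:
            with self._lock:
                if self._instance is None:
                    self._instance = self._factory()
        return self._instance
```

In the Flask app this would be a module-level `saxon_processor = ProcessSingleton(lambda: saxonc.PySaxonProcessor(license=False))`, with the `with` block in `transform_xml` replaced by `proc = saxon_processor.get()`. The lazy creation matters under uWSGI's default preforking, since a VM started before a fork is generally not usable in the child.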

I found this thread from 2016: https://saxonica.plan.io/boards/4/topics/6399, but it didn't clear things up for me. I also looked at the pysaxon repository on GitHub but found no answer to this problem. I have also opened a ticket with Saxonica: https://saxonica.plan.io/issues/4942

Korfoo
  • https://saxonica.plan.io/issues/4428 might be relevant, but I have no idea whether it resolves your issue. – Martin Honnen Mar 18 '21 at 20:06
  • Hmm, I don't think threading is the issue. I have configured my Flask API with 1 worker and 1 process. Thanks for the link anyway :) – Korfoo Mar 18 '21 at 21:07
  • When the `PySaxonProcessor` class is used as a context manager, its release method is called on exit, which clears up the JET VM runtime. I can see why your last example works: there the `PySaxonProcessor` object is not created as a context manager. Your first example should work given that you have only 1 worker and 1 process. I will investigate this further. What is your environment? Is this program running behind a web server? – ond1 Mar 19 '21 at 09:35
  • By the way, the same problem happens in the second example when I call the release function manually before returning the result. I am using Python 3.8.6 on Ubuntu 20.10. The Flask app is run with uWSGI using the following config: `[uwsgi]`, `module = wsgi:app`, `master = true`, `processes = 1`, `workers = 1`, `http = 0.0.0.0:8080`, `vacuum = true`, `die-on-term = true` – Korfoo Mar 19 '21 at 10:04
  • I have made a very small application showcasing the problem with as little code as possible. https://github.com/RudolfDG/saxon-flask-api – Korfoo Mar 19 '21 at 11:08
  • Still investigating this issue. Responded here https://saxonica.plan.io/issues/4942 – ond1 Mar 22 '21 at 20:58
  • This bug has now been resolved and the fix will be in the next maintenance release. Also see: https://saxonica.plan.io/issues/5373 – ond1 Mar 16 '22 at 12:12
