0

Django takes way too much CPU time to send response to localhost using the development server (same thing happens with gunicorn but a worker timeout occurs). The response size shouldn't be more than 15KB so the content size isn't the problem. I am using a virtual environment (venv) on Ubuntu 22.04.

My code is the following:

def time_series(request: HttpRequest) -> JsonResponse | HttpResponseBadRequest:
    """
    This view returns a set of time series based on the user's query.

    Args:
        request (HttpRequest): The request object passed by django.

    Returns:
        JsonResponse | HttpResponseBadRequest: An object containing the time series or an error response.
    """

    # Check this StackOverflow thread to see how group by works in Django:
    # https://stackoverflow.com/a/629691/11718554

    if (patents := get_patents(request)) is None:
        return HttpResponseBadRequest("No patent query in the current session.")

    print(len(get_patent_ids(request)))

    applications_per_year = list(
        patents.values("application_year")
        .annotate(count=Count("id"))
        .order_by("application_year")
        .values("application_year", "count")
    )
    granted_patents_per_year = list(
        patents.values("granted_year")
        .annotate(count=Count("id"))
        .order_by("granted_year")
        .values("granted_year", "count")
    )
    granted_patents_per_type_year = list(
        patents.values("granted_year", "type")
        .annotate(count=Count("id"))
        .order_by("granted_year", "type")
        .values("granted_year", "type", "count")
    )
    granted_patents_per_office_year = list(
        patents.values("granted_year", "office")
        .annotate(count=Count("id"))
        .order_by("granted_year", "office")
        .values("granted_year", "office", "count")
    )
    pct_protected_patents_per_year = list(
        patents.filter(pct_data__granted=True)
        .values("granted_year")
        .annotate(count=Count("id"))
        .order_by("granted_year")
        .values("granted_year", "count")
    )
    granted_patents_per_cpc_year = list(
        patents.annotate(cpc_section=Substr("cpc_groups__cpc_group", 1, 1))
        .values("granted_year", "cpc_section")
        .annotate(count=Count("id"))
        .order_by("granted_year", "cpc_section")
        .values("granted_year", "cpc_section", "count")
    )
    citations_made_per_year = list(
        patents.values("citations__citation_year")
        .annotate(count=Count("id"))
        .order_by("citations__citation_year")
        .values("citations__citation_year", "count")
    )
    citations_received_per_year = list(
        patents.values("cited_by__citation_year")
        .annotate(count=Count("id"))
        .order_by("cited_by__citation_year")
        .values("cited_by__citation_year", "count")
    )

    return MyJsonResponse(
        {
            "applications_per_year": {
                "": group_fields(applications_per_year, ["application_year"])
            },
            "granted_patents_per_year": {
                "": group_fields(granted_patents_per_year, ["granted_year"])
            },
            "granted_patents_per_type_year": group_fields(
                granted_patents_per_type_year, ["type", "granted_year"]
            ),
            "granted_patents_per_office_year": group_fields(
                granted_patents_per_office_year, ["office", "granted_year"]
            ),
            "pct_protected_patents_per_year": {
                "": group_fields(pct_protected_patents_per_year, ["granted_year"])
            },
            "granted_patents_per_cpc_year": group_fields(
                granted_patents_per_cpc_year, ["cpc_section", "granted_year"]
            ),
            "citations_made_per_year": {
                "": group_fields(citations_made_per_year, ["citations__citation_year"])
            },
            "citations_received_per_year": {
                "": group_fields(
                    citations_received_per_year, ["cited_by__citation_year"]
                )
            },
        }
    )

Where MyJsonResponse is a wrapper of HttpResponse I wrote to check if the CPU time is spent on encoding and looks like like that:

class MyJsonResponse(HttpResponse):
    def __init__(self, data, status=None):
        content = json.dumps(data)
        print("done")
        super().__init__(content, content_type="application/json", status=status)

I profiled the code using django-debug-toolbar and the results are the following:

Resource     Value
User CPU time    46030.691 msec
System CPU time  258.829 msec
Total CPU time   46289.520 msec
Elapsed time     62882.663 msec
Context switches 23463 voluntary, 667 involuntary

17202.20 ms (9 queries from 1 connection)

Call    CumTime     Per     TotTime     Per     Count
/debug_toolbar/panels/__init__.py in process_request(196)
    17.619  17.619  0.000   0.000   1
/django/core/handlers/exception.py in inner(53)
    17.619  17.619  0.000   0.000   1
/django/core/handlers/base.py in _get_response(174)
    17.619  17.619  0.000   0.000   1
/home/kostas/programming/patent-analysis/main/api.py in time_series(334)
    17.619  17.619  0.000   0.000   1
/home/kostas/programming/patent-analysis/main/helpers.py in group_fields(88)
    0.003   0.000   0.001   0.000   8
/home/kostas/programming/patent-analysis/main/helpers.py in __init__(213)
    0.000   0.000   0.000   0.000   1
/home/kostas/programming/patent-analysis/main/api.py in get_patent_ids(102)
    0.001   0.001   0.000   0.000   1
/home/kostas/programming/patent-analysis/main/api.py in get_patents(117)
    0.067   0.067   0.000   0.000   1
/django/db/models/query.py in __iter__(379)
    17.543  2.193   0.000   0.000   8 

I noticed clearly that the 46s CPU time was after the 'done' message was printed in the MyJsonResponse wrapper. What's going on here?

Thank you in advance.

  • QuerySets are *lazy* that is why it will do the query work there: as long as nothing forces a queryset to be evaluated, it will not evaluate. – Willem Van Onsem Aug 08 '23 at 11:36
  • The list() function surely forces them to be evaluated, or even the json.dumps() function. I can even print the json string and then wait +45s for no reason for the response to be sent. – Constantinos Petrakis Aug 08 '23 at 11:43

1 Answers1

0

After disabling django-debug-toolbar the 45s overhead was gone. So lesson learned: never disrespect the overhead of profiling tools.