Django takes way too much CPU time to send response to localhost using the development server (same thing happens with gunicorn but a worker timeout occurs). The response size shouldn't be more than 15KB so the content size isn't the problem. I am using a virtual environment (venv) on Ubuntu 22.04.
My code is the following:
def time_series(request: HttpRequest) -> JsonResponse | HttpResponseBadRequest:
"""
This view returns a set of time series based on the user's query.
Args:
request (HttpRequest): The request object passed by django.
Returns:
JsonResponse | HttpResponseBadRequest: An object containing the time series or an error response.
"""
# Check this StackOverflow thread to see how group by works in Django:
# https://stackoverflow.com/a/629691/11718554
if (patents := get_patents(request)) is None:
return HttpResponseBadRequest("No patent query in the current session.")
print(len(get_patent_ids(request)))
applications_per_year = list(
patents.values("application_year")
.annotate(count=Count("id"))
.order_by("application_year")
.values("application_year", "count")
)
granted_patents_per_year = list(
patents.values("granted_year")
.annotate(count=Count("id"))
.order_by("granted_year")
.values("granted_year", "count")
)
granted_patents_per_type_year = list(
patents.values("granted_year", "type")
.annotate(count=Count("id"))
.order_by("granted_year", "type")
.values("granted_year", "type", "count")
)
granted_patents_per_office_year = list(
patents.values("granted_year", "office")
.annotate(count=Count("id"))
.order_by("granted_year", "office")
.values("granted_year", "office", "count")
)
pct_protected_patents_per_year = list(
patents.filter(pct_data__granted=True)
.values("granted_year")
.annotate(count=Count("id"))
.order_by("granted_year")
.values("granted_year", "count")
)
granted_patents_per_cpc_year = list(
patents.annotate(cpc_section=Substr("cpc_groups__cpc_group", 1, 1))
.values("granted_year", "cpc_section")
.annotate(count=Count("id"))
.order_by("granted_year", "cpc_section")
.values("granted_year", "cpc_section", "count")
)
citations_made_per_year = list(
patents.values("citations__citation_year")
.annotate(count=Count("id"))
.order_by("citations__citation_year")
.values("citations__citation_year", "count")
)
citations_received_per_year = list(
patents.values("cited_by__citation_year")
.annotate(count=Count("id"))
.order_by("cited_by__citation_year")
.values("cited_by__citation_year", "count")
)
return MyJsonResponse(
{
"applications_per_year": {
"": group_fields(applications_per_year, ["application_year"])
},
"granted_patents_per_year": {
"": group_fields(granted_patents_per_year, ["granted_year"])
},
"granted_patents_per_type_year": group_fields(
granted_patents_per_type_year, ["type", "granted_year"]
),
"granted_patents_per_office_year": group_fields(
granted_patents_per_office_year, ["office", "granted_year"]
),
"pct_protected_patents_per_year": {
"": group_fields(pct_protected_patents_per_year, ["granted_year"])
},
"granted_patents_per_cpc_year": group_fields(
granted_patents_per_cpc_year, ["cpc_section", "granted_year"]
),
"citations_made_per_year": {
"": group_fields(citations_made_per_year, ["citations__citation_year"])
},
"citations_received_per_year": {
"": group_fields(
citations_received_per_year, ["cited_by__citation_year"]
)
},
}
)
Where MyJsonResponse
is a wrapper of HttpResponse I wrote to check if the CPU time is spent on encoding and looks like like that:
class MyJsonResponse(HttpResponse):
def __init__(self, data, status=None):
content = json.dumps(data)
print("done")
super().__init__(content, content_type="application/json", status=status)
I profiled the code using django-debug-toolbar
and the results are the following:
Resource Value
User CPU time 46030.691 msec
System CPU time 258.829 msec
Total CPU time 46289.520 msec
Elapsed time 62882.663 msec
Context switches 23463 voluntary, 667 involuntary
17202.20 ms (9 queries from 1 connection)
Call CumTime Per TotTime Per Count
/debug_toolbar/panels/__init__.py in process_request(196)
17.619 17.619 0.000 0.000 1
/django/core/handlers/exception.py in inner(53)
17.619 17.619 0.000 0.000 1
/django/core/handlers/base.py in _get_response(174)
17.619 17.619 0.000 0.000 1
/home/kostas/programming/patent-analysis/main/api.py in time_series(334)
17.619 17.619 0.000 0.000 1
/home/kostas/programming/patent-analysis/main/helpers.py in group_fields(88)
0.003 0.000 0.001 0.000 8
/home/kostas/programming/patent-analysis/main/helpers.py in __init__(213)
0.000 0.000 0.000 0.000 1
/home/kostas/programming/patent-analysis/main/api.py in get_patent_ids(102)
0.001 0.001 0.000 0.000 1
/home/kostas/programming/patent-analysis/main/api.py in get_patents(117)
0.067 0.067 0.000 0.000 1
/django/db/models/query.py in __iter__(379)
17.543 2.193 0.000 0.000 8
I noticed clearly that the 46s CPU time was after the 'done' message was printed in the MyJsonResponse wrapper. What's going on here?
Thank you in advance.