
I have a problem retrieving data from the database. I'm using a MySQL database with one table containing 15 fields, which I'm going to display on a template in a table.

The table has more than 4 million records.

def index(request):
    args = {}
    data = Mymodel.objects.all()
    args['data'] = data
    return render(request, 'index.html', args)

I tried this approach, but the data loads very, very slowly.

Next, I tried going through a raw database connection:

from django.db import connection

def index(request):
    args = {}
    sql = "select * from mymodel"
    cursor = connection.cursor()
    cursor.execute(sql)
    result = cursor.fetchall()
    args['result'] = result
    return render(request, 'index.html', args)

This also behaves the same; I need the data to load much faster. Please suggest any approach that would speed this up.

Mark Chackerian
Ajay Kumar
    You're trying to show 4 million records; it's never going to be fast. You need to decide what you actually *need* to show – Sayse May 12 '16 at 14:03
  • Please take a look at Django's Pagination API: https://docs.djangoproject.com/en/1.9/topics/pagination/ –  May 12 '16 at 14:16
  • Pagination is the best way there is to speed this up. Querying and then trying to display 4 million records is not advisable. – formatkaka May 12 '16 at 14:31

3 Answers


One conceptual correction before the solution: raw SQL doesn't perform any better than the Django ORM here, and you also lose the ability to use the ORM API to display the data easily.

The common solution is to use Django's built-in pagination to show only a certain number of records per page. The Django documentation has a very detailed explanation of how to use pagination.
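Under the hood, pagination just slices the queryset, which the ORM compiles into SQL `LIMIT`/`OFFSET`. A minimal stdlib-only sketch of the arithmetic Django's `Paginator` performs (the function name here is illustrative, not a Django API):

```python
from math import ceil

def page_slice(total_count, per_page, page_number):
    """Return (offset, limit, num_pages) for a 1-based page number,
    mirroring the arithmetic behind queryset pagination."""
    num_pages = max(1, ceil(total_count / per_page))
    # Clamp out-of-range page numbers to a valid page.
    page_number = min(max(1, page_number), num_pages)
    offset = (page_number - 1) * per_page
    return offset, per_page, num_pages

# 4 million rows, 50 per page, page 3 -> rows 100..149
offset, limit, num_pages = page_slice(4_000_000, 50, 3)
# The ORM would then issue roughly:
#   SELECT ... FROM mymodel LIMIT 50 OFFSET 100
```

This is why pagination is fast regardless of table size: each request touches only `per_page` rows instead of all 4 million.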

Shang Wang
  • "... using raw sql doesn't do any better than django ORM" The ORM does add some overhead (especially memory overhead), which is normally an acceptable trade-off, but can be a problem in these situations. – fabio.sussetto May 12 '16 at 14:03
  • 3
    @fabio.sussetto - What overhead are you talking about? its not on the database side since it comes down to the same query and it can't be on the retrieval of results since you need to do that in the same way too. So if you're talking about the creation of the sql query then thats nonsense since it is not intensive and definitely not intensive for an `objects.all()`. I've yet to find a need for a raw query. – Sayse May 12 '16 at 14:07
  • 1
    Having to create all the model instance objects does have an overhead, as opposed to return tuples using cursor.fetchall() – fabio.sussetto May 12 '16 at 14:10
  • @fabio.sussetto Maybe my wording was a little absolute, but I'm trying to tell the OP that the overhead is so tiny compared to other operations that it can be neglected, hence there's no point trying that. – Shang Wang May 12 '16 at 14:15
  • @ShangWang I agree with you; the OP's idea is almost a lost cause to start with IMO :) My point was just to clarify that if you're trying to squeeze out performance, in some cases bypassing the ORM could be your only solution. – fabio.sussetto May 12 '16 at 14:19
  • @fabio.sussetto - Sure (and I apologise if I was a bit harsh), but then you lose any time advantage you had once you start getting data out of that tuple, so you are no better off. Not to mention the increased maintenance cost of the code. – Sayse May 12 '16 at 14:25
  • @Sayse Absolutely, no worries, no harshness taken :) As usual, these kinds of optimisations come at the expense of convenience. – fabio.sussetto May 12 '16 at 14:52

Since you're already executing a raw query, I don't think you'll be realistically able to do much better than what you have.

Is there any reason why you can't paginate the results? Normally you never return all the data available.

You can try and use a QuerySet.iterator to avoid having to load all your instances in memory at once. Also, QuerySet.values would return dictionaries instead of full-blown model instances, reducing memory usage.
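The memory difference is the same one you see with a generator versus a list in plain Python: `iterator()` streams rows on demand instead of materializing every model instance up front. A stdlib-only sketch of the pattern (the row source is simulated; real code would be hitting the database):

```python
def fetch_rows_eagerly(n):
    # Like Mymodel.objects.all(): every row materialized at once.
    return [{"id": i, "name": f"row {i}"} for i in range(n)]

def fetch_rows_lazily(n):
    # Like .iterator(): rows are produced one at a time,
    # so peak memory stays flat no matter how many rows exist.
    for i in range(n):
        yield {"id": i, "name": f"row {i}"}

eager = fetch_rows_eagerly(1000)   # 1000 dicts live in memory now
lazy = fetch_rows_lazily(1000)     # nothing produced yet
first = next(lazy)                 # rows arrive only as consumed
```

With 4 million rows, the eager version is what exhausts memory; the lazy version lets the template loop consume rows as they stream in.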

If you absolutely need to, and you run into request timeouts, one option is to perform the actual data retrieval in the background. For example, you could use Celery and run the query as part of a task. Your view would trigger the Celery task and return a task identifier in the response. The consumer could then poll a "job status" endpoint with that task id and retrieve the results when they are ready. You'd still need to store the results somewhere faster than your database, e.g. Redis (which you can easily use as a Celery backend).
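The trigger/poll flow could be sketched roughly like this; a dict stands in for Redis and a plain function for the Celery task, so all names here are illustrative, not real Celery or Django APIs:

```python
import uuid

RESULTS = {}  # stand-in for Redis: task_id -> result payload

def run_report_task(task_id, limit):
    # In reality this body would run in a Celery worker,
    # outside the request/response cycle.
    rows = [{"id": i} for i in range(limit)]  # pretend: the slow query
    RESULTS[task_id] = {"status": "done", "rows": rows}

def start_view():
    # The "trigger" endpoint: enqueue the task, return its id at once.
    task_id = str(uuid.uuid4())
    RESULTS[task_id] = {"status": "pending"}
    run_report_task(task_id, limit=3)  # Celery would run this asynchronously
    return {"task_id": task_id}

def status_view(task_id):
    # The "job status" endpoint the consumer polls.
    return RESULTS.get(task_id, {"status": "unknown"})

resp = start_view()
poll = status_view(resp["task_id"])
```

The client keeps calling the status endpoint until `status` is `"done"`, then renders the rows.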

This approach is obviously more convoluted and would add a few moving parts to your system and you're likely to get into other issues anyway (e.g. the response size). Using pagination if possible would be a lot easier.

fabio.sussetto

You can use pagination if you want your system to work with thousands of records. Django's generic ListView will help you in this case; it is easy to use and works like this:

from django.views.generic import ListView

class YourView(ListView):
    template_name = 'books/acme_list.html'
    context_object_name = 'object_list'
    queryset = TableName.objects.all()
    paginate_by = 50

Your template will look like this:

<table id="exam">
  {% for object in object_list %}
  <tr>
    <td>{{ object }}</td>
  </tr>
  {% endfor %}
</table>

And your pagination section will look like this:

{% if is_paginated %}
  <ul class="pagination">
    {% if page_obj.has_previous %}
      <li>
        <span><a href="?page={{ page_obj.previous_page_number }}">Previous</a></span>
      </li>
    {% endif %}
    <li>
      <span>Page {{ page_obj.number }} of {{ page_obj.paginator.num_pages }}.</span>
    </li>
    {% if page_obj.has_next %}
      <li>
        <span><a href="?page={{ page_obj.next_page_number }}">Next</a></span>
      </li>
    {% endif %}
  </ul>
{% endif %}

You can find further details in this link and in the Django documentation.

Muhammad Hassan