1

I'm looking for suggestions on how to go about handling the following use case scenario with python django framework, i'm also open to using javascript libraries/ajax.

I'm working with pre-existing table/model called revenue_code with over 600 million rows of data.

The user will need to search three fields within one search (code, description, room) and be able to select multiple search results similar to kendo controls multi select. I first started off by combining the codes in django-filters as shown below, but my application became unresponsive, after waiting 10-15 minutes i was able to view the search results but couldn't select anything.

https://simpleisbetterthancomplex.com/tutorial/2016/11/28/how-to-filter-querysets-dynamically.html

I've also tried to use kendo controls, select2, and chosen because i need the user to be able to select as many rev codes as they need upward to 10-20, but all gave the same unresponsive page when it attempted to load the data into the control/multi-select.

Essentially what I'm looking for is something like this below, which allows the user to select multiple selections and will handle a massive amount of data without becoming unresponsive? Ideally i'd like to be able to query my search without displaying all the data.

https://petercuret.com/add-ajax-to-django-without-writing-javascript/

Is Django framework meant to handle this type of volume. Would it be better to export this data into a file and read the file? I'm not looking for code, just some pointers on how to handle this use case.

user1470034
  • 671
  • 2
  • 8
  • 23
  • you could use a search engine like solr or elasticsearch. The idea behind these is to biuld a search index so you wont have to query the db so often/heavily. Pre filtering/ sorting also sounds like an idea when you have 600m records. – hansTheFranz Jan 26 '18 at 09:39

2 Answers2

2

What the basic mechanism of "searching 600 millions"? Basically how database do that is to build an index, before search-time, and sufficiently general enough for different types of query, and then at search time you just search on the index - which is much smaller (to put into memory) and faster. But no matter what, "searching" by its nature, have no "pagination" concept - and if 600 millions record cannot go into memory at the same time, then multiple swapping out and in of parts of the 600 millions records is needed - the more parts then the slower the operation. These are hidden behind the algorithms in databases like MySQL etc.

There are very compact representation like bitmap index which can allow you to search on data like male/female very fast, or any data where you can use one bit per piece of information.

So whether Django or not, does not really matters. What matters is the tuning of database, the design of tables to facilitate the queries (types of indices), and the total amount of memory at server end to keep the data in memory.

Check this out:

https://dba.stackexchange.com/questions/20335/can-mysql-reasonably-perform-queries-on-billions-of-rows

https://serverfault.com/questions/168247/mysql-working-with-192-trillion-records-yes-192-trillion

How many rows are 'too many' for a MySQL table?

Peter Teoh
  • 6,337
  • 4
  • 42
  • 58
1

You can't load all the data into your page at once. 600 million records is too many.

Since you mentioned select2, have a look at their example with pagination.

The trick is to limit your SQL results to maybe 100 or so at a time. When the user scrolls to the bottom of the list, it can automatically load in more.

Send the search query to the server, and do the filtering in SQL (or NoSQL or whatever you use). Database engines are built for that. Don't try filtering/sorting in JS with that many records.

mpen
  • 272,448
  • 266
  • 850
  • 1,236