
I have a big user table (3M), of structure:

age  (int) 
country (FK) 
city (text) 
gender (text) 
active (bool)

My objective is to run a search that finds 15 users matching a given age range, gender, city and country (favouring active users).

I'm having two issues, due to the size of the table:

  • Even with indexing, it takes a long time (a couple of hundred milliseconds)

  • Results must differ each time the query is run. Currently I execute the query with LIMIT 30 and just pick 15 random results from it, but those 30 results are always the same. The whole table is too big to shuffle.

Is there some commonly used solution for overcoming similar issues?

The database server is MySQL, implementation in Django. I also have access to Redis cache.

George

1 Answer


You can just run the query with different limits and offsets.

You'll just need to keep track of where you are.

# My example search criteria.
age = 21
gender = 'M'
city = 'Redmond'
country = 'US'

offset = 0
while True:
    next_fifteen = User.objects.filter(age=age,
                                       gender=gender,
                                       city=city,
                                       country=country)[offset:offset+15]
    if not next_fifteen.exists():
        break
    # ... do whatever with next_fifteen ...
    offset += 15

This will create a SQL statement that uses LIMIT 15 OFFSET 45 (on the fourth pass through the loop, for example).
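To illustrate the LIMIT/OFFSET semantics the loop relies on, here is a small standalone sketch using SQLite in place of MySQL (the table and row count are made up for the demo; the SQL clause behaves the same way in both databases):

```python
import sqlite3

# In-memory table standing in for the real user table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO user (id) VALUES (?)",
                 [(i,) for i in range(1, 101)])

# Fourth page of 15: skip the first 45 rows, return the next 15.
rows = conn.execute(
    "SELECT id FROM user ORDER BY id LIMIT 15 OFFSET 45").fetchall()
print(rows[0][0], rows[-1][0])  # 46 60
```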

You can see sites that do something similar, such as Google search results: go to the second or third page and you'll notice a URL parameter like &start=60. I'm not implying I know how Google operates internally, but it's the same basic principle.
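Since the question also asks for results that differ on every run, one variation on the same idea is to pick the offset at random rather than sequentially. A minimal sketch in plain Python; in Django, `total` would come from something like `User.objects.filter(...).count()` and the slice would be `qs[offset:offset + 15]`:

```python
import random

def random_page_offset(total, page_size=15):
    """Pick a random offset so each run returns a different slice.

    `total` is the number of rows matching the filter; in Django it
    would come from QuerySet.count().
    """
    if total <= page_size:
        return 0
    return random.randint(0, total - page_size)

offset = random_page_offset(3_000_000)
```

Note this inherits the usual OFFSET cost: the further the random offset lands, the more rows the database scans past before returning the page.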

Stack of Pancakes
  • Thanks, that's interesting. But in this case, e.g. to get to the 150th user, I'd need to run 10 queries, right? – George Oct 22 '14 at 06:31
  • It was just an example. You can do a single query if you'd like. `User.objects.filter(filterystuff=otherstuff)[150:165]` Was just showing how you can use offsets to get subsections of records. You can either keep track of your offset or offer sequential results or... anything really. – Stack of Pancakes Oct 23 '14 at 21:47
  • I've tried that, but increasing the offset seems to slow down the query linearly, and it is unacceptably slow for an offset of about 200 – George Oct 23 '14 at 21:57
  • Of course it does. `LIMIT` needs to gather a number of records equal to the extent of the limit. A limit of 15 can get 15 records and then stop searching. A limit of 15 with an offset of 100 would have to get 100 records first, then return the next 15. I don't know what "unacceptably slow" is though. Sounds like you might need to spend some time structuring your database for speed in relation to specific queries that will be performed often. Maybe this will help. http://stackoverflow.com/questions/1243952/how-can-i-speed-up-a-mysql-query-with-a-large-offset-in-the-limit-clause – Stack of Pancakes Oct 24 '14 at 03:58
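The usual fix for the large-OFFSET slowdown discussed in that linked question is keyset (seek) pagination: remember the last id you returned and filter on `id > last_id` instead of offsetting, so the index does the skipping. A sketch using SQLite to stand in for MySQL, assuming an indexed auto-increment primary key; the Django equivalent would be roughly `.filter(id__gt=last_id).order_by('id')[:15]`:

```python
import sqlite3

# In-memory table standing in for the real user table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user (id INTEGER PRIMARY KEY, city TEXT)")
conn.executemany("INSERT INTO user (city) VALUES (?)",
                 [("Redmond",)] * 100)

def next_page(conn, last_id, page_size=15):
    # Seek past last_id via the primary-key index -- no OFFSET scan,
    # so cost stays constant however deep you page.
    rows = conn.execute(
        "SELECT id FROM user WHERE city = ? AND id > ? "
        "ORDER BY id LIMIT ?",
        ("Redmond", last_id, page_size)).fetchall()
    return [r[0] for r in rows]

page1 = next_page(conn, 0)          # ids 1..15
page2 = next_page(conn, page1[-1])  # ids 16..30
print(page1[-1], page2[0])          # 15 16
```

The trade-off is that you can only step forward through pages in key order, which fits "fetch the next batch" use cases like this one.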