1

I'm working on a web app in Python (Flask) that, essentially, shows the user information from a PostgreSQL database (via Flask-SQLAlchemy) in a random order, with each set of information being shown on one page. Hitting a Next button will direct the user to the next set of data by replacing all data on the page with new data, and so on.

My conundrum comes with making the presentation truly random - not showing the user the same information twice by remembering what they've seen and not showing them those already seen sets of data again.

The site has no user system, and the "already seen" sets of data should be forgotten when they close the tab/window or navigate away.

I should also add that I'm a total newbie to SQL in general.

What is the best way to do this?

codegeek
  • 32,236
  • 12
  • 63
  • 63
Joshua Merriman
  • 925
  • 1
  • 8
  • 13
  • @MarkRichman - I don't think this is a duplicate, as I'm looking to get a random row that the user hasn't already seen, rather than just any random row. I'm also using SQLAlchemy, as opposed to actual SQL, to perform the queries – Joshua Merriman Nov 19 '13 at 13:14
  • 1
    _My conundrum comes with making the presentation truly random - not showing the user the same information twice by remembering what they've seen and not showing them those already seen sets of data again._ Just a nitpick: by definition, you are making your data LESS random by doing that. Perhaps that IS what you need your application to do, but you are throwing people off when you say you are trying to make things _truly_ random and then describe something that is _less_ random than just picking any element. – Mark Hildreth Nov 19 '13 at 14:34
  • As @MarkHildreth said, what you are looking for isn't truly random. What I would ask then is if do you need as complicated a solution for your use case. First, how are you indexing data? If it is the usual auto-index starting with 1, you could just store your latest (aka highest index), and do a random number in that range. Example: `random_id = str(random.randint(1, latest_id))` then `SELECT * FROM data WHERE id = random_id`. Which would be a bit faster than 'order by RAND'. I believe you could query the latest id when flask started and store it in memory, and then update it when needed. – Seberius Nov 19 '13 at 18:04
  • 1
    The next question would if you do need to remember what a user has already seen. If your data set is sufficiently large, you may not have to worry about storing which data your users have already seen. For example, if you have over 1,000 items and your user on average looks at 50 in a single visit, then the highest chance of them seeing a repeat on each next click is less than 5%. If some of your users look through far more, you may also consider giving them different orders to look through data (such as newest, oldest, most popular, etc) in those cases. – Seberius Nov 19 '13 at 18:18

2 Answers2

1

The easiest way is to do the random number generation in javascript at the client end...

Tell the client what the highest number row is, then the client page keeps track of which ids it has requested (just a simple js array). Then when the "request next random page" button is clicked, it generates a new random number less than the highest valid row id, and providing that the number isn't in its list of previously viewed items, it will send a request for that item.

This way, you (on the server) only have to have 2 database accessing views:

  1. main page (which gives the js, and the highest valid row id)
  2. display an item (by id)

You don't have any complex session tracking, and the user's browser is only having to keep track of a simple list of numbers, which even if they personally view several thousand different items is still only going to be a meg or two of memory.

For performance reasons, you can even pre-fetch the next item as soon as the current item loads, so that it displays instantly and loads the next one in the background while they're looking at it. (jQuery .load() is your friend :-) )

If you expect a large number of items to be removed from the database (so that the highest number is not helpful), then you can instead generate a list of random ids, send that, and then request them one at a time. Pre-generate the random list, as it were.

Hope this helps! :-)

Daniel Fairhead
  • 1,103
  • 9
  • 9
0

You could stick the "already seen" data in a session cookie. Selecting random SQL data is explained here

Community
  • 1
  • 1
Mark Richman
  • 28,948
  • 25
  • 99
  • 159
  • 1
    This will fail on large datasets and/or users viewing lots of pages. First of all, the 'order by RAND()' solutions are very slow because they require entire table scans; and furthermore, cookies are limited to 4K in size, so depending on the size of the keys you store, you can probably only store about 1000 'already seen' keys. – publysher Nov 19 '13 at 13:17
  • With the rate I expect the app to scale at, there'll definitely be situations where I need to hold more than 1000 keys. – Joshua Merriman Nov 19 '13 at 13:28
  • 1
    The best way to do it in my opinion is to follow @Mark's advice, but with the database table which will hold session and visited items. No other way will be good in the long run – Paweł Pogorzelski Nov 19 '13 at 14:13