1

I have the following destination, image and destination_image tables:

(MYSQL):

  • destination: id_destination
  • image: id_image, filePath, size
  • destination_image: id_destination, id_image

I need to select random N images of a given size, per destination; where N is defined at runtime and destinations are selected from a range of id_destination.

I don't need top N images, just random N images.

Can someone point me in the right direction of what would be the correct SQL query for this ?

Fenton
  • 241,084
  • 71
  • 387
  • 401
brainydexter
  • 19,826
  • 28
  • 77
  • 115
  • 1
    possible duplicate of [Selecting random rows with MySQL](http://stackoverflow.com/questions/6541644/selecting-random-rows-with-mysql) – Alex K. Jun 18 '12 at 13:58
  • do you want a specific size and destination and "N" random? OR... do you want "N" random entries for all distinct destination / sizes available... two totally different queries... – DRapp Jun 18 '12 at 14:33
  • I apologize for the confusion by the word 'random' here. I wanted to emphasize the fact that I don't want `top` images here. (Edited the question now). – brainydexter Jun 18 '12 at 17:52
  • @DRapp First choice: I want N random images per destination of a given size. – brainydexter Jun 18 '12 at 17:56

2 Answers2

1

Try something like ORDER BY RAND() LIMIT N where N is the number of results that should be returned. For example:

SELECT i.id_image FROM image i
JOIN destination_image di ON i.id_image = di.id_image
WHERE di.id_destination > N AND di.id_destination < M
AND i.size = X
ORDER BY RAND() LIMIT Y

Replace N, M, X and Y with your values.

sp00m
  • 47,968
  • 31
  • 142
  • 252
  • I want N images per destination of a given size. I don't see how ORDER BY RAND() LIMIT N works here ? Can you please explain ? – brainydexter Jun 18 '12 at 17:58
1

If you want "x" entries per destination, you can do per MySQL variables... something like

select
      di.id_destination,
      di.id_image,
      i.filepath,
      i.size,
      @RandSeq := if( @lastDestination = di.id_destination, @RandSeq +1, 1 ) as FinalSeq,
      @lastDestination := di.id_destination as carryForward
   from
      destination_image di
         join image i
            on di.id_image = i.id_image,
      ( select @lastDestination := 0, 
               @RandSeq := 0 ) SQLVars
   where
      di.id_destination between rangeStart and rangeEnd (or similar criteria for your "range" )
   having
      FinalSeq = 4  ( just a sample, but your "N" entries per destination desired)
   order by
      id_destination,
      rand()

This will get 4 (or "N" in your case) entries PER Destination. The critical thing is the order by. The order by will return the records in this order FIRST, then apply the @sql variables. So, on a per destination, the rows will be RAND()omized first, but random WITHIN a single destination... and the same on the next destination, and the next... Then the @vars generate 1, 2, 3, etc for the column "FinalSeq". The "HAVING" clause restricts which rows to allow returning entries with that criteria in the final row, thus only 4 per this example.

CLARIFICATION OF SQL.

The @variables are almost like inline variables used in a SQL statement... by doing ( select @someVar := someValue, @anotherVar := '', @someDateVar := getdate() ) as SQLVars... basically just creates and initializes variables that can be set, changed and updated against every row the query is going to process... At the end of the SQL statement, the variables are then released. Some people pre-declare them as separate SET commands, THEN run the SQL-Select.. I prefer to do them in-line.

Now, how they are used... They can be used to track almost anything against any row being returned by the SQL-select statement, but if you ever need things in a certain order, the order clause is processed first, THEN passes the record on to be processed with the @vars... So, think of the @vars getting set within a program. They get set one in the order they are processed in the SQL command, and whatever the final result is, is stored into the final column name like any other similar function call, such as padding, trimming, upper, coalesce, etc.

Now, what is going on... lets take a look at the steps... Assume for sake of understanding, we have a single table, 10 rows corresponding 1=A, 2=B, 3=C --- 10=J. These are the natural order of records such as an auto-increment. Now, if you were to run a query select * from table order by rand(), you might get 3-C, 9-I, 2=B, 7=G, etc... This with a LIMIT 4, would only return the first 4 and you are done.

Now, take the same scenario of records 1-10 = A-J respectively. Now, lets expand to allow your "group" of multiple destinations and sizes, such as...

ID  Ltr   Dest   Size
1   A     X      a
2   B     Y      a
3   C     X      a
4   D     X      a
5   E     Y      b
6   F     X      c
7   G     Y      b
8   H     X      a
9   I     Y      a
10  J     X      b
11  K     X      a
12  L     Y      a
13  M     X      a
14  N     Y      a

Now, you want things of all destinations, but a single size "a" for example... I added a few extras.

select * from SampleTable where Size="a" order by rand()

You could get all "X" destinations, or the "Y(a)" records and "X(a)" records, or other similar balance of available. However, no guarantee that you get 3 "X" and 3 "Y" records of size "a". If you change the ORDER to ORDER BY Dest, rand(), this will put all the "X" entries first, THEN "Y"s which a limit won't work... So, apply the principle of my select, you order by the destination and random, and apply a where clause of one Size = "a" and you might get... (by destination first, then randomized...

13  M     X      a
3   C     X      a
8   H     X      a
1   A     X      a
11  K     X      a
4   D     X      a

9   I     Y      a
14  N     Y      a
2   B     Y      a
12  L     Y      a

Now, apply the @variables to the mix... Just applying the @RandSeq and @lastDestination

@RandSeq := if( @lastDestination = di.id_destination, @RandSeq +1, 1 ) as FinalSeq, @lastDestination := di.id_destination as carryForward

                        Start Value of 
ID  Ltr   Dest   Size   @RandSeq  @lastDestination  FinalSeq  carryForward
13  M     X      a      0         ''                1         X (current record value of dest)
3   C     X      a      1         X                 2         X
8   H     X      a      2         X                 3         X
1   A     X      a      3         X                 4         X
11  K     X      a      4         X                 5         X
4   D     X      a      5         X                 6         X

9   I     Y      a      6         X                 1         Y (change to Y resets counter to 1)
14  N     Y      a      1         Y                 2         Y
2   B     Y      a      2         Y                 3         Y
12  L     Y      a      3         Y                 4         Y

Now, if you apply a "HAVING" clause FinalSeq <= 3, you will get all the rows listed above with FinalSeq <= 3, and the remaining 4, 5, 6 of destination "X" will be ignored, and 4 of "Y" will be ignored... thus leaving 3 from each destination of the given size.

Hopefully this clarifies what's going on with the query.

DRapp
  • 47,638
  • 12
  • 72
  • 142
  • Thanks for the explanation. Can you elaborate a bit on how @vars work here and ` ( select @lastDestination := 0, @RandSeq := 0 ) SQLVars ` ?? Also, I'm sorry for the confusion, but I am just looking for N records. Random word caused a lot of confusion. See my comment on the question. (wanted to emphasize on any records, and am not looking for `top` records) – brainydexter Jun 18 '12 at 18:03
  • @brainydexter, see revision for clarification... should explain a lot for you. – DRapp Jun 19 '12 at 01:57
  • Thanks for the explanation. I really appreciate it. I'm still digesting it though. 1 question: Is this how you release the SQL variables: `( select @lastDestination := 0, @RandSeq := 0 ) SQLVars` ?? – brainydexter Jun 19 '12 at 05:11
  • 1
    Also, check out the fiddle: http://sqlfiddle.com/#!2/4d41d/8. I seeded some sample data. When I uncomment `having FinalSeq <= 3` it just returns me 2 results per destination whereas it should return 3 results. Can you help me understand why it only returns 2 results ? – brainydexter Jun 19 '12 at 05:59
  • Since this will be a frequent query in my system, and the `destination_image` table is going to grow and is going to be huge. Can you also add some comments about the performance of this query ? If creating FinalSeq is an expensive operation (per each query), perhaps I should just add another field, analogous to `FinalSeq`, to destination_image table. What do you recommend ? – brainydexter Jun 19 '12 at 06:02
  • I have another concern about this. MySQL page on user variables warns about using this with aliases: http://dev.mysql.com/doc/refman/5.0/en/user-variables.html – brainydexter Jun 19 '12 at 07:00
  • @brainydexter, I just went to your SQLFiddle, uncommented the rand() order by, and the having clause and received 6 records... 3 for destination 1, 3 for destination 2... The "carryForward" only has 1 and 2 which represent the destination that just happened for comparison on next record basis. As for performance, I don't think you'll have to worry too much... if you want it randomized, just ensure you have an index on the elements you would group by and have a WHERE clause applied... – DRapp Jun 19 '12 at 11:23
  • interesting. I don't understand why it worked when I uncommented the `order by rand()`. Can you please explain why this is happening ? Is there a way around to not using `rand()` ? – brainydexter Jun 19 '12 at 12:52
  • @brainydexter, anomoly??? if not using the rand(), change to order by id_destination, FinalSeq. My guess is that the randomness is putting the 2 record AFTER the 4 record in the set, and as soon as that "order/having" hit a value greater than 3, it went to the next entry. By forcing the order by sequentially (either by Rand(), or FinalSeq column, you are forcing the final order of qualified entries. – DRapp Jun 19 '12 at 13:17