0

I have a database with tree tables,

person: id, bio, name
book: id, id_person, title, info
file: id, id_book, location

Other information: Book is about ~50,000 rows, File is about ~ 300,000 rows.

What I'm trying to do is to select 12 different authors and select just one book and from that book select location from the table file.

What I tried is the following:

SELECT DISTINCT(`person`.`id`), `person`.`name`, `book`.`id`, `book`.`title`, `book`.`info`, `file`.`location`
FROM `person`
INNER JOIN `book`
ON `book`.`id_person` = `person`.`id`
INNER JOIN `file`
ON `file`.`id_book` = `book`.`id`
LIMIT 12

I have learned that the DISTINCT does not work the way one might expect. Or is it me that I'm missing something? The above code returns books from the same author and goes with the next one. Which is NOT what I want. I want 1 book from each one of the 12 different authors.

What would be the correct way to retrieve this information from the database? Also, I would want to retrieve 12 random people. Not people that are stored in consecutive order in the database,. I could not formulate any query with rand() since I couldn't even get different authors.

I use MariaDB. And I would appreciate any help, especially help that allows to me do this with great performance.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
gglasses
  • 826
  • 11
  • 30
  • http://stackoverflow.com/questions/4329396/mysql-select-10-random-rows-from-600k-rows-fast - The same logic should work for your MariaDB, query language differences aside. – SArnab Jul 31 '15 at 02:40

1 Answers1

1

In MySQL, you can do this, in practice, using GROUP BY

SELECT p.`id`, p.`name`, b.`id`, b.`title`, b.`info`, f.`location`
FROM `person` p INNER JOIN
     `book` b
     ON b.`id_person` = p.`id` INNER JOIN
     `file` f
     ON f.id_book = b.id
GROUP BY p.id
ORDER BY rand()
LIMIT 12;

However, this is not guaranteed to return the non-id values from the same row (although it does in practice). And, although the authors are random, the books and locations are not.

The SQL Query to do this consistently is a bit more complicated:

SELECT p.`id`, p.`name`, b.`id`, b.`title`, b.`info`,
      (SELECT f.location
       FROM file f
       WHERE f.id_book = b.id
       ORDER BY rand()
       LIMIT 1
      ) as location
FROM (SELECT p.*,
             (SELECT b.id
              FROM book b
              WHERE b.id_person = p.id 
              ORDER BY rand()
              LIMIT 1
             ) as book_id
      FROM person p
      ORDER BY rand()
      LIMIT 12
     ) p INNER JOIN
     book b
     ON b.id = p.book_id ;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thank you. Works like a charm. But now I need to consider caching the outcome because in my database takes 0.5 seconds to come up with the 12 rows! Anyway, thank you. I learned a lot from your answer. – gglasses Jul 31 '15 at 16:53
  • @gglasses . . . There are faster ways to get random records than `order by rand() limit 12`. An easy way is to add a where clause, something like `where rand() < 0.01`. However, the appropriate value depends on the data. – Gordon Linoff Jul 31 '15 at 17:08
  • What calls my attention is that this `query` to select random rows is very usual in websites. I didn't think it could be that slow. Anyway, I'll try it with a `where` condition as you suggested. This `problem` makes me want to dig deep in databases. – gglasses Jul 31 '15 at 17:17