0

I'm working on a management system for a small library. I proposed them to replace the Excel spreadsheet they are using now with something more robust and professional like PhpMyBibli - https://en.wikipedia.org/wiki/PhpMyBibli - but they are scared by the amount of fields to fill, and also the interfaces are not fully translated in Italian.

So I made a very trivial DB, with basically a table for the authors and a table for the books. The authors table is because I'm tired to have to explain that "Gabriele D'Annunzio" != "Gabriele d'Annunzio" != "Dannunzio G." and so on.

My test tables are now populated with ~ 100k books and ~ 3k authors, both with plausible random text, to check the scripts under pressure.

For the public consultation I want to make an interface like that of Gallica, the website of the Bibliothèque nationale de France, which I find pretty useful. A sample can be seen here: http://gallica.bnf.fr/Search?ArianeWireIndex=index&p=1&lang=EN&f_typedoc=livre&q=Computer&x=0&y=0

The concept is pretty easy: for each menu, e.g. the author one, I generate a fancy <select> field with all the names retrieved from the DB, and this works smoothly.

The issue arises when I try to add beside every author name the number of books, as made by Gallica, in this way (warning - conceptual code, not actual PHP):

SELECT id, surname, name FROM authors
foreach row {
    SELECT COUNT(*) as num FROM BOOKS WHERE id_auth=id
    echo "<option>$surname, $name ($num)</option>";
}

With the code above a core of the CPU jumps at 100%, and no results are shown in the browser. Not surprising, since they are 3k queries on a 100k table in a very short time.

Just to try, I added a LIMIT 100 to the first query (on the authors table). The page then required 3 seconds to be generated, and 15 seconds when I raised the LIMIT to 500 (seems a linear increment). But of course I can't show to library users a reduced list of authors.

I don't know which hardware/software is used by Gallica to achieve their results, but I bet their budget is far above that of a small village library using 2nd hand computers.

Do you think that to add a "number_of_books" field in the authors table, which will be updated every time a new book is inserted, could be a practical solution, rather than to browse the whole list at every request?

BTW, a similar procedure must be done for the publication date, the language, the theme, and some other fields, so the query time will be hit again, even if the other tables are a lot smaller than the authors one.

Marco Bernardini
  • 695
  • 6
  • 17

2 Answers2

4

Your query style is very inefficient - try using a join and group structure:

SELECT 
  authors.id, 
  authors.surname, 
  authors.name,
  COUNT(books.id) AS numbooks
FROM authors
INNER JOIN books ON books.id_auth=authors.id
GROUP BY authors.id
ORDER BY numbooks DESC
;

EDIT

Just to clear up some issues I not explicitely said:

  • Ofcourse you don't need a query in the PHP loop any longer, just the displaying portion
  • Indices on books.id_auth and authors.id (the latter primary or unique) are assumed

EDIT 2

As @GordonLinoff pointed out, the IFNULL() is redundant in an inner join, so I removed it.

To get all themes, even if there aren't any books in them, just use a left join (this time including the IFNULL(), if your provider's MySQL may be old):

SELECT
  theme.id,
  theme.main,
  theme.sub,
  IFNULL(COUNT(books.theme),0) AS num
FROM themes
LEFT JOIN books ON books.theme=theme.id
GROUP BY themes.id
;

EDIT 3

Ofcourse a stored value will give you the best performance - but this denormalization comes at a cost: Your Database now has the potential to become inconsistent in a user-visible way. If you do go with this method. I strongly recommend you use triggers to auto-fill this field (and ofcourse those triggers must sit on the books table). Be prepared to see slowed down inserts - this might ofcourse be okay, as I guess you will see a much higher rate of SELECTS than INSERTS

Eugen Rieck
  • 64,175
  • 10
  • 70
  • 92
0

After reading a lot about how the JOIN statement works, with the help of useful answer 1 and useful answer 2, I discovered I used it some 15 or 20 years ago, then I forgot about this since I never needed it again.

I made a test using the options I had:

  • reply with the JOIN query with IFNULL(): 0,5 seconds
  • reply with the JOIN query without IFNULL(): 0,5 seconds
  • reply using a stored value: 0,4 seconds

That DB will run on some single core old iron, so I think a 20% difference could be significant, and I decide to use stored values, updating the count every time a new book is inserted (i.e. not often).

Anyway thanks a lot for having refreshed my memory: JOIN queries will be useful somewhere else in my DB.


update

I used the JOIN method above to query the book themes, which are stored into a far smaller table, in this way:

SELECT theme.id, theme.main, theme.sub, COUNT(books.theme) as num FROMthemesJOIN books ON books.theme = theme.id GROUP BY themes.id ORDER by themes.main ASC, themes.sub ASC

It works fine, but for themes which are not in the books table I obviously don't get a 0 response, so I don't have lines like Contemporary Poetry - Etruscan (0) to show as disabled options for the sake of list completeness.

Is there a way to have back my theme.main and theme.sub?

Marco Bernardini
  • 695
  • 6
  • 17