Here is my current database structure:
CREATE TABLE `books` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`title` varchar(100) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
`year` year(4) NOT NULL DEFAULT '0000',
`author` varchar(100) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
UNIQUE KEY `title` (`title`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1;
CREATE TABLE `chapters` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`book_id` int(10) unsigned NOT NULL DEFAULT '0',
`number` int(10) unsigned NOT NULL DEFAULT '0',
`title` varchar(100) COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
PRIMARY KEY (`id`),
KEY `book_id` (`book_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1;
ALTER TABLE `chapters`
ADD CONSTRAINT `chapters_ibfk_1` FOREIGN KEY (`book_id`) REFERENCES `books` (`id`) ON DELETE CASCADE ON UPDATE CASCADE;
CREATE TABLE `pages` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`book_id` int(10) unsigned NOT NULL DEFAULT '0',
`chapter_id` int(10) unsigned NOT NULL DEFAULT '0',
`number` int(10) unsigned NOT NULL DEFAULT '0',
`text` text COLLATE utf8_unicode_ci NOT NULL,
`words` int(10) unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`id`),
KEY `book_id` (`book_id`),
KEY `chapter_id` (`chapter_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=1;
ALTER TABLE `pages`
ADD CONSTRAINT `pages_ibfk_1` FOREIGN KEY (`book_id`) REFERENCES `books` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
ADD CONSTRAINT `pages_ibfk_2` FOREIGN KEY (`chapter_id`) REFERENCES `chapters` (`id`) ON DELETE CASCADE ON UPDATE CASCADE;
The structure is quite simple: I'm extracting book text page by page and storing everything in my database, which is organized as a book > chapter > page hierarchy. I tried to make it as flexible as possible so that I can easily aggregate data at the level of the whole book or of a single chapter, but if you think it could be designed better, I'm open to suggestions!
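To illustrate the kind of aggregation I mean, here is a rough sketch of a per-chapter summary. It assumes that pages.words holds the word count of each page, and the book id 1 is just a placeholder value.

```sql
-- Sketch only: page count and total words per chapter for one book.
-- Assumption: pages.words stores the number of words on each page.
-- The literal 1 is a placeholder book id.
SELECT c.number                  AS chapter_number,
       c.title                   AS chapter_title,
       COUNT(p.id)               AS page_count,
       COALESCE(SUM(p.words), 0) AS total_words
FROM chapters AS c
LEFT JOIN pages AS p ON p.chapter_id = c.id   -- LEFT JOIN keeps chapters that have no pages yet
WHERE c.book_id = 1
GROUP BY c.id, c.number, c.title
ORDER BY c.number;
```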
Now I would like to let users run keyword searches inside the books, so that they can find all the occurrences of a single word, or even a phrase, in the book they choose from a dropdown.
My web server is not located on the same machine that hosts the MySQL database (a technical constraint that I cannot get rid of in the short run), so to avoid heavy data traffic between the two machines I would prefer to run the text searches through SQL queries. Retrieving all the pages and analyzing them with PHP would mean transferring 5-10 MB of data on every search.
Now my questions are:
- Is it possible to perform this kind of search using only query commands (LIKE, MATCH, REPLACE, etc.)?
- I would like to obtain results formatted by page in the following way: [page 1 | 0 occurrences], [page 2 | 1 occurrence], [page 3 | 1 occurrence], [page 4 | 2 occurrences]... Is that possible?
- Do you think it would be a good idea to strip whitespace characters (line breaks, tabs and such) and punctuation characters from the page text before storing it in the field pages.text?
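To make the second question more concrete, this is roughly the kind of query I have in mind. It is only a sketch: the book id 1 and the search term 'whale' are placeholder values that my application would supply, and I have not checked how it performs on a whole book. It counts occurrences by comparing the length of the page text before and after removing the search term.

```sql
-- Sketch only: occurrences of a search term on each page of one book.
-- Placeholders: book id 1 and the (already lowercased) term 'whale'.
-- REPLACE() is case-sensitive, so the page text is lowercased first.
-- Caveat: this counts substring matches, so 'whale' is also found inside 'whales'.
SELECT p.number AS page_number,
       (CHAR_LENGTH(LOWER(p.text))
        - CHAR_LENGTH(REPLACE(LOWER(p.text), 'whale', '')))
       DIV CHAR_LENGTH('whale') AS occurrences
FROM pages AS p
WHERE p.book_id = 1
ORDER BY p.number;
```

Pages without a match are still returned, with occurrences = 0, which matches the output format described above.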
Thanks for your help!