Introduction
in my project I'm saving FacebookPages and their like count, as well as the like count per country. I have a table for the FacebookPages, one for the languages, one for the correlation between the facebook page and the language (and counting the likes) and one table which saves this data as a history. What I want to do, is to get the page with the strongest increase in likes over a specific time period.
Data to work with
I'm stripping the non relevant information from the create queries.
Table containing all facebook pages
CREATE TABLE `pages` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`facebook_id` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`facebook_name` varchar(255) COLLATE utf8_unicode_ci NOT NULL,
`facebook_likes` int(11) NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Example data:
INSERT INTO `facebook_pages` (`id`, `facebook_id`, `facebook_name`, `facebook_likes`)
VALUES
(1, '552825254796051', 'Mesut Özil', 28593755),
(2, '134904013188254', 'Borussia Dortmund', 13213354),
(3, '310111039010406', 'Marco Reus', 12799627);
Table containing all languages
CREATE TABLE `languages` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`language` varchar(5) COLLATE utf8_unicode_ci NOT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Example data
INSERT INTO `languages` (`id`, `language`)
VALUES
(1, 'ID'),
(2, 'TR'),
(3, 'BR');
Table containing the correlation
CREATE TABLE `language_page_likes` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`language_id` int(10) unsigned NOT NULL,
`facebook_page_id` int(10) unsigned NOT NULL,
`likes` int(11) DEFAULT NULL,
PRIMARY KEY (`id`),
// Foreign key stuff
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Example data
INSERT INTO `language_page_likes` (`id`, `language_id`, `facebook_page_id`)
VALUES
(1, 1, 1),
(2, 2, 1),
(3, 3, 1),
(47, 3, 2),
(51, 1, 2),
(53, 2, 2),
(92, 3, 3),
(95, 2, 3),
(97, 1, 3);
Table containing the history
CREATE TABLE `language_page_likes_history` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`language_page_likes_id` int(10) unsigned NOT NULL,
`likes` int(11) NOT NULL,
`created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`id`),
// Foreign key stuff
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
Example data
INSERT INTO `language_page_likes_history` (`id`, `language_page_likes_id`, `likes`, `created_at`)
VALUES
(1, 1, 3272484, '2015-09-11 08:40:23'),
(132014, 1, 3272827, '2015-09-14 08:31:00'),
(2, 2, 1581361, '2015-09-11 08:40:23'),
(132015, 2, 1580392, '2015-09-14 08:31:00'),
(3, 3, 1467090, '2015-09-11 08:40:23'),
(132016, 3, 1467329, '2015-09-14 08:31:00'),
(47, 47, 828736, '2015-09-11 08:40:23'),
(132060, 47, 828971, '2015-09-14 08:31:00'),
(51, 51, 602747, '2015-09-11 08:40:23'),
(132064, 51, 603071, '2015-09-14 08:31:00'),
(53, 53, 545484, '2015-09-11 08:40:23'),
(132066, 53, 545092, '2015-09-14 08:31:00'),
(92, 92, 916570, '2015-09-11 08:40:24'),
(132105, 92, 917032, '2015-09-14 08:31:01'),
(95, 95, 537382, '2015-09-11 08:40:24'),
(132108, 95, 537395, '2015-09-14 08:31:01'),
(97, 97, 419175, '2015-09-11 08:40:24'),
(132110, 97, 419484, '2015-09-14 08:31:01');
As you can see, I got data for the 14th and 11th September. Now I want to get the site, with the biggest increase in likes. Before I have done it with a column called last_like_count, but the problem is, that I can't be dynamic in the date range. With a "normal" regression function I could be dynamic for every date range.
Solution finding
What I already managed to do, was to build all the relationships that are present
SELECT p.id, p.facebook_name, plh.likes, l.language FROM facebook_pages p
INNER JOIN language_page_likes pl ON pl.facebook_page_id = p.id
INNER JOIN language_page_likes_history plh ON plh.language_page_likes_id = pl.id
INNER JOIN languages l ON l.id = pl.language_id
WHERE pl.language_id = 5 OR pl.language_id = 46 OR pl.language_id = 68
With that query I'm getting every likecount in the history of the system for specific languages. But, how would I build regression analysis into that part?
I already found this link here
Identifying trend with SQL query
but my math and MySQL skills aren't high enough to translate the SQL into MySQL. Any help?