-2

I am looking for a solution using MySQL and/or PHP.

So I have a MySQL database with a video table looking like this...

Record    Keywords
1         Action, Sci-Fi, Future
2         Comedy, Jokes, Sci-Fi
3         Youtube, Sci-Fi, Jokes

My problem is that I need to figure out a way to get the different keywords, count the number of occurrences they have and how many views... So I am trying to get something like this:

Tag        Occurences
Sci-Fi     3
Jokes      2
Action     1
Future     1
Comedy     1
Youtube    1

Now, I have over 20,000 rows in this table with multiple keywords for each of these rows and I need to be able to place a top N on each of our page, also, this list needs to be dynamic and in real-time, so performance is an extremely important aspect.

Chris Trudeau
  • 1,427
  • 3
  • 16
  • 20
Dacramash
  • 111
  • 1
  • 11
  • Where's your attempted code? or Do you have searched the internet for your problem? – aldrin27 Dec 03 '15 at 02:26
  • Considering you have 20.000 rows you made a desing error. You should have a separated table for the keywords. – leonardo_palma Dec 03 '15 at 02:28
  • I do know there is a design error... However we don't always get to pick how the sites were made before having to deal with them :/ I did search on the internet for my problem, tried multiple things I found online that I couldn't get to work like the cross apply method, cast as xml method, parsename... – Dacramash Dec 03 '15 at 02:30
  • It looks like you're going to have to restructure your database and rework your code. I can't see a performance solution coming from that data structure. – Tristan Dec 03 '15 at 02:48
  • To help you 'normalize' the table to easily count the keywords - It will not help the performance issues in any way - maybe interesting? [joining on ';' separated values in a column](http://stackoverflow.com/a/33806675/3184785) – Ryan Vincent Dec 03 '15 at 04:11

2 Answers2

0

Assuming you don't restructure your database:

You're going to need to get all rows, and iterate through them.

// array of keyword counters
$keyword_counts = array();

foreach ($rows as $row)
{
    // explode by ', '
    $keywords = explode($row["Keywords"], ", ");
    foreach ($keywords as $key)
    {
        if (array_key_exists($key, $keyword_counts))
        {  
            // counter for this keyword exists, increment
            $keyword_counts[$key]++;
        }
        else
        {
            // this is the first instance of this keyword, set it to 1
            $keywords_counts[$key] = 1;
        }
    }
}

This will leave you with an unsorted array similar to:

array(
"Jokes" => 34,
"Action" => 22,
"Comedy" => 15)

and to sort it you'd use

arsort($keyword_counts, SORT_NUMERIC);

which preserves the keys, since they are the name of the keyword, and orders numerically descending.

Chris Trudeau
  • 1,427
  • 3
  • 16
  • 20
  • Would this be able to withstand a few hundreds of thousands users per day considering the large volume of data to go through ? Otherwise I think I will suggest to stock this list in a table along with a script set to run once per day to update the table. Thanks a lot though :) – Dacramash Dec 03 '15 at 02:38
  • This whole setup would not scale well at all so if you're expecting that much traffic you'd be forced to implement some changes like what you described – Chris Trudeau Dec 03 '15 at 02:39
  • If I was to suggest a long term, best solution, I guess it would be best to have the videos in a table, the keywords in another table, then a table between both to link them? – Dacramash Dec 03 '15 at 02:42
  • Yes, and also with that method a change to a keyword name would require a change in one spot instead of potentially thousands or millions of rows – Chris Trudeau Dec 03 '15 at 02:44
0

You can use query like this

SELECT COUNT(record) count, tag FROM (
    SELECT b.record, SUBSTRING_INDEX(SUBSTRING_INDEX(b.keywords, ',', a.n), ',', -1) tag
    FROM (
        SELECT (i+1) n 
        FROM (
          SELECT 0 i UNION SELECT 1 UNION SELECT 2 UNION SELECT 3 UNION SELECT 
          4 UNION SELECT 5 UNION SELECT 6 UNION SELECT 7 UNION SELECT 8 UNION SELECT 9
        ) a
    ) a
    INNER JOIN (
        SELECT record, keywords
        FROM video
    ) b ON CHAR_LENGTH(b.keywords)-CHAR_LENGTH(REPLACE(b.keywords, ',', ''))>=a.n-1
) a GROUP BY tag ORDER BY count DESC, tag

see sqlfiddle here

check
  • 559
  • 4
  • 12