2

I have the following tables:

"crawlresults"
id  |   url                 | fk_crawljobs_id
---------------------------------------------
1   |   shop*com/notebooks  |   1
2   |   shop*com/fridges    |   1
3   |   website*com/lists   |   2


"extractions"
id  | fk_extractors_id  | data          |   fk_crawlresults_id
---------------------------------------------------------------
1   |   1               | 123.45        |   1
2   |   2               | notebook      |   1
3   |   3               | ibm.jpg       |   1
4   |   1               | 44.5          |   2
5   |   2               | fridge        |   2
6   |   3               | picture.jpg   |   3
7   |   4               | hello         |   3
8   |   4               | world         |   3
9   |   5               | hi            |   3
10  |   5               | my            |   3
11  |   5               | friend        |   3


"extractors"
id  |   extractorname
----------------------
1   |   price
2   |   article
3   |   imageurl
4   |   list_1
5   |   list_2

I need to construct a select statement to get columns for each extractor in the extractors table that is used in the extractions table.

Example:

url                 | price     | article   | imageurl
--------------------------------------------------------
shop*com/notebooks  | 123.45    | notebook  | ibm.jpg
shop*com/fridges    | 44.5      | fridge    | NULL

I don't how much extractornames exists when I execute the select statement so it have to be dynamically built.

EDIT: I forgot to mention that it is possible that I have multiple "lists" in my extractions. In this case I need a the following result set.

Example 2:

url                 | list_1    | imageurl      | list_2
--------------------------------------------------------
website*com/lists   | hello     | picture.jpg   | NULL
website*com/lists   | world     | picture.jpg   | NULL
website*com/lists   | NULL      | picture.jpg   | hello
website*com/lists   | NULL      | picture.jpg   | my
website*com/lists   | NULL      | picture.jpg   | friend

Thank you!

jimbo
  • 582
  • 1
  • 11
  • 28
  • Another similar question http://stackoverflow.com/q/7674786/684229 – Tomas Jan 20 '14 at 20:55
  • There is no pivot query in MySQL. In fact these types of queries do not belong to the database because their result is not table per se; pivot transformation belongs to the application layer *on top* of the database. – Tomas Jan 20 '14 at 20:57
  • @Tomas, check my answer, all you need is a prepared statement and some love. Don't need to bring the application layer on this :). – Anthony Accioly Jan 20 '14 at 21:04
  • @AnthonyAccioly doesn't look like nice general way to perform pivot queries. I think more viable approach is to find some pivot transformation library in Java/PHP/ASP/C#/whatever (similar to `cast` function in R - very simple usage) than debug some complex SQL over and over again :-) – Tomas Jan 20 '14 at 21:11
  • @AnthonyAccioly I think application layer is exactly where this belongs. – Tomas Jan 20 '14 at 21:14
  • @AnthonyAccioly and didn't take the note with love :-) – Tomas Jan 20 '14 at 21:14
  • @Tomas, Ada *Love*lace? (Wink, wink). Never mind, I'm bored today. – Anthony Accioly Jan 20 '14 at 23:52

1 Answers1

3

You are looking for Dynamic pivot tables.

Code:

SET @sql = NULL;
SELECT
  GROUP_CONCAT(DISTINCT
    CONCAT(
      'MAX(IF(pa.extractorname = ''',
      extractorname,
      ''', p.data, NULL)) AS ',
      extractorname
    )
  ) INTO @sql
FROM extractors;

SET @sql = CONCAT('SELECT c.url, ', 
  @sql, 
  ' FROM crawlresults c', 
  ' INNER JOIN extractions p on (c.id = p.fk_crawlresults_id)', 
  ' INNER JOIN extractors pa on (p.fk_extractors_id = pa.id)'
  ' WHERE c.fk_crawljobs_id = 1',
  ' GROUP BY c.id');

PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

Working fiddle


Basically your original query was generating a bogus @sql variable which didn't really extract data for each extractorname. You also don't need all of those joins for creating @sql. You only need each one of the property names (from extractor table) and a reference to the column holding the expect values (data).

When in doubt about the structure, write out a simple pivot query for a fixed set of properties. This way it becomes easy to identify the pattern for writing the dynamic query.

SELECT c.url, 
  MAX(IF(pa.extractorname = 'price', p.data, NULL)) AS price,
  MAX(IF(pa.extractorname = 'article', p.data, NULL)) AS article,
  MAX(IF(pa.extractorname = 'imageurl', p.data, NULL)) AS imageurl 
FROM crawlresults c 
  LEFT JOIN extractions p on (c.id = p.fk_crawlresults_id) 
  LEFT JOIN extractors pa on (p.fk_extractors_id = pa.id) 
WHERE c.fk_crawljobs_id = 1
GROUP BY c.id

As for the rest of your query it is fine, just keep in mind that the LEFT JOINS could be useful if there are no extractions for some crawlresults. Also if your table can contain more than one crawlresult per url / fk_crawljobs_id, grouping by url is a bad idea (MAX can potentially mix the results from multiple extractions).

Anthony Accioly
  • 21,918
  • 9
  • 70
  • 118
  • When I have an additional extractorname in the extractors table that is not linked in the extractions table it still becomes a column in the result set. Another concern is when I have one or more lists of extractions. I have edited my question accordingly. – jimbo Jan 20 '14 at 23:54
  • 1
    Jimbo, about the link it is easy, it is a mater of filtering only `extractorname` relevant to that query ([working fiddle](http://sqlfiddle.com/#!2/5a8fb1/2)), just update the clause that creates `@sql` accordingly. – Anthony Accioly Jan 21 '14 at 00:52
  • 1
    [Masterpiece fiddle](http://sqlfiddle.com/#!2/03a56/11) now repeats single-valued results over multi-valued rows. – Anthony Accioly Jan 21 '14 at 06:13
  • The last fiddle is exactly what I needed. Thank you so much for this! :-) – jimbo Jan 21 '14 at 15:47