0

Hi I'm looking to see if there is alternative syntax for the following self-reflexive cross-join. The objective is a sort of row-filler for a table - dates should have entries for every cdn. I am using MySQL

select 
  d.labelDate, 
  n.cdn,
  networks.sites
from (
  select 
    distinct labelDate 
  from 
    cdn_trend
) as d
cross join (
  select 
    distinct cdn 
  from cdn_trend
) as n
left join cdn_trend as networks 
  on  networks.labelDate = d.labelDate
  and networks.cdn = n.cdn    
order by 
  labelDate, 
  cdn

I've tried recasting the cross-join using simple aliases but that gives me column errors in the join. Is it possible to do so or should I consider using views instead?

As a cross join should simply return the Cartesian product of two tables it should be the same as simply selecting both without a join. However, the following raises an "unknown column d.labelDate in on clause" exception

select distinct d.labelDate, n.cdn, networks.sites
from 
cdn_trend as d,
cdn_trend as n
left join cdn_trend as networks ON
(n.labelDate = networks.labelDate
and d.cdn = networks.cdn)
order by labelDate, cdn

Error Code: 1054. Unknown column 'd.cdn' in 'on clause'

Because the length of dand n are relatively small the size of the query is fast enough.

Charlie Clark
  • 18,477
  • 4
  • 49
  • 55
  • this is highly inefficient ... what is the purpose of this ? to eliminate the entries where the date is not set ? – Stephan May 15 '13 at 09:43
  • I do not see what your LEFT OUTER JOIN is doing. You do not use anything from the alias "networks" in your query other than where you are joining them. I would also try not to do sub-selects, but maybe you need to. It is hard to say without more context. – Mike Harris May 15 '13 at 12:33
  • @Stephan the purpose of the query is to return an equal number of rows for each time period. For example: 2013-05 might have `('a', 'c')` and 2013-04 might have `('b', 'c')`. I would like to have rows with `2013-04, 'a', NULL` `2013-04, 'b', 5` `2013-04, 'c', 4` `2013-05, 'a', 6` `2013-05, 'b', NULL` `2013-06, 'c', 7` – Charlie Clark May 15 '13 at 13:33
  • @MikeHarris yes, sorry I missed out `networks.sites` – Charlie Clark May 15 '13 at 13:38
  • @MarkBannister thanks for spotting the typo - too busy fighting with the editor to check everything. I've corrected it and added the error message. This seems to indicate that the left join against two tables is not possible but if I write it as two separate left joins I don't get the results I need and the query is significantly slower. – Charlie Clark May 15 '13 at 16:55

3 Answers3

0

I think you were close in your original intent... For every date, you want the results of every network node status. If you list multiple tables in the WHERE clause without a join condition, it by default will create a Cartesian... From that, join to your detail table...

select 
      d.labelDate, 
      n.cdn,
      networks.sites
   from 
      ( select d.LabelDate, n.cdn
           from 
              ( select distinct labelDate 
                   from cdn_trend ) as d,
              ( select distinct cdn 
                   from cdn_trend ) as n ) as CrossResults
         LEFT JOIN cdn_trend as networks 
         on  CrossResults.labelDate = networks.labelDate
         and CrossResults.cdn = networks.cdn
   order by 
      networks.labelDate, 
      networks.cdn
DRapp
  • 47,638
  • 12
  • 72
  • 142
  • Thanks but that still raises the same error. However, I think the solution is actually to generate a pivot table as that is actually what I want: a `labelDate` by `cdn` matrix with `sites`or `Null`as values. I thought going via an orderly result set such as the one my query generates would be necessary but that doesn't seem to be the case. Now I just need to write the function that will create the columns. – Charlie Clark May 15 '13 at 18:15
0

Reading the comments and the extra info you need a pivot with y - lableDate and x - cdn and values - sites assuming the values for cdn are (a,b,c) and that sites is a number you can try this:

SELECT
    labelDate,
    SUM(IF(cdn = 'a',sites,0)) as cdn_a,
    SUM(IF(cdn = 'b',sites,0)) as cdn_b,
    SUM(IF(cdn = 'c',sites,0)) as cdn_c
FROM 
    cdn_trend
GROUP BY
    labelDate

And the output should be something like this (i used the sample data from you) :

labelDate   cdn_a   cdn_b   cdn_c
2013-04     NULL    5       4
2013-05     6       NULL    NULL
....
Stephan
  • 8,000
  • 3
  • 36
  • 42
  • Yes, that's what I'm working towards. Is there any way to create a function/procedure to generate those columns in the query head? I've been looking at [a similar problem](http://stackoverflow.com/questions/12004603/mysql-pivot-row-into-dynamic-number-of-columns) but have not managed to create something that will either generate and run the query or simply return the columns? – Charlie Clark May 16 '13 at 08:41
  • You can do it from the code side ... using a script that will generate this query . Basically you need the distinct values of `cdn` field to generate the pivot . – Stephan May 16 '13 at 09:04
  • Sure, I know how to generate the query in client code. Was just thinking about having a more generic approach. I guess I can do the same with a user-defined-function? – Charlie Clark May 16 '13 at 09:39
  • Yes you can have a more generic approach like creating a method that has the inputs : field_x (labelDate in ur case), field_y (cdn in ur case),field_value (sites in ur case) , table_name (cdn_trend) . I'm not sure if its possible using a UDF but you can try. – Stephan May 16 '13 at 09:45
  • Seems to work fine as a simple function but not as a procedure. But I suspect this has a lot to do with not quite understanding how all this works. I have now successfully negotiated all the `OUT`s and `IN`s of procedures including the limit on the length of `GROUP_CONCAT` – Charlie Clark May 16 '13 at 20:27
  • I'll post it as soon as I've cleaned it up. FWIW performance is similar to my original `CROSS JOIN`which is to be expected as it is proper Cartesian product. Pretty MySQL doesn't provide this out of the box or even document how you do it. – Charlie Clark May 17 '13 at 19:22
0

After some playing around this is the best I could come up with. It seems that parametrising the table name would be possible but would involve yet another layer of statement generation that I fortunately don't need for this project.

-- --------------------------------------------------------------------------------
-- Routine DDL
-- Note: comments before and after the routine body will not be stored by the server
-- --------------------------------------------------------------------------------
DELIMITER $$

CREATE DEFINER="root"@"localhost" PROCEDURE "cdn_pivot"(
    IN slice varchar(64),
    IN start date,
    IN stop date)

BEGIN

SET @@group_concat_max_len = 32000;
SET @sql = NULL;

SELECT
  GROUP_CONCAT(DISTINCT
    CONCAT(
    ' sum(IF(cdn = ''',
        cdn,
        ''', sites,NULL)) "'
        ,cdn, '"'
    )
  ) INTO @sql
FROM cdns ORDER BY sites;

SET @stmt = CONCAT('SELECT labelDate, ',
@sql,
' from cdns 
WHERE slice = ''',
slice,
''' AND ( labelDate between''',
start,
''' AND ''',
stop,
'''
) 
GROUP BY labelDate');

prepare stmt from @stmt;
execute stmt;
deallocate prepare stmt;

SET @@group_concat_max_len = 1024;
END

This can then simply be called e.,g. call cdn_pivot('Top100', '2013-01-01', 2013-02-01')

Given the problems associated with testing this code and keeping it with any client-side code it's very tempting to generate the dynamic part of the head on the client and, at least for this kind of use case, the performance penalty of the additional query shouldn't be too high. The key thing is obviously understanding how to generate the columns dynamically.

Charlie Clark
  • 18,477
  • 4
  • 49
  • 55