I have a fairly complicated SQL question for a MySQL database. First, here are the tables involved:

User Table:

Id  date_created
------------------------------
1   2015-09-19T14:18:07.000Z 
2   2015-09-20T01:16:34.000Z
3   2015-09-21T15:10:21.000Z
…

Setting table:

Id  User_id     setting_key     setting_value
----------------------------------------------
1   1           city            1
2   3           city            2
3   2           city            1
…

City names Table:

Id  name
------------------
1   New York    
2   Los Angeles
3   Boston
…

With a SELECT query I would like to get this:

date            New York    Los Angeles     Boston  …
------------------------------------------------------
2015-09-19      1           0               0   
2015-09-20      2           0               0
2015-09-21      2           1               0
…

For every date in a certain range, I want the number of users who have New York, Los Angeles, ... as their city setting. The only thing I have managed so far is this query:

select date(u.date_created), n.name, count(u.id)
from user u inner join setting s
    on u.id = s.user_id
    and setting_key = 'city'
    inner join name n
    on s.setting_value = n.id
where u.date_created > '2015-09-19T14:18:07.000Z'
group by 1, 2 

but then I get this result:

date        name        count
-------------------------------
2015-09-19  New York    1
2015-09-19  Los Angeles 0
2015-09-19  Boston      0
2015-09-20  New York    1
2015-09-20  Los Angeles 0
2015-09-20  Boston      0
2015-09-21  New York    0
2015-09-21  Los Angeles 1
2015-09-21  Boston      0
…

It is not cumulative, and there is a row for each city instead of a column for each city. Does somebody know the (complicated) answer? If something is not clear, just ask. In real life these tables have about 50,000 rows and I can't change the structure. I need this query for analytics purposes.

2 Answers

If you know the names of the cities, just use conditional aggregation:

select date(u.date_created), 
       sum(n.name = 'New York') as NewYork,
       sum(n.name = 'Los Angeles') as LosAngeles,
       sum(n.name = 'Boston') as Boston
from user u inner join
     setting s
     on u.id = s.user_id and
        s.setting_key = 'city' inner join
     name n
     on s.setting_value = n.id
where u.date_created > '2015-09-19T14:18:07.000Z'
group by 1;

If you don't know the names of the cities or there are an unknown number, then you will need dynamic SQL -- that is, construct the SQL, prepare a statement, and then execute it.
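As a rough sketch of that dynamic SQL route, the statement below builds one conditional-aggregation column per city and then prepares and executes the result. It assumes the `user`, `setting`, and `name` tables and the `'city'` setting key exactly as described in the question. The variable names `@cols` and `@sql`, the statement name `stmt`, and the `group_concat_max_len` value are illustrative choices, and the cutoff is written in MySQL's plain datetime format rather than the ISO string from the question.

```sql
-- Assumption: tables user, setting and name (city names) as in the question.
-- GROUP_CONCAT output is capped at 1024 bytes by default; raise it for long city lists.
SET SESSION group_concat_max_len = 100000;

-- Build one "SUM(n.name = '<city>') AS `<city>`" expression per row of the name table.
-- QUOTE() escapes the city name as a string literal; backticks in names are doubled.
SELECT GROUP_CONCAT(
         CONCAT('SUM(n.name = ', QUOTE(name), ') AS `',
                REPLACE(name, '`', '``'), '`')
         ORDER BY id SEPARATOR ', ')
INTO @cols
FROM name;

-- Assemble the full query text around the generated column list.
SET @sql = CONCAT(
  'SELECT DATE(u.date_created) AS dte, ', @cols,
  ' FROM user u',
  ' INNER JOIN setting s ON u.id = s.user_id AND s.setting_key = ''city''',
  ' INNER JOIN name n ON s.setting_value = n.id',
  ' WHERE u.date_created > ''2015-09-19 14:18:07''',
  ' GROUP BY dte ORDER BY dte');

-- Prepare, run, and release the generated statement.
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
```

This produces daily (not cumulative) counts with one column per city. Running `SELECT @sql;` before `PREPARE` is a simple way to inspect the generated query.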

EDIT:

It wasn't obvious at first that the question is about cumulative counts. For that, use variables:

select dte,
       (@ny := @ny + NewYork) as NewYork,
       (@la := @la + LosAngeles) as LosAngeles,
       (@b := @b + Boston) as Boston
from (select date(u.date_created) as dte, 
             sum(n.name = 'New York') as NewYork,
             sum(n.name = 'Los Angeles') as LosAngeles,
             sum(n.name = 'Boston') as Boston
      from user u inner join
           setting s
           on u.id = s.user_id and
              s.setting_key = 'city' inner join
           name n
           on s.setting_value = n.id
      where u.date_created > '2015-09-19T14:18:07.000Z'
      group by 1
      order by 1
     ) us cross join
     (select @ny := 0, @la := 0, @b := 0) params
order by 1;
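On MySQL 8.0 or later, where assigning user variables inside expressions is deprecated, a window function is a possible alternative for the running totals. The following is a sketch under the same assumptions as the question (tables `user`, `setting`, `name`, setting key `'city'`). The derived-table alias `daily` is a made-up name, and the cutoff is written in MySQL's plain datetime format.

```sql
-- Requires MySQL 8.0+ (window functions).
-- Inner query: new users per day and city. Outer query: running total per city.
SELECT dte,
       SUM(NewYork)    OVER (ORDER BY dte) AS NewYork,
       SUM(LosAngeles) OVER (ORDER BY dte) AS LosAngeles,
       SUM(Boston)     OVER (ORDER BY dte) AS Boston
FROM (SELECT DATE(u.date_created) AS dte,
             SUM(n.name = 'New York')    AS NewYork,
             SUM(n.name = 'Los Angeles') AS LosAngeles,
             SUM(n.name = 'Boston')      AS Boston
      FROM user u
      INNER JOIN setting s ON u.id = s.user_id AND s.setting_key = 'city'
      INNER JOIN name n ON s.setting_value = n.id
      WHERE u.date_created > '2015-09-19 14:18:07'
      GROUP BY dte
     ) daily
ORDER BY dte;
```

As with the variable-based version, a date on which no user was created produces no row, so the running total simply carries over to the next date that does appear.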
Gordon Linoff
  • When I run your solution, it is not cumulative: on the row for September 21 I want all users up to that date who have New York as the city in their settings. If you could give an example of a dynamic query, that would be awesome; I have never worked with prepared statements. – Pieter Van der Haegen Dec 27 '15 at 14:38

You want a pivot, which MySQL does not support natively. However, this may work for you:

select date, group_concat(concat(name, '=', uid_count)) cities
from (
    select date(u.date_created) date, n.name, count(u.id) uid_count
    from user u
    join setting s on u.id = s.user_id
    join name n on s.setting_value = n.id
    where u.date_created > '2015-09-19T14:18:07.000Z'
    and s.setting_key = 'city'
    group by 1, 2) x
group by 1

which will produce a result like:

date            Cities
------------------------------------------------------
2015-09-19      New York=1   
2015-09-20      New York=2
2015-09-21      New York=2,Los Angeles=1

The main advantage of this approach is that the set of cities returned is driven entirely by the data.

Bohemian