0

I've been tasked with providing a CSV with a lot of raw data for other people to analyze using Excel. It's a users table and a purchases link table (users and products) and the requirement is this:

A row for each user with the user's id and some other basic info. Every subsequent column on that row is the product_id that they've purchased in the month of March. Only purchases in March, and only the top 1000 users with the most purchases in March.

Something like this:

select purchases.user_id, purchases.product_id
  from users
  left join purchases on purchases.user_id = users.id
  where (purchases.created_at < '2016-03-31 23:59:59.999999' 
  and purchases.created_at > '2016-03-01 00:00:00.000000')
  -- an order by statement here for users with the most purchases
  limit 1000
;

but instead of the result being

user_id | product_id
1000    | 3391
1000    | 8482
1000    | 4386
1008    | 4382
...

have it like this:

user_id | product_id | product_id | product_id
1000    | 3391       | 8482       | 4386
1008    | 4382       |            |
...

I'm still quite an SQL beginner, and not really sure where to go from here. coalesce seems wrong. Thanks!

Mason
  • 111
  • 6
  • This is called pivot table or cross tabulated (crosstab) query. The linked topic contains mysql implementation for both static (columns are known in advance) and dynamic (columns are determined on the fly) pivot. – Shadow Apr 12 '16 at 08:42
  • @Shadow Thanks for the link! I'll give it a try. I didn't know the `pivot` keyword so my searches were coming up with something different. – Mason Apr 12 '16 at 08:45
  • Perhaps OT, but when LEFT JOIN, put the right side table's conditions in the ON clause to get true left join behavior. (When in WHERE, you get regular inner join result.) – jarlh Apr 12 '16 at 08:51

0 Answers0