2

I apologize in advance as I may not be describing my problem properly. I am trying to write a query that takes the top 5 most popular chosen_user_items per user and concatenating the top rows per user into a comma separated string via group_concat and then grouping by user_id's.

For example, if user_id of 1 has five rows for item_id of 1, two rows for item_id of 2, three rows for item_id of 3, and a single row for 4 5 and 6, then the result would be 1, 3, 2, 4, 5.

Here is my example table structure.

Name: chosen_user_items

id | user_id | item_id
------------------------
1  | 1       | 1
2  | 1       | 4
3  | 1       | 19
4  | 1       | 10
5  | 1       | 13
.  | 1       | 1
.  | 1       | 11
.  | 1       | 18
.  | 1       | 212
.  | 1       | 654
.  | 2       | 1
.  | 2       | 28
.  | 2       | 568
.  | 2       | 112
.  | 2       | 354
.  | 3       | 4
.  | 3       | 4
.  | 3       | 19
.  | 3       | 212
.  | 3       | 654
.  | 3       | 4
.  | 3       | 4
.  | 3       | 253
.  | 3       | 187
.  | 3       | 212

And here is an example of my desired output:

user_id | group_concat_results
------------------------------
1       | 1, 4, 19, 13, 212
2       | 1, 28, 568, 212, 354
3       | 4, 212, 19, 654, 253

Here is the query that I have so far

SELECT `chosen_user_items`.`item_id`, COUNT(`chosen_user_items`.`item_id`) AS 'item_count' 
FROM `chosen_user_items` 
WHERE `chosen_user_items`.`user_id` = 1
GROUP BY `chosen_user_items`.`item_id` 
ORDER BY `item_count` DESC 
LIMIT 5

While this works great for a single user, I want to be able to run this query just once for all users (in order to avoid doing hundreds or thousands of database queries), and having to manually concat the results in a language such as PHP.

Thanks in advance.

klye_g
  • 1,242
  • 6
  • 31
  • 54

3 Answers3

2

With some sort of ranking it is possible to do it with 1 query.

select user_id, group_concat(item_id) from 
(
select 
  user_id
  ,item_id
  ,@item_rank := if(@current_item = user_id, @item_rank+1,1) as item_rank 
  ,@current_item:=user_id
  from
      (      
      select 
        user_id
        ,item_id
        ,count(*) aantal
      from chosen_user_items
      group by user_id,item_id
      order by user_id,count(*) desc
      ) a )b
      where item_rank <6
      group by user_id

Here's an sqlfiddle to test it.

I only have a problem with the order of the group_concat, it isn't ordered accordingly.

try the following query to see the result before the group_concat maybe you can concat it better.

select 
  user_id
  ,item_id
  ,@item_rank := if(@current_item = user_id, @item_rank+1,1) as item_rank 
  ,@current_item:=user_id
  from
      (      
      select 
        user_id
        ,item_id
        ,count(*) aantal
      from chosen_user_items
      group by user_id,item_id
      order by user_id,count(*) desc
      ) a 

this query sorts the concat correctly according to the data in your question:

select user_id, group_concat(item_id) from 
(
select 
  user_id
  ,item_id
  ,@item_rank := if(@current_item = user_id, @item_rank+1,1) as item_rank 
  ,@current_item:=user_id
  from
      (      
      select 
        user_id
        ,item_id
        ,count(*) aantal
      from chosen_user_items
      group by user_id,item_id
      order by user_id,count(*) desc
      ) a )b
      where item_rank <6
      group by user_id
      order by user_id,item_rank asc
davejal
  • 6,009
  • 10
  • 39
  • 82
1

A solution using R.

The dbplyr package will allow you to run this script directly against the database without pulling the data into memory. If you don't want to use R, you can render the SQL query that dbplyr generates from your R statement.

library(tidyverse)
library(stringr)

# --- Set Up ---
dat <- tribble(
  ~user_id, ~item_id,
  1, 1,
  1, 4,
  1, 19,
  1, 10,
  1, 13,
  1, 1,
  1, 11,
  1, 18,
  1, 212,
  1, 654,
  2, 1,
  2, 28,
  2, 568,
  2, 112,
  2, 354,
  3, 4,
  3, 4,
  3, 19,
  3, 212,
  3, 654,
  3, 4,
  3, 4,
  3, 253,
  3, 187,
  3, 212
)

# --- Prep --- 
pre <- dat %>% 
  group_by(user_id) %>% 
  arrange(user_id, item_id) %>% 
  add_count(item_id) %>% 
  rename(
    n_items = n
  ) %>% 
  distinct(user_id, item_id, .keep_all = TRUE) %>% 
  top_n(5, n_items) %>% 
  slice(1:5) %>% 
  arrange(user_id, desc(n_items)) 

# --- Solve ---
# Hacky
solution_one <- pre %>% 
  mutate(collapsed = str_c(item_id, collapse = ", ")) %>% 
  slice(1) %>% 
  select(user_id, collapsed)

# Ideal
solution_two <- pre %>%
  nest() %>% 
  mutate(
    collapsed = data %>% 
      map("item_id") %>% 
      map_chr(str_c, collapse = ", "))

Output:

solution_two
#> # A tibble: 3 x 3
#>   user_id             data            collapsed
#>     <dbl>           <list>                <chr>
#> 1       1 <tibble [5 x 2]>     1, 4, 10, 11, 13
#> 2       2 <tibble [5 x 2]> 1, 28, 112, 354, 568
#> 3       3 <tibble [5 x 2]> 4, 212, 19, 187, 253

This is the best solution because you retain the item_id and its count in the nested list-column data.

amanda
  • 321
  • 5
  • 12
-1

To solve this issue, I think you need to go for 4 different steps.

First, you need to choose/select/order the rows that will be shown. This can be done using a row_number and partition by (this can't be used in MYSQL, but in this guide, they show you the MYSQL equivalente solution https://blog.sqlauthority.com/2014/03/09/mysql-reset-row-number-for-each-group-partition-by-row-number/)

Secondly, you will need to filter the rows that have a row_number lower than 5, this will behave like the "limit 5" of each of the queries.

Third step, you need to have this 5 records for each user transfor to 5 columns. This can be done using pivot table. Here you can find an example of what you have to do: MySQL pivot table

Final step: all you need to do is concat each of the 5 columns, and you will have the information you need for each user.

I hope this clarifies

Edit: Using function GROUP_CONCAT will allow you to replace the 2 last steps

  • 1
    Thanks for the link. I'll take a look at it. Do you know off the top of your head if the GROUP_CONCAT can be achieved with that method? – klye_g Feb 14 '18 at 17:23
  • This method lets you choose, for each user_id the elements to concat and the order that they will be shown, but it won't concat them, just number them the way you wish. Maybe I got the question wrong, but wasn't that the objective? – João Gonçalves Feb 14 '18 at 17:26
  • For a concat, I think you need to acomplish 2 steps. First, make the rows behave like columns, this can be achieved by a pivot. Second, you need to concat the resulting columns. – João Gonçalves Feb 14 '18 at 17:30
  • That's true, I can also do a temporary lookup table – klye_g Feb 14 '18 at 17:31
  • editted answer to make it clearer on everything you expect. – João Gonçalves Feb 14 '18 at 17:40
  • Thanks for updating the answer, I'll take a look and get back to you – klye_g Feb 14 '18 at 18:16