0

I am trying to determine the players who played 'Chess' on Jan 7th 2021, what percentage played again the next day (Jan 8th)?

game_table
| column       | data_type|
|:----         |    -----:|
| user_id      |    BIGINT|
| date         |    STRING|
| game_name    |    STRING|
| session_count|    BIGINT|

Here is my code but I don't think it is correct:

with t1 as
(select game_name, count(*) as count_jan7 
from instant_game_sessions 
where date = '2021-01-07' and game_name = 'Chess'
group by 1).

t2 as
(select games, count(*) as count_jan7_and_jan8
from instant_game_sessions 
where date = '2021-01-07' and game_name = 'Chess' and date= '2021-01-08'
group by 1)

select cast(count_jan7_and_jan8 as numeric)/count_jan7 *100
from t1 
join t2 on t1.game_name = t2.game_name 
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Kay
  • 335
  • 3
  • 14
  • Is `(user_id, game_name, date)` defined unique? Why is "date" a string, and not type `date`? *Is* it a date? (Multiple distinct strings can result in the same `date`!) Also `what percentage played again the next day` - the same game, or anything? – Erwin Brandstetter Apr 05 '21 at 23:38

2 Answers2

1

Assuming the table definition has this solid core:

CREATE TABLE game_table (
  user_id   bigint NOT NULL
, date      date   NOT NULL  -- date, not text!
, game_name text   NOT NULL
, UNIQUE (date, game_name, user_id)  -- !
);

And assuming you meant the same player playing the same game next day:

SELECT round(ct_day2 * 100.0 / ct_day1, 2) AS repeat_percentage
FROM  (
   SELECT count(*) AS ct_day1
        , count(d2.user_id) AS ct_day2
   FROM   instant_game_sessions d1
   LEFT   JOIN instant_game_sessions d2 ON (d2.user_id, d2.game_name, d2.date)
                                         = (d1.user_id, d1.game_name, d1.date + 1)
   WHERE  d1.date = '2021-01-07'
   AND    d1.game_name = 'Chess'
   ) sub;

The UNIQUE constraint makes sure there can only be a single match on the next day. So count(*) is the correct count for day 1, and count(d2.user_id) for day 2. The rest is obvious.

The UNIQUE constraint (with column names in this order!) also provides the perfect index for the query. See:

Note that the numeric constant 100.0 defaults to numeric automatically, so we need not add any explicit type cast. Related:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thanks, but I think there's an issue. The formula should be (count of users who played on Jan7+the count of the same users from Jan7 who also played on Jan8) and divide it by (count of people who played on 7). I think your math formula is just (count of users who played on Jan8) divide by (count of users who played on Jan7). Do you happen to know how to resolve it? Once again, thanks in advance. – Kay Apr 16 '21 at 23:11
  • @Kay: Your question says: *determine the players who played 'Chess' on Jan 7th 2021, what percentage played again the next day (Jan 8th)?* So if 10 played on day1 and 3 of those played again on day2 (ignoring others), that's 30 %. (Can never be > 100 %) That's what my answer does, based on the mentioned assumptions you did not bother to clarify. Your description does something else entirely. And your description of my math does not apply. – Erwin Brandstetter Apr 16 '21 at 23:26
0
SELECT
  sum(case when played_next_day then 1 else 0) / count(*) as next_day_played_ratio
FROM (
  select
    user_id,
    exists(select 1 from instant_game_sessions t2
           where t1.user_id=t2.user_id
             and t1.game_name=t2.game_name
             and t2.date=(t1.date + interval '1 days'))
     AS played_next_day
  from (
    select user_id, game_name, date
    from instant_game_sessions
    where game_name='Chess'
      and date='2021-01-07'
    GROUP BY user_id, game_name, date
  ) AS t1
)
AdamKG
  • 13,678
  • 3
  • 38
  • 46