-1

I am given a database with 3 different tables as such, and no sample data: enter image description here

Basically, I'm trying to find a query for the number of users, for whom the last activity was done with device_name = “phone”, but who also have at least one additional activity with another device by combining data from the 3 tables.

The issue is that I can't include both those conditions in the same where clause because I'm they should be on different rows, i.e. different activites.

What I have so far is:

With CTE as (
    select * 
    from users u 
    join activities a on u.user_id=a.activity_user_id 
    join devices d on a.activity_device_ID=d.device_ID
) 
Select count(a.activity_user_id) 
from CTE 
where (a.activity_is_last='Yes' and d.device_name='phone') 

I almost want to include some sort of when clause, so that WHEN the activity is last I want to check if the device is a phone, but when it's not I want to check if it's not phone. I also want to validate that there are at least 2 rows, is_last and not is_last. How is it possible?

Salah
  • 1
  • 1
  • 1
    Removed conflicting product tags. Please, add the one back that your are actually using. – Shadow Jun 23 '21 at 09:09
  • 1
    That's not a "data warehouse" schema. That's just 3 tables with foreign key constraints – Panagiotis Kanavos Jun 23 '21 at 09:11
  • You would probably need an exists subquery or a join back to the same table or a subquery with conditional aggregation, but it is hard to tell due to lack of sample data and not knowing what database product you use. For mysql you could use the answers to the followin question as a starting point: https://stackoverflow.com/questions/4047484/selecting-with-multiple-where-conditions-on-same-column – Shadow Jun 23 '21 at 09:12
  • First of all, try to limit posting photos to only when you can't describe it in textual form. In this case, you can run `SHOW CREATE TABLE table_name` on each of the tables you have and post the results [into your question](https://stackoverflow.com/posts/68096832/edit). Second thing is, without any sample data that correspond to the situation you're having, it's hard to imagine what kind of issue you're having. Last but not least, **visualize** your expected output in your question. – FanoFN Jun 23 '21 at 09:13
  • Also dont use Non-Ascii quotes like you have in this question – RiggsFolly Jun 23 '21 at 09:14
  • Do *any* RDBMS permit those "smart quotes" for delimit identifying? (I hope not...) – Thom A Jun 23 '21 at 09:18
  • @Shadow sorry I'm new to posting on stackoverflow, I'll be more aware of that. – Salah Jun 23 '21 at 09:20
  • @Shadow There is no sample data in fact, and I'll be using MySQL or T-SQL and will specify that in an edit of my post. – Salah Jun 23 '21 at 09:21
  • @FaNo_FN I realise it's not very clear via the info I've given, but this is in fact a case study question for an interview of a colleague of mine and they haven't supplied any sample data. In fact they mention that "any possible data set should be considered and not only a certain data sample, which is why we strongly recommend not to create a sample database for testing. In case of huge databases, specialcases are quickly overlooked when testing queries only on samples. " – Salah Jun 23 '21 at 09:23
  • @RiggsFolly could you please clarify what non-ascii quotes are? I'm new here – Salah Jun 23 '21 at 09:26
  • @Salah people ask for data because they don't understand what you ask. To do what you want you need self-joins for starters, not CTEs. Have you tried that query with any database? You'll find syntax errors due to the use of `"` and `\`` instead of `'` – Panagiotis Kanavos Jun 23 '21 at 09:26
  • @PanagiotisKanavos I haven't actually, thanks for pointing it out. Could you tell me more about the self-join in this case? – Salah Jun 23 '21 at 09:28
  • @Salah tsql is the language of ms sql server, which has different syntax to mysql. You really need to decide which one you are asking the question about. The non-ascii quotes are around phone and Yes. – Shadow Jun 23 '21 at 09:29
  • non-ascii quotes are `“` and `’` use simple `"` and `'` Normally they get created when using a word processor to edit code rather than a simple text editor – RiggsFolly Jun 23 '21 at 09:33
  • Well then, you can just create fake data. Here, [try this fiddle](https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=a67c60672aee0d0908e4094b251f076c) – FanoFN Jun 23 '21 at 09:34
  • Please share some sample data and desired output. – Kazi Mohammad Ali Nur Romel Jun 23 '21 at 09:50
  • There is a [boolean](https://learn.microsoft.com/en-us/sql/t-sql/language-elements/comparison-operators-transact-sql?view=sql-server-2017#boolean-data-type) data type (with values TRUE, FALSE and UNKNOWN) in SQL Server, but you cannot get a firm grip on one: "Unlike other SQL Server data types, a **Boolean** data type cannot be specified as the data type of a table column or variable, and cannot be returned in a result set." How did you manage to get `activity_is_last` into the `activities` table and why compare it to `'Yes'`? – HABO Jun 23 '21 at 13:04

5 Answers5

0

Here I have used two row_number() window function to rank the output of inner query. RNK1 will represent the rank when a.activity_is_last='Yes' and d.device_name='phone' and RNK2 will represent the rank when a.activity_is_last='No' and d.device_name<>'phone'. Then in outer query I have selected first row from both the ranked and checked whether there are two rows or not with having clause and group by.

With CTE as (
    select * ,
    (case when (a.activity_is_last='Yes' and d.device_name='phone') then 
         row_number()over(partition by case when (a.activity_is_last='Yes' and d.device_name='phone') then activity_user_id end order by activity_timestamp desc) end) rnk1,
    (case when (a.activity_is_last='No' and d.device_name<>'phone') then 
         row_number()over(partition by case when (a.activity_is_last='No' and d.device_name<>'phone')  then activity_user_id end order by activity_timestamp desc) end) rnk2
    from users u 
    join activities a on u.user_id=a.activity_user_id 
    join devices d on a.activity_device_ID=d.device_ID
) 
Select activity_user_id
from CTE 
where rnk1=1 or rnk2=1
group by activity_user_id
having count(*)=2
0

It would be better if you have some sample data. But I think this will work for you:

select * 
     ,case when d.device_name='phone' then 0 else 1 end as HasOtherdevice
into #tmp1
    from users u 
    join activities a on u.user_id=a.activity_user_id 
    join devices d on a.activity_device_ID=d.device_ID


Select a.activity_user_id,sum(HasOtherdevice)as cnt
into #tmp2
from #tmp1
group by a.activity_user_id

select count(a.activity_user_id) 
from #tmp1
where (a.activity_is_last='Yes' and d.device_name='phone') 
     and a.activity_user_id in (select a.activity_user_id
                                from #tmp2
                                where cnt=0)
Nooshin
  • 31
  • 4
0

You can just do two conditional counts in a HAVING clause

select u.user_id
from users u 
join activities a on u.user_id = a.activity_user_id 
join devices d on a.activity_device_ID = d.device_ID
group by u.user_id
having count(case when a.activity_is_last = 'Yes' and d.device_name = 'phone') > 0
   and count(case when a.activity_is_last = 'No' and d.device_name <> 'phone') > 0

Strictly speaking, you should not store activity_is_last in the database anyway, as it is defined just by being the latest row ordered by date. (And in any case it should be a bit column.)

So if you didn't have that column, you would use ROW_NUMBER to get the top row:

select u.user_id
from users u 
join (
    select *,
        rn = row_number() over (partition by a.activity_user_id order by a.activity_timestamp desc)
    from activities a
) a on u.user_id = a.activity_user_id 
join devices d on a.activity_device_ID = d.device_ID
group by u.user_id
having count(case when a.rn = 1 and d.device_name = 'phone') > 0
   and count(case when a.rn > 1 and d.device_name <> 'phone') > 0
Charlieface
  • 52,284
  • 6
  • 19
  • 43
0

Perhaps the simplest way is to count in the subquery.

Basically, I'm trying to find a query for the number of users, for whom the last activity was done with device_name = “phone”, but who also have at least one additional activity with another device by combining data from the 3 tables.

If I assume that activity_is_last is true on only one row, then:

with CTE as (
      select *,
             min(device_name) over (partition by user_id) as min_device_name,
             max(device_name) over (partition by user_id) as max_device_name
      from users u join
           activities a
           on u.user_id = a.activity_user_id join
           devices d
           on a.activity_device_ID = d.device_ID
     ) 
Select count(*) 
from cte
where activity_is_last = 'Yes' and device_name = 'phone' and
      min_device_name <> max_device_name ;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

The simplest way seems to be to locate the base row(s) using WHERE activity_is_last='Yes' and d.device_name='phone'. If activity_is_last='Yes' occurs once per user then the query returns unique user_id's. Then use EXISTS to only count users with activity where at least 1 device_name is other than 'phone'. Something like this

select count(a.activity_user_id) as num_of_users
from activities a
     join devices d on a.activity_device_ID=d.device_ID
where a.activity_is_last='Yes'
      and d.device_name='phone'
      and exists (select 1
                  from activities aa
                       join devices dd on aa.activity_device_ID=dd.device_ID
                  where a.activity_user_id=aa.activity_user_id
                        and dd.device_name<>'phone');
SteveC
  • 5,955
  • 2
  • 11
  • 24