0

I'm new to SAS and need to find a way to do the following:

I have two datasets:

  • Users (user_id, friends) (friends are user_ids seperated by a ",")
  • Reviews (user_id, review_id, business_id, text)

I've merged both on user_id. Now I need to know what percentage of the reviews of the friends of a user is about the same business(es) a user has reviewed.

I guess I need a stored procedure for this (but I'm new to SQL also). Any tips how to start on it?

ADyson
  • 57,178
  • 14
  • 51
  • 63
Pello
  • 93
  • 1
  • 6
  • 1
    *"(friends are user_id's seperated by a ",")"* - for starters, read [Is storing a delimited list in a database column really that bad?](http://stackoverflow.com/questions/3653462/is-storing-a-delimited-list-in-a-database-column-really-that-bad), where you will see a lot of reasons why the answer to this question is **Absolutely yes!** – Zohar Peled Oct 23 '17 at 12:56

1 Answers1

0

I would start with refactoring your users table to store friends as separate records rather than as a single list value:

data users(drop=friends);
set users;
do i=1 to countw(compress(friends_list),',');
  friend=scan(compress(friends_list),i,',');
  output;
end;
run;

you can then calculate that percentage by joining that table with reviews twice, once per user and once per friend:

proc sql;
create table want as
select t1.user_id
      ,sum(case when t3.business_id=t2.business_id then 1 else 0 end)/count(*) as percentage
from users t1
inner join reviews t2
  on t1.user_id=t2.user_id
inner join reviews t3
  on t1.friend=t3.user_id
group by t1.user_id
;
quit;
user2877959
  • 1,792
  • 1
  • 9
  • 14