I am working with the Yelp data-set available online. I've been trying to optimize my query for days. For the schema I'll list below, I need to construct a query to provide the following:
- Given a user's UID, display the most recent review information for each of the user's friends.
Here's the schema:
CREATE TABLE business(
bid varchar(40) PRIMARY KEY,
name varchar(100),
city varchar(40),
state char(2),
zip varchar(10),
latitude real,
longitude real,
address varchar(100),
numreviews INTEGER DEFAULT 0,
numcheckins INTEGER DEFAULT 0,
avgreview float DEFAULT 0,
isopen bool,
stars float
);
CREATE TABLE users(
uid varchar(40) PRIMARY KEY,
name varchar(40),
avgstars float,
fans INTEGER,
coolvotes INTEGER,
reviewcount INTEGER,
funnyvotes INTEGER,
signup varchar(20),
usefulvotes INTEGER,
latitude real,
longitude real
);
CREATE TABLE reviews(
rid varchar(40) PRIMARY KEY,
bid varchar(40),
uid varchar(40),
stars float,
date varchar(20),
funny INTEGER,
useful INTEGER,
cool INTEGER,
text varchar(1024),
FOREIGN KEY (uid) REFERENCES users(uid),
FOREIGN KEY (bid) REFERENCES business(bid)
);
CREATE TABLE friends(
uid varchar(40) REFERENCES users(uid),
fid varchar(40) REFERENCES users(uid)
);
Here's an example of the desired output:
For each of the user's friends, I display the following:
- The friend's name
- The name of the business from their most recent review
- The city of the business from their most recent review
- The text from their most recent review
Currently this is the only "solution" I've had success with.
Step 1: Get a list of all of the IDs for each of the user's friends.
SELECT fid from friends where uid = '{userId}'
This returns a list of all of the user ID's for each of the user's friends. So I basically have a friend ID list.
Step 2: With this information, I run a foreach
loop in my program over that list. For each iteration of the friend ID list, I execute the below query and provide the temporary friend ID for the current iteration of the loop:
SELECT U.name, B.name, B.city, R.text, R.date FROM reviews as R, users as U, business as B
WHERE U.uid = '{currentFriendId}'
AND R.uid = '{currentFriendId}'
AND B.bid = R.bid
AND date = (SELECT MAX(date) FROM reviews WHERE uid = '{currentFriendId}')
For EACH time I run this for loop, I get a single line of output for what I desire, such as this:
This is great...except I have to run this query for every single one of the user's friends. This is extremely costly.
Goal: I'm trying to combine these 2 queries, or revamp them completely, to generate all of the rows at once in a single query.
Question: Given the information provided, how can I fix my queries to generate all of this information from a single query?