2

I'm not a pro in SQL at all :) Having a very critical performance issue. Here is the info directly related to problem.

I have 2 tables in my DB- table condos and table goods.

table condos have the fields:

  • id (PK)
  • name
  • city
  • country

table items:

  • id (PK)
  • name
  • multiple fields not related to issue
  • condo_id (FK)

I have 1000+ entities in condos table and 1000+ in items table.

The problem is how i perform items search

currently it is:

For example, i want to get all the items for city = Sydney

  1. Perform a SELECT condos.condo_id FROM public.condos WHERE city = 'Sydney'
  2. Make a SELECT * FROM public.items WHERE item.condo_id = ? for each condo_id i get in step 1.

The issue is that once i get 1000+ entities in condos table, the request is performed 1000+ times for each condo_id belongs to 'Sydney'. And the execution of this request takes more then a 2 minutes which is a critical performance issue.

So, the questions is:

What is the best way for me to perform such search ? should i put a 1000+ id's in single WHERE request? or?

For add info, i use PostgreSQL 9.4 and Spring MVC.

Alexander Petrov
  • 9,204
  • 31
  • 70
user1935987
  • 3,136
  • 9
  • 56
  • 108
  • 4
    Area you aware of table joins? From your question it seems you are not, and if not you need to learn about them. This is exactly the type of problem a join is intended to solve. – Jim Garrison Jul 10 '16 at 06:45

3 Answers3

2

Use a table join to perform a query such that you do not need to perform a additional query. In your case you can join condos and items by condo_id which is something like:

SELECT i.* 
FROM public.items i join public.condos c on i.condo_id = c.condo_id 
WHERE c.city = 'Sydney'  

Note that performance tuning is a board topic. It can varied from environment to environment, depends on how you structure the data in table and how you organize the data in your code.

Here is some other suggestion that may also help:

Try to add index to the field where you use sorting and searching, e.g. city in condos and condo_id in items. There is a good answer to explain how indexing work.

I also recommend you to perform EXPLAIN to devises a query plan for your query whether there is full table search that may cause performance issue.

Hope this can help.

Community
  • 1
  • 1
Wilson
  • 11,339
  • 2
  • 29
  • 33
  • This is not much related to the question, except the join part. – Sami Kuhmonen Jul 10 '16 at 06:58
  • @SamiKuhmonen OP said he has a critical performance issue and execution of a request takes more then a 2 minutes. So I provide suggestion about how to better resolve it. Am I missing something? – Wilson Jul 10 '16 at 07:04
1

Essentially what you need is to eliminate the N+1 query and at the same time ensure that your City field is indexed. You have 3 mechanisms to go. One is already stated in one of the other answers you have received this is the SUBSELECT approach. Beyond this approach you have another two.

You can use what you have stated :

SELECT condos.condo_id FROM public.condos WHERE city = 'Sydney'

SELECT * 
FROM   public.items 
WHERE  items.condo_id IN (up to 1000 ids here) 

the reason why I am stating up to 1000 is because some SQL providers have limitations.

You also can do join as a way to eliminate the N+1 selects

SELECT * 
FROM   public.items join  public.condos on items.condo_id=condos.condo_id  and condos.city='Sydney'

Now what is the difference in between the 3 queries.

Pros of Subselect query is that you get everything at once. The Cons is that if you have too many elements the performance may suffer:

Pros of simple In clause. Effectivly solves the N+1 problem, Cons may lead to some extra queries compared to the Subselect

Joined query pros, you can initialize in one go both Condo and Item. Cons leads to some data duplication on Condo side

If we have a look into a framework like Hibernate, we can find there that in most of the cases as a fetch strategy is used either Joined either IN strategies. Subselect is used rarely.

Also if you have critical performance you may consider reading everything In Memory and serving it from there. Judging from the content of these two tables it should be fairly easy to just upload it into a Map.

Effectively everything that solves your N+1 query problem is a solution in your case if we are talking of just 2 times 1000 queries. All three options are solutions.

Alexander Petrov
  • 9,204
  • 31
  • 70
  • thanks thats the most full answer. one more small question - if i have multiple `WHERE` parameters also, should i specify them in `AND` operator? like: `and condos.city='Sydney' and item.name='item1' and item.price=1999` and so on (if i use JOIN query) – user1935987 Jul 10 '16 at 07:23
  • Theoretically you can put them either straight in the join clause or after the where clause. I know that for the different DB providers there might be some small difference in performance , but you have to test this yourself. So yes either you specify them with AND conditions in the where clause or you place them straight in the JOIN clause again with AND. – Alexander Petrov Jul 10 '16 at 07:27
  • i guess there is a mistake `public.items join public.condos on item.condo_id=condos.condo_id` Need `items.condo_id=condos.condo_id`. But in common seems like work. thanks! – user1935987 Jul 10 '16 at 07:36
  • Yes there is a typo. I will fix it. – Alexander Petrov Jul 10 '16 at 07:37
  • No sory I dont see it. I dont have experience with PostgreSQL so the Postgre syntax is not well known to me. Where is the error ? – Alexander Petrov Jul 10 '16 at 07:39
  • Table name is `items`, so need `items.condo_id` instead of `item.condo_id`. No? – user1935987 Jul 10 '16 at 07:41
  • i used `items.condo_id` – user1935987 Jul 10 '16 at 07:42
0

You could use the first query as a subquery in an in operator in the second query:

SELECT * 
FROM   public.items 
WHERE  item.condo_id IN (SELECT condos.condo_id 
                         FROM   public.condos
                         WHERE  city = 'Sydney')
Mureinik
  • 297,002
  • 52
  • 306
  • 350