
I have the following tables:

users

userId|name

items

itemId|userId|description

What I want to achieve: I want to read all users and their items from the database (a user can have multiple items), and store all this data in a structure like the following:

User {
id
name
array<Item>
}

where Item is

Item {
itemId
userId
description
}

My first option would be to issue a SELECT * FROM users, partially fill an array with users, and then for each user do a SELECT * FROM items WHERE userId=wantedId to complete the array of items.

Is this approach correct, or should I use a join for this?
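
To make the first option concrete, here is a minimal sketch of it (Python with sqlite3, purely illustrative; the schema and sample values are the ones from this question):

```python
import sqlite3

# In-memory database with the example schema and a few illustrative rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userId INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE items (itemId INTEGER PRIMARY KEY, userId INTEGER, description TEXT);
    INSERT INTO users VALUES (1, 'name1'), (2, 'name2');
    INSERT INTO items VALUES (11, 1, 'description11'), (12, 1, 'description12'),
                             (21, 2, 'description21');
""")

# First option: one query for the users, then one query per user.
# For N users this issues N+1 queries in total.
users = []
for user_id, name in conn.execute("SELECT userId, name FROM users"):
    items = conn.execute(
        "SELECT itemId, userId, description FROM items WHERE userId = ?",
        (user_id,),
    ).fetchall()
    users.append({"id": user_id, "name": name, "items": items})
```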

A reason that I don't want to use a join is that I get a lot of redundant data:

userId1|name1|ItemId11|description11
userId1|name1|ItemId12|description12
userId1|name1|ItemId13|description13
userId1|name1|ItemId14|description14
userId2|name2|ItemId21|description21
userId2|name2|ItemId22|description22
userId2|name2|ItemId23|description23
userId2|name2|ItemId24|description24

by redundant I mean the repeated userId1,name1 and userId2,name2 values

Is my reason justified?

LATER EDIT: I added "speed or memory" to the title when talking about efficiency

INS
  • Speedwise? A join... probably. You should benchmark if you are worried about speed – Mitch Wheat Mar 07 '12 at 10:13
  • Maybe this question and answer can justify your request: http://stackoverflow.com/questions/273238/how-to-use-group-by-to-concatenate-strings-in-sql-server – k06a Mar 07 '12 at 10:19
  • What do you need the data of all users and all items for? Maybe you get better answers if you can specify about which amount we are talking and what the use is of loading 'all' data? – Luc Franken Mar 07 '12 at 10:56
  • @LucFranken some computations require all the data loaded in order to perform some normalization. The example here is purely theoretical, but where I need it, all data must be loaded from the database. – INS Mar 07 '12 at 11:20
  • Clear. How many records, and isn't it possible to do the calculations in the database? If not, then it is as simple as testing both options, if you have a real need for speed. Since you mention computations, it might not have to be realtime? – Luc Franken Mar 07 '12 at 11:48
  • I'd agree with Luc. If it's computations, it might be that your best bet would be stored procedures and triggers. Keep a running calculation going and simply query for the result - that would be lightning fast. Why bring all those bytes to the middle tier just to do a complex calculation and put the result back? – duffymo Mar 08 '12 at 11:00

3 Answers


You're trading off network roundtrips for bytes on the wire and in RAM. Network latency is usually the bigger problem, since memory is cheap and networks have gotten faster. It gets worse as the size of the first result set grows - Google for "(n+1) query problem".

I'd prefer the JOIN. Don't write it using SELECT *; that's a bad idea in almost every case. You should spell out precisely what columns you want.
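
As a sketch of what that looks like in client code (Python with sqlite3 here, purely illustrative, not tied to any particular middle tier): one JOIN with the columns spelled out, one round trip, and the rows grouped back into the User structure from the question.

```python
import sqlite3
from itertools import groupby

# In-memory database with the question's schema and sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userId INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE items (itemId INTEGER PRIMARY KEY, userId INTEGER, description TEXT);
    INSERT INTO users VALUES (1, 'name1'), (2, 'name2');
    INSERT INTO items VALUES (11, 1, 'description11'), (12, 1, 'description12'),
                             (21, 2, 'description21');
""")

# One round trip: explicit columns, no SELECT *.
rows = conn.execute("""
    SELECT u.userId, u.name, i.itemId, i.description
      FROM users u
      JOIN items i ON i.userId = u.userId
     ORDER BY u.userId, i.itemId
""").fetchall()

# Collapse the repeated userId/name columns while grouping the rows
# back into one User entry per user.
users = [
    {
        "id": uid,
        "name": name,
        "items": [(item_id, uid, desc) for _, _, item_id, desc in grp],
    }
    for (uid, name), grp in groupby(rows, key=lambda r: (r[0], r[1]))
]
```

The "redundant" userId/name pairs only exist on the wire; they disappear as soon as the rows are grouped.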

duffymo
  • you're correct about `SELECT *` part but I was just trying to keep the question simpler – INS Mar 07 '12 at 10:21
  • But that knocks the legs out from under your "redundant" argument. If you acknowledge that SELECT * is a bad idea, why are you asking? – duffymo Mar 07 '12 at 10:22

A join is the best-performing way: it reduces overhead and can use the related indexes. You can test it, but I'm sure that joins are faster and better optimized than multiple selects.
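
A sketch of what "use the related indexes" means in practice (using Python's sqlite3 for illustration rather than MySQL; the index name `idx_items_userId` is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userId INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE items (itemId INTEGER PRIMARY KEY, userId INTEGER, description TEXT);
""")

# Index the foreign key column so the join can seek into items
# instead of scanning the whole table for every user.
conn.execute("CREATE INDEX idx_items_userId ON items(userId)")

# Ask the planner how it would execute the join.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT u.userId, u.name, i.itemId, i.description
      FROM users u JOIN items i ON i.userId = u.userId
""").fetchall()

# With the index in place, one side of the join should be a SEARCH
# (an index lookup) rather than two nested full-table SCANs.
for row in plan:
    print(row[3])
```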

Crsr
  • could you elaborate on the sentence `it reduces overhead and can use the related indexes`? – INS Mar 07 '12 at 10:22
    The foreign key columns should have indexes. That'll speed up the JOIN. You shouldn't see TABLE SCAN when you ask MySQL to EXPLAIN PLAN on the query. – duffymo Mar 07 '12 at 10:33
  • Yes, if you use multiple selects, MySQL needs more time and more temporary 'cache' to execute than with a join query. With joins and indexes you can avoid orphaned rows and a lot of overhead. If you hammer the MySQL server with multiple selects or subqueries you increase overhead, especially if you have, I don't know, 100,000+ records per table – Crsr Mar 07 '12 at 10:54

The answer is: it depends.

Multiple SELECT:

  • If you end up issuing lots of queries to populate the descriptions, then you have to take into account that you'll end up with a lot of round trips to the database.

Using a JOIN:

  • Yes, you'll be returning more data, but you've only got one round trip.

You've mentioned that you'll partially fill an array with users. Do you know in advance how many users you'll want? If so, I would use the following (I'm using Oracle here):

select *
  from item a,
       (select *
          from (select *
                  from user
                 order by user_id)
         where rownum <= 10) b
 where a.user_id = b.user_id
 order by a.user_id

That would return all the items for the first 10 users only (that way most of the work is done on the database itself, rather than getting all the users back, discarding all but the first ten...)
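
The same idea can be sketched outside Oracle too (Python with sqlite3 for illustration, using LIMIT instead of rownum; the schema is the one from the question): the subquery cuts the user set down inside the database, before the join runs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userId INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE items (itemId INTEGER PRIMARY KEY, userId INTEGER, description TEXT);
    INSERT INTO users VALUES (1, 'name1'), (2, 'name2'), (3, 'name3');
    INSERT INTO items VALUES (11, 1, 'd11'), (21, 2, 'd21'), (31, 3, 'd31');
""")

# Items for the first 2 users only: the inner SELECT limits the users
# before the join, so the database discards the rest, not the client.
rows = conn.execute("""
    SELECT u.userId, u.name, i.itemId, i.description
      FROM (SELECT userId, name FROM users ORDER BY userId LIMIT 2) u
      JOIN items i ON i.userId = u.userId
     ORDER BY u.userId
""").fetchall()
```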

beny23