-6

I wonder why join statement is very popular, because many fields may be duplicated in result rows.

Supplier(id, name, address)
Product(id, name, detail)
Product_Supplier(id, productId, supplierId, quantity)

SELECT Product.*,
       Product_Supplier.supplierId,
       Product_Supplier.quantity
FROM Product
INNER JOIN Product_Supplier ON Product.id = Product_Supplier.productId

The result may look like this:

Product.id  Product.name  Product.detail  Product_Supplier.supplierId  Product_Supplier.quantity
1           'Product 1'   'bla bla'       100                          20
1           'Product 1'   'bla bla'       101                          30
1           'Product 1'   'bla bla'       102                          20
1           'Product 2'   'bla bla'       100                          120

So if product.detail is large text then performance will be slow down, right? If it's true, why join statement is very popular?

Patrick Hofman
  • 153,850
  • 22
  • 249
  • 325
user3621203
  • 45
  • 1
  • 5

2 Answers2

3

JOIN is not popular, it is needed to compose results with data from more than one table.

Lets say you have a table with products and other with sells and you want to develop a program which generates a reports with all the sells including details on the product being sell. You need to compose each row, how? You could do it in your program with to queries and intermediate data structures or you could just ask you database to do it for you. Hence JOIN two tables.

In you case, if product.detail is too large and you don't want it in your result you should just avoid include it in the selected columns. Something like:

SELECT Product.name,
   Product_Supplier.supplierId,
   Product_Supplier.quantity
FROM Product
INNER JOIN Product_Supplier ON Product.id = Product_Supplier.productId

Instead SELECT Product.* ...

  • Thanks for your answer. But I have to achieve 2 goals: 1. get all information of products; 2. get other information like: quantity... So if I use join statement, I get in troubles: if include all columns of product table, many columns are duplicated, not good if they are large or if I don't include them, I can't achieve the first goal. Is there any better way? – user3621203 Jun 16 '14 at 07:51
1

Joining is the only way to get data from different sources (tables, functions, views, etc) in one statement.

Considering your sample, how would you get all information from a product supplier and a product without a join?

Of course, you could have multiple tables in your from, and the difference isn't that big often. Joins are very useful when the relationship becomes complex.

If you have trouble with collision of names, add aliases:

SELECT Product.id product_id
,      ...
FROM   Product
INNER JOIN Product_Supplier ON Product.id = Product_Supplier.productId

Considering your performance tag: joins aren't always performing better then multiple tables in the where. Oracle for example has some problems with view in view relations and ANSI joins.

Patrick Hofman
  • 153,850
  • 22
  • 249
  • 325
  • Thank for your answer. I want to get information of all products, then other their information like quantity, supplierId... I use join to do all jobs with only one sql statement. So is it a good approach? Because if I choose this way, many column fields are duplicated (like 'bla bla'...)? – user3621203 Jun 16 '14 at 07:37
  • You can't prevent that. That is just the way it works, and with a bit of luck, the transport layer of your database filters out those duplicates when copying it over the line. – Patrick Hofman Jun 16 '14 at 07:38
  • In my problem that I described above, should I use only one join statement or use multiple select: first get all products, then select from `Product_Supplier` with product id? Which is better and more convenient? – user3621203 Jun 16 '14 at 07:41
  • You have to consider both transport time and roundtrip time. If you have to do two statements, the database has to do more work, and you have to go back and forth twice. Also you have to 'join' the relevant information client side then, so it won't really help you. – Patrick Hofman Jun 16 '14 at 07:44
  • I'm still thinking about 2 approaches. Use join is very convenient, one statement, easy to maintain but in application code, like Java/C++, the result set will be large (because of duplicated columns), so memory usage maybe a problem. Use multiple select reduces the memory usage, but as you said, I have to go back and forth twice. I'm looking for the best way to deal with problem. If you propose any idea, I'll appreciate. Thank you! – user3621203 Jun 16 '14 at 08:46
  • I wouldn't care too much about the memory if the text isn't very long (as in total a few kb - mb). – Patrick Hofman Jun 16 '14 at 12:45