0

I have a legacy DB which uses a table similar to this structure:

orders
    id, buyer_id, product

the table saves the product as text instead of referencing to the products table (which exists). The person who setup the table explained his reasoning:

The products are quite unique. i.e. There are only 2 or 3 buyers for each different product. The product table is therefore almost as big as the orders table. He wants to avoid a join to speed up the queries on the orders table.

Does this actually improve performance so it's worth having denormalized data for the following usecase:

buyers table with 10.000 entries
products table with 40.000.000 entries
orders (buyer_product) table with 40.000.000 entries
  1. Normalized: SELECT * FROM buyer_product JOIN products ON products.id = buyer_product.product_id LIMIT 1000;

  2. Denormalized (product saved as text instead of product_id) SELECT * FROM buyer_product LIMIT 1000;

EDIT: For some weird reason 1 even seems to be faster:

SELECT SQL_NO_CACHE bp.buyer_id, product.name
FROM buyer_product bp
JOIN products ON bp.product_id = products.id 
ORDER BY bp.id
LIMIT 1000000;

query time 0.250 sec

SELECT SQL_NO_CACHE bp.buyer_id, bp.product 
FROM buyer_product bp 
ORDER BY bp.id 
LIMIT 1000000;

query time 0.268 sec

Chris
  • 13,100
  • 23
  • 79
  • 162

0 Answers0