I have a legacy DB which uses a table similar to this structure:
orders
id, buyer_id, product
the table saves the product as text instead of referencing to the products table (which exists). The person who setup the table explained his reasoning:
The products are quite unique. i.e. There are only 2 or 3 buyers for each different product. The product table is therefore almost as big as the orders table. He wants to avoid a join to speed up the queries on the orders table.
Does this actually improve performance so it's worth having denormalized data for the following usecase:
buyers table with 10.000 entries
products table with 40.000.000 entries
orders (buyer_product) table with 40.000.000 entries
Normalized: SELECT * FROM buyer_product JOIN products ON products.id = buyer_product.product_id LIMIT 1000;
Denormalized (product saved as text instead of product_id) SELECT * FROM buyer_product LIMIT 1000;
EDIT: For some weird reason 1 even seems to be faster:
SELECT SQL_NO_CACHE bp.buyer_id, product.name
FROM buyer_product bp
JOIN products ON bp.product_id = products.id
ORDER BY bp.id
LIMIT 1000000;
query time 0.250 sec
SELECT SQL_NO_CACHE bp.buyer_id, bp.product
FROM buyer_product bp
ORDER BY bp.id
LIMIT 1000000;
query time 0.268 sec