Redundant data or two queries?

Question

Take, for instance, a table that holds credit card charges, a table that holds credit cards, and a table that holds users.

Each charge is associated with exactly one card, and each card is associated with exactly one user. A user may have multiple cards on file.

If I were to hold this data in a MySQL database in three distinct tables, like so:

charges:

---------------------------------------------
id | card | amount | description | datestamp
---------------------------------------------
5  | 2    | 50.00  | Example     | 1369429422

cards:

------------------------------------------------------------------
id | user | name       | number           | cvv2 | exp_mm | exp_yy
------------------------------------------------------------------
2  | 1    | Joe Schmoe | 4321432143214321 | 555  | 1      | 16

users:

-------------------------------------------
id | first_name | last_name | email
-------------------------------------------
1  | Joe        | Schmoe    | joe@schmoe.co

Now, let's say that I want to access a user, given a charge. In order to get to the user, I would first have to look up the card associated with the charge, and then look up the user associated with the card. Obviously, in an example like this, speed would be negligible. But in other scenarios, I see this as being two queries.

However, if I stored the data like so:

charges

----------------------------------------------------
id | card | user | amount | description | datestamp
----------------------------------------------------
5  | 2    | 1    | 50.00  | Example     | 1369429422

then the charge would be associated directly with a user. That, however, is redundant information, since that same data is stored in the cards table.

Thoughts?

Why don't you use `JOIN` to access data you want on `cards` table? — A. Cristian Nogueira, May 24 '13 at 20:18
Is your database being used for transactional updates or for analysis? If it is for updates, then normalization (three tables) is usually better. For analysis, you might consider putting more information into a single table. — Gordon Linoff, May 24 '13 at 22:19

score 1 · Accepted Answer · edited May 23 '17 at 12:18

1

Your instinct not to include the user information in the charges table is correct; however, it's still only one query:

select first_name, last_name, email
from users, cards, charges
where users.id = cards.user
and cards.id = charges.card
and charges.id = 5;

That would give you the user info for the charge with id 5. This is the exact thing that relational databases are best at :) This kind of thing is called a "join" because it joins multiple tables together to give you the information you need. There are multiple ways to write this query.

As an aside, perhaps this is a contrived example, but if this is an application you are writing from scratch, there are lots of reasons to avoid storing credit cards in your own database. Often a payment processor can handle the details for you while still allowing you to charge credit cards. More info.

edited May 23 '17 at 12:18

Community

1
1

answered May 24 '13 at 20:35

Whit Kemmey

2,230
22
18

Thanks. It was just an example. By the way, thanks for this syntax, which I haven't seen before; it's much clearer than JOIN syntax. Incidentally though, logically, wouldn't it take longer to process a JOIN query than a traditional query, because the MySQL system is just doing behind the scenes what I would've done with two queries? – M Miller May 24 '13 at 23:43
A join performs quicker than multiple queries as it is one atomic transaction, and there is less overhead, such as network transfer. This performance difference may be negligible in this case. You can see for yourself by performing some benchamrks on your application, trying both solutions separately to see which executes faster collectively. – Erin Schoonover May 25 '13 at 01:00

score 1 · Answer 2 · answered May 24 '13 at 20:37

1

You could denormalize by adding the user id to the charges table. You need to know if that is necessary, given the expected size of the tables. If this optimization is warranted, use it. If you don't know, then optimize in the future as necessary.

As it stands, you don't need two queries though

SELECT users.* FROM charges
JOIN cards ON charges.card = cards.id
JOIN users ON cards.user = users.id
WHERE charges.id = ?

answered May 24 '13 at 20:37

Erin Schoonover

530
2
13

It is rarely correct to denormalize the data in this way. Unless you have a really good reason to store the user id in the charges table, don't do it! – Whit Kemmey May 24 '13 at 20:58

Redundant data or two queries?

2 Answers2