When to split up tables in SQL?

Question

I'm relatively new to SQL and trying to teach myself and I'm having a hard time understanding when to keep a column and when to separate it into a new table.

I was watching a lecture where the instructor had a 'Customer' table and one of the columns was 'City' and a lot of the customers were from the same city so the data was redundant. He then broke off 'City' into its own table but that didn't quite make sense to me.

For example, I'm creating a College Course DB and I noticed that certain columns in the 'Course' dimension are repeating very often (like credit hours). Should I break credit hours off into its own table where that table would only have a couple of rows? What does this accomplish? I would still have to use a foreign key to reference the same value for every new data entry so would it even save on storage or would it just be an unnecessary join?

I have other columns as well like 'Days of Week', 'Location', 'Class Time' which also only have a few values that repeat often. Should those be broken off into their own separate tables or be left part of the Course table?

I'm not a database expert but from my experience - Sometimes its better to repeat some information in same table's columns! That will avoid trouble when you will have millions of records and have to join 2/3 tables to extract data for a particular report. — N Subedi, Sep 17 '18 at 14:37
It sounds like you haven't read anything on database normalization (https://en.wikipedia.org/wiki/Database_normalization). Typically a database is normalized to the third normal form (fancy words for steps). There are a lot of tutorials and explanations out there on the web to help. :-) As you get more adept at doing this you will learn that sometimes it might make sense to "bend" these rules for a situation. I recommend sticking with doing the 3 steps until you understand it enough to know when it's ok to bend the rules. Too many bend/break the rules out of lack of knowledge. — Kevin LaBranche, Sep 17 '18 at 14:44
Short answer - it depends. Long answer - it depends on a lot of stuff. What is the intent of this database? Are you trying to make it it easier to get data in, or get data out? Is space an issue? Is this an operational database, a reporting database, etc? — Andrew, Sep 17 '18 at 15:01
@NirmalSubedi, I would love see a scenario where it's "better to repeat some information in same table's columns". — Brian, Sep 17 '18 at 15:24
Hi. Time to read a textbook on information modeling, the relational model & database design. This is too broad a question. (Dozens of published academic textbooks are free online, also slides & courses.) PS [Understanding Normalization & Duplicates - I Guess I Don't - Adding Artist & Title Ids](https://stackoverflow.com/a/44539858/3404097) PS Info on a tool to manage designs does not constitute info on how to design. — philipxy, Sep 18 '18 at 01:05

score 2 · Accepted Answer · answered Sep 17 '18 at 15:07

This is always tricky when you are learning databases. The rules of normalization can help, but they can be unclear on when to apply them.

The idea is that (some) database tables represent "entities". These are things you want to store information about. Other tables represent relationships between/among entities, but let's not worry about those for now.

For your specific questions:

"credit hour seem more like an attribute of the course entity. When would they be their own entity? Well, if they had other information specific to being credit hours. It is hard to come up with examples, but for instance: cost, range of effective dates, departments where the credits apply.
"days-of-weeks". If this is for a date, then just use a date and derive the day of the week using database functions for the date or a calendar table.
"days-of-weeks" for scheduling. This one is trickier. There are multiple ways to represent this; the best representation depends on how it is being used.
"location". This sounds like an entity. It could have a name, address, contact, directions and other information. In fact, there could be more than one entity to support this.
"class time". This is probably an attribute of the course, with a start time and end time.

score 0 · Answer 2 · answered Sep 17 '18 at 16:21

Think about using a Course table and a schedule table. This way you can have one course with many different schedules having different times and days of the week. If there are different locations I would move the location into the schedule table. Format the times correctly so you can calculate duration and cast them if needed. It depends on what the data looks like.

course

PK course_ID (int)
credit_hours (int)
location (varchar)
```
  |
```
(one to many relationship)
```
  |
```

schedule

PK schedule_ID (int)
FK course_ID (int)
day_of_week (varchar)
start_time (varchar)
end_time (varchar)

When to split up tables in SQL?

2 Answers2