2

I'm not sure of the header question. Just to elaborate.

I have Table Users and have columns:

  1. Name (example values: "Meg", "Sam", etc.)
  2. Type (example values: "Admin", "Operator", "Maintainer")
  3. Image (example values: "12344.jpg", etc.)
  4. Etc.


I once told by someone that I must create a Table User_type with a seeded values.
  1. Admin [primary id: 1]
  2. Operator [primary id: 2]
  3. Maintainer [primary id: 3]

And change the column Type to Type_id and reference it to User_type Table. And instead of inserting string values ["Admin", "Operator", etc]. I now insert integer 1 for Admin, 2 for Operator, 3 for maintainer.


I don't remember his explanation of this. Now I'm confused.
  • Is this really essential in database matter?
  • Can somebody explain or elaborate me what is the logic of this?
  • String values vs Integer values, whats the matter?

Hope somebody can enlighten me, I am a beginner in database design.

schutte
  • 1,949
  • 7
  • 25
  • 45
  • 1
    No, you don't *have* to do that. You *can* do that. It would allow you to more easily present a list of selectable types when building UIs to input such data, but you can do that without the table as well. So no, this is not essential, there's no *must* with this. You will, however, probably find that most database designers will favor that particular design, to avoid having codes-in-text-fields. For instance, can I add a user with a type of 'Administrator'? – Lasse V. Karlsen Jul 14 '19 at 18:58
  • @LasseVågsætherKarlsen querying Strings vs Integers, I mean comparing Strings vs Integers does have a difference? that is my concern also. – schutte Jul 14 '19 at 19:03
  • 2
    I'm not sure what you meant by that comment. Yes, there is a difference, positive and negative. There is no right or wrong answer here. If someone is telling you that "you must", then ask that person. It may be company policy to do what you're being asked to do. – Lasse V. Karlsen Jul 14 '19 at 19:34
  • 1
    *"I mean comparing Strings vs Integers does have a difference? that is my concern also. "* Yes there is a difference, INT's are generally smaller also strings in MySQL have a collate with rules they tend to have more "overhead" vs INTs – Raymond Nijland Jul 14 '19 at 20:22
  • 1
    This is a faq. Before considering posting please always google any error message & many clear, concise & precise phrasings of your question/problem/goal, with & without your particular strings, names & line numbers & then read many answers. If you post a question, use one phrasing as title. See [ask] & the voting arrow mouseover texts. – philipxy Jul 15 '19 at 00:56
  • Possible duplicate of [Strings as Primary Keys in SQL Database](https://stackoverflow.com/questions/517579/strings-as-primary-keys-in-sql-database) – philipxy Jul 15 '19 at 00:59

2 Answers2

2

You're right to question this requirement, since apparently your colleague didn't explain clearly why it was so important.

The fact is there is no rule in database normalization that requires that we use a pseudokey, that is an integer like your example of Type_id.

It's a good idea to create a lookup table of User_type so you have a concise list of the user types allowed by your system. If you add a new type, you only need to INSERT a row to that lookup table, and then the Users table that references the lookup table will be allowed to use the new type.

But there's no reason dictated by rules of normalization that the reference has to be to an integer. You can make the primary key of that lookup table a string.

CREATE TABLE User_type (
  Type VARCHAR(10) NOT NULL PRIMARY KEY
);

CREATE TABLE Users (
 ...
 Type VARCHAR(10) NOT NULL,
 FOREIGN KEY (Type) REFERENCES User_type(Type)
);

So why did your "someone" tell you that you have to use an integer Type_id?

Well, there could be reasons, but they're not due to rules of normalization.

  • INTEGER may be smaller than the string stored in VARCHAR(10), so the data will take up less space. In MySQL, an INTEGER is four bytes. A VARCHAR is variable-length, depending on the string you store. For example "Admin" takes five bytes, plus one byte to encode the length of the string. So it's at least 50% larger than an INTEGER, and longer strings take even more space. But this is probably not worth optimizing, unless you store hundreds of millions of users.
  • Comparing integer to integer is slightly faster than comparing strings (especially if the string has a character set other than binary), but this is premature optimization. Are you going to run these comparisons so frequently that you need to shave off a few nanoseconds? Maybe, but you haven't said so.
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
0

What you're doing is correct but it will increase the data redundancy means you're using same values again and again in that column. So for this case you can use normalization and separate the data in column. there are many advantage of that. Like if you want to change "Admin" to "Administrator" later then you need to update only one row.

https://en.wikipedia.org/wiki/Data_redundancy

https://en.wikipedia.org/wiki/Database_normalization

Data Redundancy and Database Normalization are basic concept of the RDBMS.

Ashok Gadri
  • 520
  • 4
  • 11
  • Yes.. first thing i have mentioned is he is doing it correct... It depends on situation. Size of the table and so many things are need to keep in mind.. Like if you're searching on that column and want to index it then it's better to keep it as integer. – Ashok Gadri Jul 14 '19 at 20:37
  • Good point Sir. – schutte Jul 14 '19 at 20:42
  • Normalization does not involve replacing values by other values that happen to be ids. Clearly it also it increases redundancy because the exact pattern of values is there plus another table you didn't need mapping ids to original values. – philipxy Jul 16 '19 at 01:40
  • Normalization concept is based on two things.. one to remove the data anomalies and data redundancy. When we're creating extra table for reducing the redundancy we will use integer columns to do that. so that will save memory.. Data redundancy is reduced by eliminating repeating groups in individual table.. Can you please explain how it is increases the redundancy? – Ashok Gadri Jul 16 '19 at 03:44