1

I'm making a fantasy name generator, and I'm having trouble planning my database table. So far my columns are ID(primary key), name, type(first, last, or either), Gender(unisex, male, or female), and then I have a column for each race(Dragonborn, Dwarf, Elf, etc.).

I'm making a PHP form for my friends and me to input names, where the races will be checkboxes, and then it will show up on the public table. I'm having trouble figuring out a way to show all the column names that are not null for each name/row in the Race cell of the public table. Currently if you select the races the database puts 'Yes' in every corresponding race column and leaves the other races null.

Should I just go ahead and make one Race column that can have multiple strings in my database? For example, if a name could be human, halfling, and gnome, the race column entry would be "human, halfling, gnome". Would that be better than having a column for each race? I don't know if it's good design and I've had difficulty searching for similar problems.

Shadow
  • 33,525
  • 10
  • 51
  • 64
Positronic
  • 11
  • 2
  • 2
    *"Should I make a column in MySQL where an entry could be multiple strings?"* - No. Use relational tables / normalize your db. – Funk Forty Niner Apr 17 '17 at 01:36
  • 1
    Start reading on fundamentals of relational databases. You'll find a fact that _"every field in a relation should contain an atomic value. There cannot be any multi-valued attributes."_ From the way I see, in your case what you'll need is **not columns**, but **relations** for each race.. – Romeo Sierra Apr 17 '17 at 01:42
  • @Shadow, in the most strict sense this is probably a dupe, but I'm not sure marking it as such with that particular post is very helpful here: the asker realises there's something not right here, which the other post confirms, but doesn't much talk about how to solve it. Whereas the linked question is about the specifics of why this pattern is bad, the OP is looking for what it would be better replaced by. – Gareth Pulham Apr 17 '17 at 01:50
  • I would suggest a single column for race because it cannot be more than one value at a time. Using multiple true/false columns is better suited for a data value that can have multiple values at a time. – Sloan Thrasher Apr 17 '17 at 02:01

2 Answers2

-1

How often do you expect to update the list of possible races? If you expect that it will be frequent, or that the list of possible races should be considered just like data, then a many-to-many relationship might be considered. That is, you should have a table of races (probably id and name), as well as a table of characters (at least id and name) and then a table of character-race relations (charid, raceid).

When you then have a character that's a human-halfling-elf, you can enter three rows into this character-race table: (characterid, humanraceid), (characterid, halflingid), and (characterid-elfid).

For the more general question of how to better design databases, there is a significant amount of literature available. The topic name for this specific question is that of "cardinality": how you can map the relationships of some number of items to some number of other items. In this case, many characters can each have many races.

Once you understand the cardinality of the problem you are trying to solve, there are well known patterns you can apply almost as second nature to design the table and query it. They're absolutely worth learning!

Gareth Pulham
  • 654
  • 4
  • 10
  • So essentially it'd be a one and done thing. The first time the name is entered will be the only time the race column would be updated. I was originally thinking of making a separate table for the races, but found out that one-to-one table relationships were not recommended, so I scratched that idea. Your suggestion makes it seem like it's not one to one though because of the third table and the addition of race ids. Sorry if this is a dumb question but I'm a bit confused. The name of the character would be the foreign key in the columns of each race? – Positronic Apr 17 '17 at 01:57
  • The relationship is indeed not one to one: each character can have many races, and likewise each race can be an attribute of many characters, hence, a many to many relationship. A one to one relationship would be like each character owning exactly 1 car. For each character, they have exactly 1 car, and each car belongs to exactly 1 character - no pooling vehicles. Since multiple characters can be elves, this means that there's not a one-to-n relationship. Likewise, a character can be multiple races, meaning the other direction is not n-to-1. – Gareth Pulham Apr 17 '17 at 02:03
  • Thank you, that makes a lot of sense. So lets say I have the race table, and the two columns are id and name. Name would be the foreign key/character name in each row, and the id would be 'human' or 'halfling'? – Positronic Apr 17 '17 at 02:08
  • No, the ID would be the primary key of the race, and the name would be the written description ("Human", for example). The character-race relation table ends up simply as two foreign keys to the race ID and character ID. If Peter is character 3 and Human is race 8, then the character-race relation showing Peter the Human would be (3 (peter),8 (human)). Likewise, if halfling was race 1 and elf was race 5, then Paul the Human-halfling-elf (character 6) would have three entries in the character-race table: (6,1 (halfling)), (6,8 (human)), (6, 5 (elf)). – Gareth Pulham Apr 17 '17 at 02:13
  • Ohh okay, that makes perfect sense. I definitely like this solution and it works perfect for the actual name generator. The only problem left is that I'm not sure how to represent it on the private database entry page I set up for my friends. I only want each name to show up once, along with their gender, name type, and then the list of races. I'd ask google but I'm struggling with a way to word the question. Thanks for all your help so far, I definitely appreciate it. – Positronic Apr 17 '17 at 02:27
-1

Having a single column with comma separated lists is a SQL Antipattern. Don't do that.

Bill Karwin gives a pretty good overview of this issue in Chapter 2: Jaywalking in his excellent book, "SQL AntiPatterns: Avoiding the Pitfalls of Database Programming"

(available from Amazon https://www.amazon.com/SQL-Antipatterns-Programming-Pragmatic-Programmers/dp/1934356557 and other fine booksellers.)

Only store a comma separated list if the database will never be required to look at that column as anything other than a single value, just one meaningless blob of data. That is, if you will never ask the database to find rows that include "Elf" in the column.

And especially don't do this if you need the database to enforce constraints on the value e.g. if you want the database to recognize that 'Dragonseed', or 'Dwarf,,Dragginborn,Dragonseed,Elf,,' or even 'Dwarf,,Elf,Hobbit,Dwarf,Dwarf' are invalid values. And especially don't do this if you will ever require the database to add or remove or replace a Race in the column e.g. to convert 'Dwarf,Elf,Hobbit' to 'Elf,Hobbit' or 'Dwarf,Ent,Hobbit'.


With that said, if the domain for the values that can be listed is pre-defined and immutable (fixed) in the schema definition, then I might consider using the SET datatype.

Otherwise, I would only store a comma-separate list as a database column, again, if it will only ever be treated as one blob of data. Similar to how I might handle a .jpg image as a BLOB. I will never query for a "match" on the contents of the BLOB, and I will only ever retrieve or store the BLOB in its entirety. (I won't ever be requiring the database to perform an operation that locates/identifies/retrieves/inserts/modifies/removes a portion of the BLOB.)

spencer7593
  • 106,611
  • 15
  • 112
  • 140
  • So it seems like separating them by commas in one column would be a bad idea for me then. My idea was to search for them by substring. So when the generator is finished and the database is full, if someone wanted a list of elf names I'd search for names that had a substring 'elf' in the race column. – Positronic Apr 17 '17 at 02:12
  • @Positronic: and querying for a substring (or with a LIKE comparison) cannot take advantage of efficient index accesses. To do a search for 'mer' while avoding a match to 'orsimer' requires a construct such as `WHERE CONCAT(',',t.col,',') LIKE ',mer,'`, and that expression involving t.col has to be evaluated for *every* row in the table, in order to do the comparison, which can be a significant drag on performance (and scalability) with large sets. We have much more opportunity for performance improvement using indexes with predicates of the form `WHERE t.col = 'mer'`. – spencer7593 Apr 17 '17 at 02:48