1

A while ago, I came to the realization that a way I would like to hold the skills for a player in a game would be through CSV format. On the player's stats, I made a varchar of skills that would be stored as CSV. (1,6,9,10 etc.) I made a 'skills' table with affiliated stats for each skill (name, effect) and when it comes time to see what skills they have, all I have to do is query that single column and use PHP's str_getcsv() to see if a certain skill exists because it'll be in an array.

However, my coworker suggests that a superior system is to have each skill simply be an entry into a master "skills" table that each player will use, and each skill will have an ID foreign key to the player. I just query all rows in this table, and what's returned will be their skills!

At first I thought this wouldn't be very good at all, but it appears the Internet disagrees. I understand that it's less searchable - but it was not my intention to ever say, "does the player have x skill?" or "show me all players with this skill!". At worst if I wanted such data, I'd just make a PHP report for it that would, admittedly, be slow.

But it appears as though this is really faster?! I'm having trouble finding a hard answer extending beyond "yeah it's good and normalized". Can Stack Overflow help me out?

Edit: Thanks, guys! I never realized how bad this was. And sorry about the dupe, but believe me, I didn't type all of that without at least checking for dupes. :P

Seth Goodwin
  • 107
  • 1
  • 1
  • 5

3 Answers3

4

Putting comma-separated values into a single field in a database is not just a bad idea, it is the incarnation of Satan expressed in a database model.

It cannot represent a great many situations accurately (cases in which the value contains a comma or something else that your CSV-consuming code has trouble with), often has problems with values nested in other values, cannot be properly indexed, cannot be used in database JOINs, is difficult to dedupe, cannot have additional information added to it (number of times the skill was earned, in your case, or a skill level), cannot participate in relational integrity, cannot enforce type constraints, and so on. The list is almost endless.

This is especially true of MySQL which has the very convenient group_concat function that makes it easy to present this data as a comma-separated string when needed while still maintaining the full functionality and speed of a normalized database.

You gain nothing from using the comma-separate approach but lose searchability and performance. Get Satan behind thee, and normalize your data.

Larry Lustig
  • 49,320
  • 14
  • 110
  • 160
2

Well, there are things such as scaleability to consider. What if you need to add/remove a skill? How about renaming a skill? What happens if the number of skills out grows the size of your field? It's bad practice to have to re-size a field just to accommodate something like this.

What about maintainability? Could another developer come in and understand what you've done? What happens if the same skill is given to a player twice?

You coworker's suggestion is not correct either. You would have 3 tables in this case. A master player table, a skills table, and a table that has a relationship to both, creating a many to many relationship, allowing a single skill to be associated with many players, and many players having the same skill.

Darthg8r
  • 12,377
  • 15
  • 63
  • 100
  • IMHO: I believe the poster's intent was to have (essentially) a Player record containing a number of values representing Skill Levels ­— i.e. fixnum types, not booleans — so the third table would have player_id:skill_id:level (int) mappings, or (if all players have all skills) the two-table design might use a column for each skill (of int, or appropriate, type). Definitely agree with @Darthg8r's intent, but I think s/he may have misinterpreted the intent slightly? – BRPocock Nov 29 '11 at 23:30
2

Since the database will index the content (assuming that you use index) it will be very very fast to search the content and get the desired contents. Remember: databases are designed to hold a lot of information and a database such as mysql, which is a relational database, is made for relations.

Another matter is the maintainability of the system. It will be much much easier to maintain a system that's normalized. And when you are to remove or add a skill it will be easier.

When you are about to get the information from the database regarding the skills of the player you can easily get information connected to the concerned skills with a simple JOIN.

I say: Let the database do what it does best - handle the data. And let your programming do what it should do ;)

Marcus
  • 12,296
  • 5
  • 48
  • 66