2

I am trying to make a simple item database using MySQL for a game. Here is what my 3 tables would look like

     items
     itemId | itemName 
    -------------------
     0001   | chest piece
     0002   | sword
     0003   | helmet
     attributes (attribute lookup table)
     attributeId       | attributeName
     ---------------------------------
     01             | strength
     02             | agility
     03             | intellect
     04             | defense
     05             | damage
     06             | mana
     07             | stamina
     08             | description
     09             | type
     item_attributes (junction table)
     itemId  | attributeId   | value (mixed type, bad?)
     ------------------------------------
     0001   | 01            | 35
     0001   | 03            | 14
     0001   | 09            | armor
     0001   | 08            | crafted by awesome elves

     0002   | 09            | weapon
     0002   | 05            | 200
     0002   | 02            | 15
     0002   | 08            | your average sword

     0003   | 04            | 9000
     0003   | 09            | armor
     0003   | 06            | 250

My problem with this design is that value column in item_attributes table needs to use varchar data type, since the value's data can be int, char, varchar. I think this is a bad approach because I would not be able to quickly sort my items based on particular attributes. It would also suffer performance hit when a query such as get items with attribute strength that has value between 15 and 35 is processed.

Here is my potential fix. I simply added a data_type column to the attributes table. So it would look something like this

     attributes (attribute lookup table)
     attributeId       | attributeName | data_type
     ---------------------------------------------------
     01             | strength      | int
     09             | type          | char
     08             | intellect     | varchar

Then I would add 3 more columns to item_attributes table, int, char, varchar. Here is how the new item_attributes table would look like.

     item_attributes (junction table)
     itemId  | attributeId   | value                 | int  | char | varchar
     ------------------------------------------------------------------------
     0002   | 09            | weapon                | null |weapon| null
     0002   | 05            | 200                   | 200  | null | null
     0002   | 02            | 15                    | 15   | null | null
     0002   | 08            | your average sword    | null | null | your average sword

So now if I were to sort items based on its strength attribute, I would use int column. Or search for an item based on its description, I would search the varchar column.

I still, however, believe my design is a bit weird. Now I would have to look up the data_type column in attribute table and dynamically determine which column in item_attributes table is relevant to what I am looking for.

Any inputs would be greatly appreciated.

Thanks in advance.

EDIT 11/29/2010 Here is a detailed list of my items

--------------------------------------
http://wow.allakhazam.com/ihtml?27718
Aldor Defender's Legplates
Binds when picked up
LegsPlate
802 Armor
+21 Strength
+14 Agility
+21 Stamina
Item Level 99
Equip: Improves hit rating by 14.
--------------------------------------
http://wow.allakhazam.com/ihtml?17967
Refined Scale of Onyxia
Leather
Item Level 60
--------------------------------------
http://wow.allakhazam.com/ihtml?27719
Aldor Leggings of Puissance
Binds when picked up
LegsLeather
202 Armor
+15 Agility
+21 Stamina
Item Level 99
Equip: Increases attack power by 28. 
Equip: Improves hit rating by 20.
--------------------------------------
http://wow.allakhazam.com/ihtml?5005
Emberspark Pendant
Binds when equipped
NeckMiscellaneous
+2 Stamina
+7 Spirit
Requires Level 30
Item Level 35
--------------------------------------
http://wow.allakhazam.com/ihtml?23234
Blue Bryanite of Agility
Gems
Requires Level 2
Item Level 10
+8 Agility
--------------------------------------
http://wow.allakhazam.com/ihtml?32972
Beer Goggles
Binds when picked up
Unique
HeadMiscellaneous
Item Level 10
Equip: Guaranteed by Belbi Quikswitch to make EVERYONE look attractive! 
--------------------------------------
http://wow.allakhazam.com/ihtml?41118
Gadgetzan Present
Binds when picked up
Unique
Item Level 5
"Please return to a Season Organizer"
--------------------------------------
http://wow.allakhazam.com/ihtml?6649
Searing Totem Scroll
Unique
Quest Item
Requires Level 10
Item Level 10
Use:  
--------------------------------------
http://wow.allakhazam.com/ihtml?6648
Stoneskin Totem Scroll
Unique
Quest Item
Requires Level 4
Item Level 4
Use:  
--------------------------------------
http://wow.allakhazam.com/ihtml?27864
Brian's Bryanite of Extended Cost Copying
Gems
Item Level 10
gem test enchantment
--------------------------------------

EDIT #2

  1. These 10 examples are not representative of all 35316 items data that I have collected.
  • NeckMiscellaneous means that item is in both categories of `Neck` and `Misc`.
  • Unique means the only one item can be used on character.
  • Don’t read too much into the “Action”, they are just quest description
  • When an item says `Equip: increase attack power by 28` it just means +28 attack power on the player character. It is the same as +15 agility.
  • There are a total of 241884 one-to-many item-attribute records, so that comes about to 241884/35316 ~= 8 average attributes per item. Also the data is mined from the website into a gigantic text file. There is NO “well formed” information to identify an item’s type or category. So if the word “sword” appears on either 3rd or 4th line, it is automatically categorized as sword.
  • The item might get changed on each new update of the game.
  • There is no universal attribute shared amongst the item besides `name`
  • The item data is accessible through a web app. Unclear about what you mean by bits and vectors?
  • The regular expression is used during data mining stage to clean up the special character and search for specific keyword in order to categorize the items. Also to extract attribute name and value. For example, +15 agility would have string agility extracted as attribute name and 15 as value. (I don’t understand much about question 6 and 6.1. Slog stands for server log here? Translate regexes to SQL?)
  • Model Diagram

    Here is an example how a query looks like

    select *
    from itemattributestat
    where item_itemId=251
    
    item_itemId | attribute_attributeId | value | listOrder
    =======================================================
    '251', '9', '0', '1'
    '251', '558', '0', '2'
    '251', '569', '0', '3'
    '251', '4', '802', '4'
    '251', '583', '21', '5'
    '251', '1', '14', '6'
    '251', '582', '21', '7'
    '251', '556', '99', '8'
    '251', '227', '14', '9'
    

    The list order is here to keep track of which attribute should be listed first. For formatting purpose

    create view itemDetail as
    select Item_itemId as id, i.name as item, a.name as attribute, value
    from ((itemattributestat join item as i on Item_itemId=i.itemId) 
        join attribute as a on Attribute_attributeId=a.attributeId)
    order by Item_itemId asc, listOrder asc;
    

    The above view produces the following with

    select *
    from itemdetail
    where id=251;
    id | item | attribute | value
    '251', 'Aldor Defender''s Legplates', 'Binds when picked up', '0'
    '251', 'Aldor Defender''s Legplates', 'Legs', '0'
    '251', 'Aldor Defender''s Legplates', 'Plate', '0'
    '251', 'Aldor Defender''s Legplates', 'Armor', '802'
    '251', 'Aldor Defender''s Legplates', 'Strength', '21'
    '251', 'Aldor Defender''s Legplates', 'Agility', '14'
    '251', 'Aldor Defender''s Legplates', 'Stamina', '21'
    '251', 'Aldor Defender''s Legplates', 'Item Level', '99'
    '251', 'Aldor Defender''s Legplates', 'Equip: Improves hit rating by @@.', '14'
    

    An attribute with value 0 means the attribute name represents the item type. 'Equip: Improves hit rating by @@.', '14' @@ is place holder here, a processed output on a browser will be 'Equip: Improves hit rating by 14.'

    nobody
    • 2,709
    • 6
    • 35
    • 37

    3 Answers3

    3

    Why do you have an attribute table ?

    Attributes are columns, not tables.

    The website link tells us nothing.

    The whole idea of a database is that you join the many small tables, as required, for each query, so you need to get used to that. Sure, it gives you a grid, but a short and sweet one, without Nulls. What you are trying to do is avoid tables; go with just one massive grid, which is full of Nulls.

    (snip)

    Do not prefix your attribute names (column names) with the table name, that is redundant. This will become clear to you when you start writing SQL which uses more than one table: then you can use the table name or an alias to prefix any column names that are ambiguous.

    The exception is the PK, which is rendered fully, and used in that form wherever it is an FK.

    Browse the site, and read some SQL questions.

    After doing that, later on, you can think about if you wantstrength and defense to be attributes (columns) of type; or not. Et cetera.

    Responses to Comments 30 Nov 10

    .
    Excellent, you understand your data. Right. Now I understand why you had an Attribute table.

    1. Please make sure those 10 examples are representative, I am looking at them closely.

      • Type:Gem Name:Emberspark Pendant ... Or, is NeckMiscellaneous a type ?
      • Is Unique a true ItemType ? I think Not
      • Action.Display "Please return to a Season Organizer"
      • Where are the Attrinutes for AttackPower and HitRating ?
        .
    2. How many different types of items (of 35,000) are there, ala my Product Cluster example. Another way of stating that question is, how many variations are there. I mean, meaningfully, not 3500 Items ÷ 8 Attributes ?

    3. Will the item_attributes change without a release of s/w (eg. a new Inner Strength attribute) ?

    4. Per Item, what Attributes are repeating (more than one); so far I see only Action ?

    5. It is a game, so you need a db that is tight and very fast, maybe fully memory resident, right. No Nulls. No VAR Anything. Shortest Datatypes. Never Duplicate Anything (Don't Repeat Yourself). Are you happy with bits (booleans) and vectors ?

    6. Do you need to easily translate those regexes into SQL, or are you happy with a serious slog for each (ie. once you get them working in SQL they are pretty stable and then you don't mess with it, unless you find a bug) (no sarcasm, serious question) ?

      6.1 Or maybe it is the other way round: the db is disk-resident; you load it into memory once; you run the regexes on that during gameplay; occasionally writing to disk. Therefore there is no need to translate the regexes to SQL ?

    Here's a Data Model of where I am heading, this not at all certain; it will be modulated by your answers. To be clear:

    • Sixth Normal Form is The Row consists of the Primary Key and, at most, one Attribute.

    • I have drawn (6.1) not (6), because your data reinforces my belief that you need a pure 6NF Relational database

    • My Product Cluster Data Model, the better-than-EAV example, is 6NF, then Normalised again (Not in the Normal Form sense) by DataType, to reduce no of tables, which you have already seen. (EAV people usually go for one or a few gigantic tables.)

    • This is straight 5NF, with only the 2 tables on the right in 6NF.

    Link to Game Data Model

    Link to IDEF1X Notation for those who are unfamiliar with the Relational Modelling Standard.

    Response to Edit #2 05 Dec 10

    1.1. Ok, corrected.

    1.2. Then IsUnique is an Indicator (boolean) for Item.

    1.3. Action. I understand. So Where are you going to store it ?

    1.4. NeckMiscellaneous means that item is in both categories of Neck and Misc. That means two separate Item.Name=Emberspark Pendant, each with a different Category.

    .
    2. and 5. So you do need fast fast memory-resident db. That's why I am trying to get you across the line, away from GridLand, into RelationalLand.
    .
    3. Ok, we stay with Fifth Normal Form, no need for 6NF or the Product Cluster (tables per Datatype). Sofar the Values are all Integers.
    .
    4. I can see additionally: Level, RequiredLevel, IsUnique, BindsPickedUp, BindsEquipped.
    .
    5. Bits are booleans { 0 | 1 }. Vectors are required for (Relational) projections. We will get to them later.
    .
    6. Ok, you've explained, You are not translating regular expressions to SQL. (Slog means hard labour). .
    7. What is Category.ParentId ? Parent Category ? That has not come up before.
    .
    8. Attribute.GeneratedId ?

    Please evaluate the Data Model (Updated). I have a few more columns, in addition to what you have in yours. If there is anything you do not understand in the Data Model, ask a specific question. You've read the Notation document, right ?

    I have Action as a table, with ItemAction holding the Value:
    Equip: increase attack power by 28 is Action.Name=Increase attack power by and ItemAction.Value=28.

    PerformanceDBA
    • 32,198
    • 10
    • 64
    • 90
    • @PerformanceDBA: Come to think of it, it seems using item_stat would have been a better choice then item_attribute. The reason I can't use your suggestion because the number of stat varies from item to item. Also as Larry Lustig pointed out, I need to prepare for adding new stats to items in the future, or removing old stats as well. – nobody Nov 28 '10 at 15:19
    • 1
      @Qin. 1) So what exactly is stopping you from adding an item_stats table, to this design ? Nothing is stopping the number of stats from varying (it cannot vary in your grid, but I am giving you a Relational solution). One-to-many tables are normal in an Rdb. If you post an example of your item stats, I can respond fully. 2) If you go with EAV, read [this question and answer](http://stackoverflow.com/questions/4283842/database-schema-related-problem/4283979#4283979) first. – PerformanceDBA Nov 29 '10 at 05:06
    • Thanks for the link, I am slowly digesting the model diagram you provided for Bram's question. I have to admit this will be my first ever SQL db developing experience. The thought of splitting item value into different tables by its data type did come up for me before, but I thought it would have been too much hassle work with. After seeing your diagram, I am inclined to try it out. – nobody Nov 30 '10 at 01:38
    • 1
      @Qin. At this stage, I see no reason to go for EAV, Rdb looks fine for your purpose (you need to post the requested details); I was saying, if you do, then do it properly, without losing control. Feel free to post the info or to change your choice of Answer. – PerformanceDBA Nov 30 '10 at 03:27
    • I edited my question to include 10 items with for requested details, . There are over 35000 items like that needs to be inserted into my database. Each attribute and attribute value will be parsed using regular expression. I think I still have 20 or more regular expression to compose. – nobody Nov 30 '10 at 04:47
    • 1
      @Qin. Answered in my post. Over to you. Also, let me know if Alla Khazam is the correct name for the game or not. – PerformanceDBA Nov 30 '10 at 13:13
    • @PerformancedDBA Thank you so much for the information and model diagram. As always it is hard to digest for a beginner like me. My answer to your question is in the original post starting from EDIT #2 – nobody Dec 03 '10 at 03:39
    • 1
      @Qin. Answered in my post. Over to you. – PerformanceDBA Dec 05 '10 at 14:05
    2

    I think having the data_type column just further complicates the design. Why not simply have type and description be columns on the items table? It stands to reason that every item would have each of those values, and if it doesn't then a null would do just fine in a text column.

    You can even further normalize the type by having an item_types table and the type column in items would be a numeric foreign key to that table. Might not be necessary, but might make it easier to key off of the type on the items table.

    Edit: Thinking about this further, it seems like you may be trying to have your data tables match a domain model. Your items would have a series of attributes on them in the logic of the application. This is fine. Keep in mind that your application logic and your database persistence layout can be different. In fact, they should not rely on each other at all at a design level. In many small applications they will likely be the same. But there are exceptions. Code (presumably object-oriented, but not necessarily) and relational data have different designs and different limitations. De-coupling them from one another allows the developer to take advantage of their designs rather than be hindered by their limitations.

    David
    • 208,112
    • 36
    • 198
    • 279
    • Thanks for the quick response, the idea of adding somewhat universal attribute, such as `type` and `description` to the `items` table is something I haven't thought about. Perhaps I should have included more detail about my design intention. The problem is if I add `sold_by` and `dropped_by` both non-int attribute to an item, then I need to alter the `item` table. I am trying make a design that can facilitate future change in item's attribute. – nobody Nov 27 '10 at 01:12
    • @Qin: Should `sold_by` and `dropped_by` even be in these tables? Maybe keep a table of "sales" and one of "drops" each of which has a foreign key to `items`, timestamp, information about the sale/drop, and a primary ID of course. For any given item you can query those tables for its most recent sale/drop/etc. pretty easily. That way you also have a running log of sales/drops so that data isn't lost. The application can have the attribute as a single property on the item class, but the database doesn't need to. – David Nov 27 '10 at 01:17
    • @Qin: You may also want to look into document databases such as MongoDB for your needs. It's set up very differently from a relational database and accommodates a changing schema very well. Each item can simply be a "document" in the database with whatever attributes on it you wish. – David Nov 27 '10 at 01:20
    • Thanks for the all advices. My site will be deploy on a shared-hosting environment, so my only choice for database is MySQL. Your comment on where to include `sold_by` and `dropped_by` has gave me new insight on how I should define the relationship between my items and item_attribute. Instead of having `description` as an attribute, ill just use the value of `description` such as `your average sword` to be the attribute and assign it a value of 0. Of course my attribute table would have to get bigger right now. – nobody Nov 28 '10 at 03:27
    • Btw I have to apologize that I wasn't very forthcoming about the functionality I wanted to implement. You can visit http://www.wowhead.com/items to see what I am trying to do. – nobody Nov 28 '10 at 03:28
    0

    You are dealing with a two common problems:

    • Entities that are similar to each other but not identical (all items have a name and description, but not necessarily an intellect).
    • A design in which you need to add attributes once the database is in production (you can pretty easily predict that at some point you'll need to add, for instance, a magic-resistance attribute to some items).

    You've solved your problem by reinventing the EAV system in which you store both attribute names and values as data. And you've rediscovered some of the problems with this system (type checking, relational integrity).

    In this case, I'd personally go with a solution midway between the relational one and the EAV one. I'd take the common columns and add them, as columns, to either the items table or, if items represents kinds of items, not individual ones, to the items_owners table. Those columns would include description and possibly type and in the example you gave, pretty much match up with the text columns. I'd then keep the existing layout for those attributes that are numerical ratings, making the value type int. That gives you type-checking and proper normalization across the integer attributes (you won't be storing lots of NULLs) at the expense of the occasional NULL type or description.

    Larry Lustig
    • 49,320
    • 14
    • 110
    • 160
    • After googling for what EAV is, I think it might be what I am looking for. – nobody Nov 28 '10 at 15:19
    • Full EAV costs you most of the benefits of using a relational database system. It's best applied when you're using an object persistence layer between your code and your database which will do the type translation and checking for you. If you're going to take that approach it may be worth investigating non-SQL solutions for your data storage needs (such as object databases). – Larry Lustig Nov 28 '10 at 15:25
    • Since the only database hosting I have access to is MySQL, I think I will stick with a SQL solution. Besides, there are other areas of my database that would benefit from a relational design. Btw, can you elaborate a little bit on "using an object persistence layer", I am not sure if I am reading the right stuff returned by searching google. – nobody Nov 28 '10 at 16:13
    • By an object persistence layer I mean a library or feature in your programming language of choice that "knows" how to store away language objects (class instances) and retrieve them again. There are different ways of implementing this kind of feature, one way is to use a database with a giant EAV table. If your using some product that does the translation to and from the EAV representation for you then you can treat it as an object store. It's not fun to do this by yourself, however. – Larry Lustig Nov 28 '10 at 16:41
    • 1
      EAV itself takes nothing away from Relational db capability, security, integrity, standards. Of course the work of amateurs should not be taken as an example. I have worked with a few commercial EAV dbs produced by professionals, where there was no loss of integrity or relational power. You may wish to read this [recent question and answer](http://stackoverflow.com/questions/4283842/database-schema-related-problem/4283979#4283979), the DM explains how to maintain relational capability and DRI; the answer content explains how to avoid losing any integrity or control. – PerformanceDBA Nov 29 '10 at 05:02