0

I am currently dealing with a data structure similar to the one linked here:

http://sqlfiddle.com/#!2/2ad8f/1

There will be a field (fruits in this case) that can contain very variable options - quantity, colour, type, etc. I am trying to work out an efficient way of storing this data and using it programatically in a frontend.

I have thought about creating new fields (e.g. a field for quantity, a field for colour, etc), however the data can be highly variable and I will be dealing with many, many rows. Potentially 1-2 million. I don't want to create a "texture" field for example that is only used for 100/1,000,000 rows.

The "fruits" here would never be order by or referenced by the database storage engine.

My best idea so far is to store a JSON object as a string (see the second insert in link), however is there a more efficient method?

Andrew White
  • 600
  • 1
  • 9
  • 29
  • JSON seems the solution to me too. But before that, please have a look into this http://stackoverflow.com/questions/1299410/can-i-save-an-object-in-a-sql-server-database – vpv Jun 09 '14 at 06:57
  • another option is create separate tables for quantity, colour, etc. then use left join – Fabricator Jun 09 '14 at 07:01
  • @V.P.Verma - oh thanks, I've never come across varbinary and I've just checked - MySQL (my backend) has an implementation for it. This looks quite promising! – Andrew White Jun 09 '14 at 07:02
  • @user3678068 - thanks, I did consider that. Unfortunately there could be literally hundreds of different options, which is why I mentioned it's heavily variable. Additionally with the rows I'm talking about (1-2 million) I suspect the LEFT JOIN would cause havoc in terms of processing power required. – Andrew White Jun 09 '14 at 07:03

1 Answers1

0

If you want to place all your attributes into one text container, you may as well be using a text file instead of a relational database. The database will have a lot of overhead that you are simply not using so why have it?

If you want this in a relational form, then let's go through some simple modeling.

WE have different kinds of fruit. These fruit can have different and even different kinds of attributes. Here is one simple way:

create table Fruit(
    ID int auto_increment primary key,
    Name varchar( 20 ) not null,    -- Apple, Orange, etc.
    Type varchar( 20 ),    -- Macintosh, Granny Smith, Navel, etc.
    Size char( 1 ),        -- S, M, L
    Qty int not null,
    -- other data such as price, shelflife, whatever
);

So now we create a table for each type of disparate attribute:

create table Attr(
    ID int auto_increment primary key,
    Type varchar( 20 ),    -- Color, Texture, Taste, etc.
    Value varchar( 10 )   -- Red, Green, Juicy, Sweet, Sour, etc.
);

Each fruit can have several attributes and each attribute may apply to several kinds of fruit, so you need a many-to-many cross table between them:

create table FruitAttr(
    FruitID int,
    AttrID int,
    primary key( FruitID, AttrID )
);

with FruitID a foreign key to Fruit and AttrID a foreign key to Attr. Now we can create a Basket table which will define each individual basket.

create table Basket(
    ID int auto_increment primary key,
    Name varchar( 20 ) not null,    -- Graduation, Funeral, Birthday, etc.
    Price decimal (19,4),
    -- other basket-specific attributes
);

A basket is made up of several selections of fruit and each fruit may appear in several types of basket. So there is the same relationship between Basket and Fruit as between Fruit and Attr: many-to-many. As we've already modeled one of those tables, I'll leave that to you.

There are enhancement and changes that may be made to tailor these tables closer to your specific uses, but we now have a workable solution.

So very quickly we have gone from one table to five tables. That may seem like we've complicated everything but if you have to work with them, you will find we have made our (meaning your) life a whole lot easier, especially when you add new types of baskets or fruit, change the makeup of a basket, substitute one fruit (severe core rot suddenly makes Granny Smiths unavailable), or any number of ways you will need to change your data.

After all, it is a relational database and relations are established between tables, not between substrings within strings. So the DML and queries to work with these relations will be so much easier than trying to manipulate text strings.

TommCatt
  • 5,498
  • 1
  • 13
  • 20