-1

Possible Duplicate:
Is storing a delimited list in a database column really that bad?

I have been working on a couple of PHP/MySQL projects where all relationships are stored as comma separated strings.

For example a common relationship would be like

(in psuedocode)

table people
id - integer
name - string
age - integer
teams - string (CSV OF integers, ex '1,3,9,21')

table teams
name - String
id - integer

managing relationships becomes a hassle.

To get all teams for a person:

$person = 'SELECT * FROM People WHERE id= x';

then in php I have been doing something like

$person['teams'] = SELECT * FROM teams WHERE id IN ($person['teams']);

as I was writing this i realized i could probably combine them in a mysql query, something like:

SELECT 
  people.id, 
  people.name, 
  people.teams, 
  teams.name 
FROM people 
JOIN teams ON FIND_IN_SET(teams.id, people.teams) WHERE people.id=x

with this type of setup I find myself using FIND_IN_SET, pretty frequently

So finally, my question is: Is there a performance benefit to creating relationships like this?

In my experiences so far FIND_IN_SET has usually been doing a full table scan. If there is no performance benefit, in which instances is it beneficial to using a comma seperated list of integers? It seems that mysql designers had something in mind when creating FIND_IN_SET.

dm03514
  • 54,664
  • 18
  • 108
  • 145

2 Answers2

6

You're right, FIND_IN_SET() cannot make use of an index, so it causes a full table scan. Technically, that function is a bogus operation for a relational database, but no doubt there was a lot of demand for it so MySQL implemented it.

Storing data in a comma-separated list is an example of denormalization. Any departure from normalized design can give a performance boost for one type of query, but usually at the expense of all other types of queries against the same data.

For example, if you store players and their teams as a comma-separated list, it makes it very easy to get the list of teams for a given player, without doing a join. That's a performance improvement. But fetching the details for a given player's teams is much more difficult. Likewise searching for all players on a given team.

Use comma-separated lists only if that list is treated as a discrete "black box" piece of data. I.e. your application needs to fetch that list as a whole item, but never a subset of the list, and you never need to write SQL to use elements in that list for searching, joining, sorting, subtotals, etc.

See also my answer to Is storing a delimited list in a database column really that bad?

Community
  • 1
  • 1
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
3

Table scan can not be considered as a benefit, at any time.

Moreover it's breaking the Normal form ( http://en.wikipedia.org/wiki/Database_normalization), as far as I remember from the school.

I think it's a good practice to have all the primary/foreign keys columns indexed to have performance benefit.

The only idea I would have in such a situation, is to politely ask architect on the particular project what was his idea behind the solution and explain him/her the performance disaster behind this :)

dm03514
  • 54,664
  • 18
  • 108
  • 145
Peter Butkovic
  • 11,143
  • 10
  • 57
  • 81