4

I am currently integrating an open source chat (AJAX Chat) into another project. Now, the chat by default just gets its valid users and valid channels from a file, but obviously this is not ideal when you have a database with constantly changing users.

So, I want to make it so that the chat loads user and channel info straight from the database. I was thinking that the design should be as follows(If you feel differently, please do let me know):

  • We have some chat channels (public, marketing, etc...)
  • Then we have groups assigned to channels (like PR team 1, IT people, etc...)
  • Then we have users who are part of groups AND in some cases are assigned directly to the channels.

I was thinking of implementing the above with tables like these:

Channels Table:

|----|Channel_Name||Channel_ID||Groups_Assigned||Users_Assigned|----|  
|----|---Public---||-----0----||---1,2,3,4,5---||-----3,4------|----|  
.  
.  
.etc...

Note: The groups assigned table contains the group id's of the groups assigned to the channel, while the users assigned contains the id of the users who are not part of the groups that are assigned.

The Groups Table:

|----|Group_Name||Group_ID||Users_Assigned|----|  
|----|---Team1--||----0---||------5,10----|----|  
.  
.  
.etc...  

Sorry for the badly drawn tables.

Now, with the above implementation, when a user logs in, the program is going to get the user id (from the users table), then search the groups table for all groups that contain the user id, and finally, search the channel table for all channels that contain either groups (that a user is part of) or channels that have the user directly assigned to them.

My idea is possible, but it seems a bit, uhh, inefficient. Since I would have to be storing the assigned ids (both group and user) in the format 1,2,3...., I would have to use either PHP's explode() or some other PostgreSQL function that can search strings. I would most likely be storing an array of groups, and then cycling through them all, one row at a time, this seems really slow to me.

Or, I could have a boolean column for each user, but that would result in way too many columns, and I don't want to create a new column every time a user is created.

So, how would you guys do this? And, if for some mad reason, you happen to agree with my initial idea, then could you help me figure out how to actually write the code to actually do it.

Thank you for your time, have a good day.

Farray
  • 8,290
  • 3
  • 33
  • 37
zermy
  • 611
  • 1
  • 11
  • 25
  • 2
    Your intuition is correct, it's not a good design. Read this **is-storing-a-comma-separated-list-in-a-database-column-really-that-bad**: http://stackoverflow.com/questions/3653462/is-storing-a-comma-separated-list-in-a-database-column-really-that-bad – ypercubeᵀᴹ Nov 24 '11 at 19:47

3 Answers3

6

Yes, it's inefficient to store comma-separated strings of numbers and to try to search your database for a given number. For more on this, see my answer to Is storing a comma separated list in a database column really that bad?

Instead, you should use intersection tables to store the many-to-many relationship between users and groups, and between groups and channels. Then your searches will benefit from indexes, and you can use a join to get back to the group or channel table.

Community
  • 1
  • 1
Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
2

I'd go for one more table instead of the 1,2,3,4,5 values as they are difficult to read. Remove Groups_Assigned from the channels table and put it into a separate table in a 1 to many format:

Channel_id  Group_id
----------  --------
0           1
0           2
0           3
0           4
0           5

I'd create another table as groups that would join to this table here and holds information as to what each group_id is. Then it's a matter of writing queries capable of reading this design as you need.

Andriy M
  • 76,112
  • 17
  • 94
  • 154
Twelfth
  • 7,070
  • 3
  • 26
  • 34
  • I used the groups_assigned as an example, same logic would be best in the user_assigned as well. It is possible to have both the channel and group 'user_assigned' point at a single 'user_assigned' table – Twelfth Nov 24 '11 at 19:53
  • Thanks, I will probably end up doing something like this. I had thought of this solution, but for some reason, my stupid brain dismissed it. – zermy Nov 24 '11 at 20:51
  • Heh, not stupid...once upon a time I would have done the same. I believe I learned otherwise from a forum like this, heh the circle continues. If you're coming to db design from something linear...remember arrays suck in SQL. One to many relationships are so much easier to read from in this form, both from an ease of code and a performance standpoint – Twelfth Nov 24 '11 at 21:54
1

One possible solution:

Channel
------------
Channel_Id
Channel_Name
PRIMARY KEY (Channel_Id)

The Person and Grouping (I prefer those over User and Group as some systems use those as keywords) can be thought as subtypes of a supertype Entity. This will help later to have only one Assignment table.

Entity
------------
Entity_Id
PRIMARY KEY (Entity_Id)

Person  --- ( User )
------------
Person_Id
Person_Name
--- other data about persons/users
PRIMARY KEY (Person_Id)
FOREIGN KEY (Person_Id)
  REFERENCES Entity(Entity_Id)

Grouping   --- ( Group )
------------
Grouping_Id
Grouping_Name
--- other data about groups
PRIMARY KEY (Grouping_Id)
FOREIGN KEY (Grouping_Id)
  REFERENCES Entity(Entity_Id)

This will be used for Person - Grouping association:

Belongs --- ( Person Belongs In Grouping )
------------
Person_Id
Grouping_Id
PRIMARY KEY (Person_Id, Grouping_Id)
FOREIGN KEY (Person_Id)
  REFERENCES Person(Person_Id)
FOREIGN KEY (Grouping_Id)
  REFERENCES Grouping(Grouping_Id)

And the association table for the assignments to channels

Assignment ( Entity is Assigned to Channel )
------------
Entity_Id
Channel_Id
PRIMARY KEY (Entity_Id, Channel_Id)
FOREIGN KEY (Entity_Id)
  REFERENCES Entity(Entity_Id)
FOREIGN KEY (Channel_Id)
  REFERENCES Channel(Channel_Id)

You could off course get rid of the Entity table and have two association tables, one for Person to Channel and one for Group to Channel assignments.

ypercubeᵀᴹ
  • 113,259
  • 19
  • 174
  • 235
  • Oh, this is a nice solution, but I already have a populated users table, so I feel that I must ask: "How safe is it for me to add a foreign key when I already have a users table and more than 300 other tables ( I know that I would have to the the entities first, and then reference them). Other than though, I agree with yours and everyone else's answer that I should use intersection tables. – zermy Nov 24 '11 at 20:50
  • If you have many tables that reference your users table, no need to change those references. If you already have the Users and Groups tables and you want to add an Entity table, that will be tricky (in one of the two tables, the Primary Key values would have to be changed, so either the `Users.User_Id` or the `Groups.Group_Id` and all the referencing to those values). – ypercubeᵀᴹ Nov 24 '11 at 20:57
  • Plus, you `INSERT` procedures into the Users and Groups would have to be adjusted (first insert into Entity, then use the new PK and insert into Users or Groups). – ypercubeᵀᴹ Nov 24 '11 at 20:58
  • Oh right, forgot about the `INSERT's`, changing that would take way too much work, but thanks for the solution though, I shall remember it if I make something from scratch. – zermy Nov 24 '11 at 21:03
  • In that case, you should use 2 intersection tables, one for `User-Channel` association and one for `Group-Channel`. – ypercubeᵀᴹ Nov 24 '11 at 21:07
  • @ypercube: If I've understood you correctly, you don't need to change existing primary key column names or primary key values in the tables Users or Groups. You *do* need [an additional column](http://stackoverflow.com/q/4970646/562459) in Users, Groups, and Entities to guarantee that a given entity id number cannot appear in both subtype tables, but you need to do that anyway. Having done that, I think you can rename the base tables, and build updatable views to hide all these changes from the application code, but I'm not 100% sure without testing. – Mike Sherrill 'Cat Recall' Nov 25 '11 at 11:38
  • @Catcall: Yes, you are right, you can achieve same referential integrity by adding a Unique Key and not tampering with the existing Primary Key. I hadn't thought of that. It's not the best one but when one has already data, it's I guess the only solution that requires minimum altering of the table and no change at all in the existing (PK) data. – ypercubeᵀᴹ Nov 25 '11 at 13:05