8

In Meetup.com, when you join a meetup group, you are usually required to complete a profile for that particular group. For example, if you join a movie meetup group, you may need to list the genres of movies you enjoy, etc.

I'm building a similar application, wherein users can join various groups and complete different profile details for each group. Assume the 2 possibilities:

  1. Users can create their own groups and define what details to ask users that join that group (so, something a bit dynamic -- perhaps suggesting that at least an EAV design is required)
  2. The developer decides now which groups to create and specify what details to ask users who join that group (meaning that the profile details will be predefined and "hard coded" into the system)

What's the best way to model such data?

More elaborate example:

The "Movie Goers" group request their members to specify the following:

  • Name
  • Birthdate (to be used to compute member's age)
  • Gender (must select from "male" or "female")
  • Favorite Genres (must select 1 or more from a list of specified genres)

The "Extreme Sports" group request their member to specify the following:

  • Name
  • Description of Activities Enjoyed (narrative form)
  • Postal Code

The bottom line is that each group may require different details from members joining their group. Ideally, I would like anyone to create a group (ala MeetUp.com). However, I also need the ability to query for members fairly well (e.g. find all women movie goers between the ages of 25 and 30).

StackOverflowNewbie
  • 39,403
  • 111
  • 277
  • 441

7 Answers7

8

For something like this....you'd want maximum normalization, so you wouldn't have duplicate data anywhere. Because your user-defined tables could possibly contain the same type of record, I think that you might have to go above 3NF for this.

My suggestion would be this - explode your tables so that you have something close to 6NF with EAV, so that each question that users must answer will have its own table. Then, your user-created tables will all reference one of your question tables. This avoids the duplication of data issue. (For instance, you don't want an entry in the "MovieGoers" group with the name "John Brown" and one in the "Extreme Sports" group with the name "Johnny B." for the same user; you also don't want his "what is your favorite color" answer to be "Blue" in one group and "Red" in another. Any data that can span across groups, like common questions, would be normalized in this form.)

The main drawback to this is that you'd end up with a lot of tables, and you'd probably want to create views for your statistical queries. However, in terms of pure data integrity, this would work well.

Note that you could probably get away with only factoring out the common fields, if you really wanted to. Examples of common fields would include Name, Location, Gender, and others; you could also do the same for common questions, like "what is your favorite color" or "do you have pets" or something to that extent. Group-specific questions that don't span across groups could be stored in a separate table for that group, un-exploded. I wouldn't advise this because it wouldn't be as flexible as the pure 6NF option and you run the risk of duplication (how do you predetermine which questions won't be common questions?) but if you really wanted to, you could do this.

There's a good question about 6NF here: Would like to Understand 6NF with an Example

I hope that made some sense and I hope it helps. If you have any questions, leave a comment.

Community
  • 1
  • 1
  • If anyone is reading this question today - consider going with @ user1046334's suggestion. Relational databases really aren't the best way to do this. –  Jul 12 '17 at 06:11
5

Really, this is exactly a problem for which SQL is not a right solution. Forget normalization. This is exactly the job for NoSQL document stores. Every user as a document, having some essential fields like id, name, pwd etc. And every group adds possibility to add some fields. Unique fields can have names group-id-prefixed, shared fields (that grasp some more general concept) can have that field name free.

Except users (and groups) then you will have field descriptions with name, type, possible values, ... which is also very good for a document store.

If you use key-value document store from the beginning, you gain this freeform possibility of structuring your data plus querying them (though not by SQL, but by the means this or that NoSQL database provides).

  • I'd upvote you a ton of times if possible, doing this in a realtional database is a prescription for performance problems. – HLGEM Nov 17 '11 at 21:31
  • 1
    BTW, here is very nice comparision of them http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis very nice because it shows their strength and weaknesses and tells for which kind of problem they are good –  Nov 17 '11 at 21:37
  • thumbs up - if you use a document-based database, make sure to understand Map and Reduce-functionality to be able to perform SQL-like queries to handle the data for statistics and relational data (e.g. the posts of a certain user or friends) – Lars Nov 18 '11 at 10:30
  • Hate to admit it, but this is a damned good answer. As such, you're getting my upvote, and I'm probably losing the bounty. However, as long as it's possible to maintain data across different groups, this gets my vote. Well done! My commendations. :D (One reason I supported this is because (if I remember correctly) Facebook implemented something like this in their earlier years. Take that with a grain of salt, though. –  Nov 19 '11 at 06:51
  • @lunchmeat317: Thanks. As for facebook's data API, it wasn't a success and they already deprecated it, not a long ago. It was non-SQL model, too, but it was not a document store, but more a graph store (raw data and their relations). Amazon AWS also has a SimpleDB, but I don't know what model it is. –  Nov 19 '11 at 10:57
1

First i'd like to note that the following structure is just a basis to your DB and you will need to expand/reduce it.

There are the following entities in DB:

  • user (just user)
  • group (any group)
  • template (list of requirement united into template to simplify assignment)
  • requirement (single requirement. For example: date of birth, gender, favorite sport)

"Modeling":

**User**
user_id
user_name

**Group**
name
group_id

user_group
user_id (FK)
group_id (FK)

**requirement**:
requirement_id
requirement_name
requirement_type (FK) (means the type: combo, free string, date) - should refers to dictionary)

**template**
template_id
template_name

**template_requirement**
r_id (FK)
t_id (FK)

The next step is to model appropriate schema for storing restrictions, i.e. validating rule for any requirement in any template. We have to separate it because for different groups the same restrictions can be different (for example: "age"). You can use the following table:

**restrictions**
group_id
template_id
requirement_id (should be here as template_id because the same requirement can exists in different templates and any group can consists of many templates)
restriction_type (FK) (points to another dict: value, length, regexp, at_least_one_value_choosed and so on)

So, as i said it is the basis. You can feel free to simplify this schema (wipe out tables, multiple templates for group). Or you can make it more general adding opportunity to create and publish temaplate, requirements and so on.

Hope you find this idea useful

ravnur
  • 2,772
  • 19
  • 28
0

You could save such data as JSON or XML (Structure, Data)

User Table

  • Userid
  • Username
  • Password
  • Groups -> JSON Array of all Groups

GroupStructure Table

  • Groupid
  • Groupname
  • Groupstructure -> JSON Structure (with specified Fields)

GroupData Table

  • Userid
  • Groupid
  • Groupdata -> JSON Data
RaphaelH
  • 2,144
  • 2
  • 30
  • 43
  • Raphael, sounds good, but this means I wouldn't be able to query things in the XML/JSON. For example, say one of the pieces of data I want is "age". How could I search for all users that are within a certain age bracket? The data needs to "query-able". – StackOverflowNewbie Nov 03 '11 at 22:07
  • Yeah, that's right.. Another way could be, creating dynamically tables for each Group, but that could eat up your resources.. – RaphaelH Nov 04 '11 at 08:03
  • you do not need new tables for each group. you would need 1 UserGroup Table `UserId,GroupId`. Each group that a member belongs to would have 1 row. i.e if user 1 belongs to groups 1, 2 and 3 Then there would be 3 rows. `1,1`;`1,2`;`1,3` etc. Using a JSON text field like that in a database is in violation of 1NF. You should do something similar to this for the other tables as well. – Ben English Nov 14 '11 at 15:50
  • @stackoverflownewbie, you can query XML using ExtractValue (http://dev.mysql.com/doc/refman/5.1/en/xml-functions.html). But how do you plan to dynamically write queries for the group-specific fields? If you are not going to write them dynamically, you can you can adjust the table structures to simplify your queries. And, by the way, dynamic tables might be a good option in your case. – newtover Nov 15 '11 at 12:29
0

I think this covers most of your constraints:

users
user_id, user_name, password, birth_date, gender

1, Robert Jones, *****, 2011-11-11, M

group
group_id, group_name

1, Movie Goers
2, Extreme Sports

group_membership
user_id, group_id
1, 1
1, 2

group_data
group_data_id, group_id, group_data_name

1, 1, Favorite Genres
2, 2, Favorite Activities

group_data_value
id, group_data_id, group_data_value
1,1,Comedy
2,1,Sci-Fi
3,1,Documentaries
4,2,Extreme Cage Fighting
5,2,Naked Extreme Bike Riding

user_group_data
user_id, group_id, group_data_id, group_data_value_id

1,1,1,1
1,1,1,2
1,2,2,4
1,2,2,5
Ben English
  • 3,900
  • 2
  • 22
  • 32
  • So a table for the common data like `birth_date` and `gender`, then EAV (in your example, `group_data` and `group_data_value`) for the rest of the data? Problems: I would need to put `birth_date` and `gender` in a table other than `users` since its really part of "group data". Also, not all groups may require, say, gender. For example, what if the profile is for a business? I think this means I will need yet another table to flag is group members need or don't need certain common data. Right? – StackOverflowNewbie Nov 15 '11 at 04:52
  • As for the EAV portion, this may work for selecting from a list of things (e.g. from a checkbox, radio, etc.). What if one of the groups required some narrative to be filled out -- and only that particular group needed that, so it can't be put in the table with the "common" data? The EAV will no longer be enough. – StackOverflowNewbie Nov 15 '11 at 04:54
  • @StackOverflowNewbie your first comment about wanting to move the birthdate out of the user field really makes me think you are not grasping the relational database concept. Birth date and gender are related to the user, it may also be group data but it should not be stored there. If for instance you wanted to query all men in group X then you would simply join the user table with the group_membership table. This is a fundamental principal of relational database design. – Ben English Nov 15 '11 at 15:16
  • @StackOverflowNewbie as for your 2nd question about putting a 'narrative' into the common data table, it would definitely belong in this table. The narrative would be inserted into this table when the user created their narrative. It does not have to be a 'read only' table. I'm not sure why you think it cannot be put in the group_data_value table. – Ben English Nov 15 '11 at 15:19
  • Ben, my users can have one or more profiles. Birthdate is associated with the profile, not the users. – StackOverflowNewbie Nov 15 '11 at 22:49
  • @StackOverflowNewbie can your users have more than 1 birthdate? Again, I really don't think you're grasping the power of relational databases. Each `profile` should get the birthdate field from the users table. A birthdate is related to a user not a profile. Each profile would be related to a user (many profiles to 1 user). Putting birthdate in each profile would be redundant and violate Normal Form. – Ben English Nov 16 '11 at 15:49
  • Ben, I'm saying that I do have a need to put birthdate in the profile and not associate it with the user. For example, a user has a profile in group 1 wherein his age is 30. The same user has a profile in group 2 wherein his age is 35. – StackOverflowNewbie Nov 17 '11 at 00:38
  • then throw the birth date value on the group_data_value table and an entry for birth date on the group_data field for whichever group you want to require a birth date. Also, care to explain why gender and birth date would not be unique to a user? that seems ridiculous. – Ben English Nov 17 '11 at 15:57
  • I agree with @Ben. Specific information like birthdate and gender should be associated with the user directly, not per-user-profile. I think many would agree that it would be an extraordinarily bad idea to encourage users to lie about these peices of information by building it into your design. There could be real legal ramifications. – Random Nov 17 '11 at 20:09
0

I've had similar issues to this. I'm not sure if this would be the best recommendation for your specific situation but consider this.

  1. Provide a means of storing data as XML, or JSON, or some other format that delimits the data, but basically stores it in field that has no specific format.

  2. Provide a way to store the definition of that data

  3. Provide a lookup/index table for the data.

This is a combination of techniques indicated already.

Essentially, you would create some interface to your clients to create a "form" for what they want saved. This form would indicated what pieces of information they want from the user. It would also indicate what pieces of information you want to search on.

Save this information to the definition table.

The definition table is then used to describe the user interface for entering data.

Once user data is entered, save the data (as xml or whatever) to one table with a unique id. At the same time, another table will be populated as an index with

id where the xml data was saved name of field data is stored in value of field data stored. id of data definition.

now when a search commences, there should be no issue in searching for the information in the index table by name, value and definition id and getting back the id of the xml/json (or whatever) data you stored in the table that the data form was stored.

That data should be transformable once it is retrieved.

I was seriously sketchy on the details here, I hope this is enough of an answer to get you started. If you would like any explanation or additional details, let me know and I'll be happy to help.

CStroliaDavis
  • 392
  • 4
  • 14
0

if you're not stuck to mysql, i suggest you to use postgresql which provides build-in array datatypes.

you can define a define an array of varchar field to store group specific fields, in your groups table. to store values you can do the same in the membership table.

comparing to string parsing based xml types, this array approach will be really fast.

if you dont like array approach you can check out xml datatypes and an optional hstore datatype which is a key-value store.

ncank
  • 946
  • 5
  • 15