-1

in MySQL, is it good to have 500 Columns in one table? the rows will be increasing daily, the maximum row count would be less than or equal to 1million.

just to give a brief, these are my column headers

TableName: process_detail id, process_id, item_id, item_category, attribute1,attribute2,attribute3,...,attribute500,user_id1_update_time,user_id2_update_time,user_id1_comments,user_id2_comments

all attributes are varchar with length maximum 30. but less than 30.

and i have 25 item_categories.

should i create one single table like 'process_detail', or should i create multiple tables categorywise like category1_process_detail,category2_process_detail,..,category25_process_detail

each item_category has different number of columns, some will have 200 columns and some will have only 50 columns. lets say category1 will have 200 columns, category2 will have 50 columns.

some columns in category1 will not be in category2.

item_ids in process_id1 may come in process_id2. what is the best approach in terms of good performance ? please adivce.

Rick James
  • 135,179
  • 13
  • 127
  • 222
davidb
  • 263
  • 5
  • 10
  • 23
  • 1
    I recommend reading this: http://www.studytonight.com/dbms/database-normalization.php – Doomenik Aug 10 '17 at 10:24
  • You should implement EAV Entity Added Value like processDetails (id, process_id, item_id, item_category) attributes (id, title,) processToAttributes (id, process_id, attribute_id, attribute_value) So in this way, it will be optimized with respect to Normalization level 3 – Naveed Ramzan Aug 10 '17 at 10:24
  • @NaveedRamzan, thanks, but the attributes are editable field in the application.we can not have them predefined in a attribute table. – davidb Aug 10 '17 at 10:28
  • `user_id1_comments,user_id2_comments` are there really multiple comments in the same row of ONE column and even further there are mutliple comment columns? Can those columns be null? Those would be the first alerts for me that those belong somewhere else. And `attribute1,attribute2,attribute3` seems pretty obvious that should go into another table that can be joined from `process_detail`, what if 3 attribute-columns are not enough? Will you keep increasing column numbers? – DrCopyPaste Aug 10 '17 at 10:34
  • @davidb: I agree that these are editable and it can be managed along processID. Maybe you got a chance to see database of Magento. or you can study EAV concept. – Naveed Ramzan Aug 10 '17 at 10:37
  • @DrCopyPaste, `user_id1_comments,user_id2_comments` are the only two columns, it can be null. – davidb Aug 10 '17 at 12:50
  • 1
    EAV has its own set or problems. – Rick James Aug 17 '17 at 21:53
  • Splaying an array (categoryN_process_detail) across columns in _wrong_. – Rick James Aug 17 '17 at 21:54

2 Answers2

1

No it is not a good idea. Instead use a many to one relational mapping.

For example, create the process_detail table as you propose but without the attribute columns. Then create another table process_detail_attributes

CREATE TABLE `process_detail_attributes`  
(`pda_id` INT NOT NULL AUTO_INCREMENT,  
 `id` INT NOT NULL,   
 `attribute_key` INT NOT NULL,  
 `attribute_value` VARCHAR(30) NOT NULL,  
  PRIMARY KEY(`pda_id`),  
  FOREIGN KEY (id) REFERENCES process_detail (id)
) ENGINE...

Then for each attribute (attribute1...attribute500) needed just enter a row into the attribute table with the appropriate id inserted into the foreign key column.

The benefits of doing this are numerous. The link Doomenik mentions is probably a good starting point to understand why, but to put it tersely...

-If all attributes aren't used there will be no wasted storage space.
-Even if the attributes are used, the data will be stored within the actual index B-Tree node, exorbitantly inflating the amount of data per page and decreasing the amount of pages able to fit in the buffer pool (i.e RAM) and decreasing the locality of the keys. This will subsequently slow the index traversal.
-If these attributes are going to require indices (which attributes often do) then the unruliness of this table will be unconscionable.

There are of course times when you can consider de-normalization for the sake of performance but this does not seem like one of them.

You can then select the data from process_detail with all of its attributes like this:

SELECT a.process_id,  
a.user_id1_u‌​pdate_time,  
a.user_id2_u‌​pdate_time,  
a.user_id1_comments,  
a.user_id2_comments,  
b.*  
FROM process_detail a INNER JOIN process_detail_attributes b  
WHERE a.id = b.id AND whatever_condition_you_want_to_filter_by_here;
MarCPlusPlus
  • 366
  • 1
  • 5
  • thanks, `process_detail_attributes` table will have the attributes row wise, with foreign key (id) which is primary key of the table `process_detail`, when i want to select and display the attributes something like this `process_id, item_id, item_category, attribute1,attribute2,attribute3,...,attribute500,user_id1_update_time,user_id2_update_time,user_id1_comments,user_id2_comments`, how can i do this? and this is how it looks to user in front end application. – davidb Aug 10 '17 at 13:31
  • It might be appropriate to create a table similar to `process_detail_attributes` for the user_id_comments, etc... I'll edit my answer and put an example for the select your asking for and then can just mock that for user_id_comments or whatever other tables you decide to extract from your initial monolithic process_detail table. – MarCPlusPlus Aug 10 '17 at 15:25
  • ,if the query has to return `process_id, item_id, item_category, attribute1,attribute500,user_id1_u‌​pdate_time,user_id2_‌​update_time,user_id1‌​_comments,user_id2_c‌​omments` then process_detail_attributes table should be like this `CREATE TABLE `process_detail_attributes` (`pda_id` INT NOT NULL AUTO_INCREMENT, `id` INT NOT NULL, `attribute1` VARCHAR(30) NOT NULL, `attribute2` VARCHAR(30) NOT NULL, `attribute3` VARCHAR(30) NOT NULL, `attribute500` VARCHAR(30) NOT NULL, PRIMARY KEY(`pda_id`), FOREIGN KEY (id) REFERENCES process_detail (id) ) ENGINE..`? – davidb Aug 10 '17 at 16:40
  • Not not at all. 500 columns in a table is really not a good idea. If you must return 500 columns to the user then I would advise dealing with that on the application side. You should do a bit of research on RDBMS and normalization... It'll pay off in the long run! Best of luck to you, I hope you can make it work. – MarCPlusPlus Aug 10 '17 at 17:44
0

InnoDB won't support 500 varchar columns, because of the way rows are stored. Even if you use InnoDB's ROW_FORMAT=DYNAMIC, this would store 500x20 bytes per row for the varchars, which would be greater than the 8KB row size limit. See https://www.percona.com/blog/2010/02/09/blob-storage-in-innodb/ for more details on InnoDB row storage.

Having such a large number of columns is a red flag for problematic database design anyway.

  • If you store numerous columns for similar attributes, you're violating the principle of eliminating Repeating Groups of columns, which is part of making a table satisfy First Normal Form.

  • If the columns are not similar attributes, then you're simply not designing a relation. In a relation, you must define the heading with meaningful column names and data types. When you name your columns generically like attribute1, etc., you're not designing the table in a relational way.

I disagree with suggestions to use an EAV table design. I have posted frequently here on Stack Overflow or on my blog EAV FAIL about the fact that EAV is a broken design for a relational database.

See my answer to https://stackoverflow.com/a/695860/20860 or my presentation Extensible Data Modeling for some alternative solutions to your task of storing different attributes for different process types.

You might like to read about using the JSON data type in MySQL 5.7 to store semi-structured collections of attributes specific to each of your different process types.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828