0

Let's say I have this table, with both columns as primary keys, in mysql workbench and Innodb engine:

+--------+---------+
| grp    | name    |
+--------+---------+
| fish   | lax     |
| mammal | dog     |
| mammal | bat     |
| mammal | whale   |
| bird   | bat     |
| bird   | ostrich |
+--------+---------+

How can I add a column, behaving like this with grp column:

+--------+----+---------+
| grp    | id | name    |
+--------+----+---------+
| fish   |  1 | lax     |
| mammal |  2 | dog     |
| mammal |  2 | cat     |
| mammal |  2 | whale   |
| bird   |  3 | penguin |
| bird   |  3 | ostrich |
+--------+----+---------+

Note that the actual table I want to alter is much bigger. Also note this is not what auto incrementing with MyISAM does and not what is asked here(but the answers could be helpful): How to auto increment on different foreign keys?

Community
  • 1
  • 1
Nick Zafiridis
  • 146
  • 1
  • 13
  • perhaps look into triggers, do a query wether it is in there, get the `MAX` from the `id` column and increment by one – Rene Pot Jul 26 '15 at 22:06
  • Well, the correct way (in the widely accepted notion of database normalisation) is not to have such a column in that table - the new ID belongs in a `grp` table, from which it can be referenced as a foreign key (instead of having the group name repeated). – IMSoP Jul 26 '15 at 22:08
  • @IMSop ,that is exactly what i want to do in an existing database. This would be the first step. Then I would insert (grp,id) pairs into a new table(or update an existing empty one) which is the grp table you mention. After that I would just delete the grp column from the first table. The alternative would be creating grp table first with auto increment id and the grp values of this table. Say the first table is animals and I add the empty id column. Then I would have to do: UPDATE Animals as A, Grp as G SET A.id=G.id WHERE A.grp=G.grp; The complexity of this way is O(sizeA * sizeG), too big – Nick Zafiridis Jul 27 '15 at 08:24
  • @IMSop My table sizes are 200k and 100k. I left that update overnight and it finished, but there has to be a faster solution. So finding a way I could create the ids in table A and then just insert pairs should have a complexity of O(sizeA + sizeG) which is fine. – Nick Zafiridis Jul 27 '15 at 08:30
  • I'm not sure how you're calculating those complexity values. Either way, you have to update each row in the table once, and you have to look up which group the row is part of to do so. The only bit that can vary is the manner of that look up. If there are a small number of groups, you could even do multiple updates like `SET grpid=4 WHERE grp='fish'"` – IMSoP Jul 27 '15 at 10:27
  • First way involves scanning each row of A for each row of G and has nothing to do with looking up groups. Second way does and it involves scanning table A a probably constant number of times(like 2). Then it creates the new table. That is how the complexities are calculated. I could be wrong on the second one, which is a guess. Even if there is no way for second complexity to be so low, it should still be lower than the first one. All in all, that update feels to me so unappealing compared to somehow generating that grp column I just had to ask. – Nick Zafiridis Jul 27 '15 at 13:22
  • As I said my G table is about 100k and A about 200k. So the least number of groups is 100.001, if just 1 value existed 100k times and the rest one. Even this is not the case though, there are more groups. So multiple updates is out of the question :/ I really hope the solution ,whatever it is, doesn't involve scanning the whole table for each group. This would actually make it a bit worse than the update I did overnight. Unless indexing could make it better, but i doubt it, cause the whole table would be scanned anyway. – Nick Zafiridis Jul 27 '15 at 13:37

1 Answers1

0
CREATE TABLE Groups (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT,
    grp VARCHAR...
    PRIMARY KEY(id),
    UNIQUE(grp)
        ) ENGINE=InnoDB
    SELECT DISTINCT grp FROM YourTable;

That should give you the mapping between grp and id.

At this point, you should not generate a table with 3 columns (grp+id+name), but instead use the above Groups as a "normalization", which you seem to be doing. Then do

CREATE TABLE New LIKE YourTable;
ALTER TABLE New
    ADD  COLUMN id INT UNSIGNED NOT NULL,
    DROP COLUMN grp;
# Note: if you have an index on grp, something else may need to be done.
INSERT INTO New
    SELECT g.id, y.name
        FROM Groups g
        JOIN YourTable y  ON y.grp = g.grp;
# At this point, verify that `New` has what you want.  Then switch over:
RENAME TABLE YourTable TO Old, New TO YourTable;
DROP TABLE Old;

Edit I chose to create a new table rather than update the existing table -- because it will be (I think) considerably faster. With UPDATE, all the rows would need to be locked. With INSERT..SELECT, they are simply created.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • Thanks! This does not answer my question, however. I already did this, with update. The condition of join/update is too complex. It scans y for each row of g, causing O(sizey*sizeg) complexity. That's why I ask how to generate that table. Should there be a less complex way, it is one simple operation away from the same result. The update I did took all night. – Nick Zafiridis Jul 28 '15 at 18:40
  • grp has a primary index, because it is part of the primary key. Still it got dropped fine for me. I did it after my update finished. Is there a problem dropping columns if they have indexes? Also, does using a temporary table improve performance, or is it just for safety? I had an error using update without aliases. Something about locks. Maybe using aliases created temporary tables internally, which resolved the issue. Is there a performance gain on using join in place of update? I know using join over update is recommended, why is that? I know join is what really happens anyway, right? – Nick Zafiridis Jul 28 '15 at 18:44
  • (I edited my reply.) A `JOIN` happens only when you say `JOIN` (or use a "comma-JOIN": `FROM a, b`). – Rick James Jul 28 '15 at 19:44
  • Doesn't update from use join internally? In fact, I had seen somewhere it is a deprecated syntax of join, which shouldn't be used instead of it. Is each row locked and unlocked separately with update? It doesn't just lock all of them once, to ensure their availability, right? Is there a problem dropping columns if they have indexes? – Nick Zafiridis Aug 07 '15 at 18:10
  • If there aren't 2 tables, there can't be a JOIN. InnoDB will lock the rows it needs, and keep the locks until the statement is finished. (Effectively "all at once".) I don't know what happens if you drop a column without first (or in the same ALTER) dropping any indexes containing it. – Rick James Aug 07 '15 at 19:14
  • In update from there are multiple tables. I understand it actually is a join in disguise. If it is indeed all at once, it shouldn't take too much time. If internally a procedure is followed n times for n rows, it will. In any case, nice tip, thx :) Why would anything happen if you drop a column with an index? In the future I will be more cautious with this too anyway. – Nick Zafiridis Aug 09 '15 at 23:18