0

Rather than explain the whole situation, which would just add confusion to the matter, I've just decided to go with a very light table example and ask my question below it

I have a database table which looks kind of like this:

    product_id | category_id | category_name
         1            1           Cat One
         2            2           Cat One
         3            3           Cat One
         4            4           Cat Two
         4            5           Cat Two

How can I set the second and third row to have the same category_id as the first? So it looks like:

    product_id | category_id | category_name
        1             1            Cat One
        2             1            Cat One
        3             1            Cat One
        4             4            Cat Two
        5             4            Cat Two

Bearing in mind I need to do this for 12,474 rows! :(

I've been bashing my head against the desk for hours with this. Any ideas would be massively appreciated

Thanks, Andy

P.S: I forgot to add that the category_id's should be the same because the category_name is Cat One for all of them

Edit 2: edited table sample a bit to avoid confusion

Andy
  • 4,901
  • 5
  • 35
  • 57
  • 1
    What's the condition to change `category_id` ? Do you want to change all of them to the same value no matter what? – dee-see Aug 31 '11 at 16:49
  • I want them to be the same value if category_name is the same, sorry, should've mentioned that – Andy Aug 31 '11 at 16:50
  • And how would you decide which `category_id` to use ? Is it only based on the name or it's the lowest id of the common `category_names` or .. .? Does it even matter as long as the same category shares the same id? – dee-see Aug 31 '11 at 16:56
  • No it doesn't matter as long as they share the same id. The lowest id of common category_names I guess would be the best way – Andy Aug 31 '11 at 16:59
  • @Andy -- see my update (test it on a small subset) – Naftali Aug 31 '11 at 17:02

5 Answers5

1

Create a table that maps each category name to an id

CREATE TABLE `categories` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(50) DEFAULT NULL,
  PRIMARY KEY (`id`)
) SELECT NULL as id, `category_name` as name FROM `tableName` GROUP BY `category_name`

Then UPDATE everything

UPDATE `tableName`, `categories` 
  SET `tableName`.`category_id` = `categories`.`id` 
  WHERE `tableName`.`category_name` = `categories`.`name`

And then drop the categories table!

dee-see
  • 23,668
  • 5
  • 58
  • 91
  • I think that's the right idea, but I keep getting a #1062 Duplicate entry error :( – Andy Aug 31 '11 at 17:19
  • I tried on a small sample and I didn't have any problems. Is that all the details the error message gives you? – dee-see Aug 31 '11 at 17:22
  • the full error message: #1062 - Duplicate entry '152-1297' for key 1 – Andy Aug 31 '11 at 17:26
  • Hmm you shouldn't get '152-1297' as a value for `id`. Did you modify the `SELECT` in the `CREATE` statement? Maybe there is something being inserted in `id` that shouldn't. EDIT: By the way, I edited my `CREATE` query a while ago, I don't know when you read it but it might be why, the old one would generate such an error. – dee-see Aug 31 '11 at 17:32
  • all I did was add AUTO_INCREMENT=111 before the select to preserve pre-existing categories that aren't affected by this – Andy Aug 31 '11 at 17:38
  • And the select is `SELECT NULL as id, `category_name` ...` and not the `SELECT DISTINCT category_name` I had up earlier? – dee-see Aug 31 '11 at 17:44
  • @Vache let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/3037/discussion-between-andy-and-vache) – Andy Aug 31 '11 at 18:15
0
UPDATE `table_name` SET `category_id`='1' WHERE `category_name`="Cat One";

Basic UPDATE with a WHERE clause.


Try this with subqueries (it has never been tested):

UPDATE `table_name` u SET 
      `category_id`= (SELECT `category_id` 
                      FROM `table_name` t 
                      WHERE u.category_name = t.category_name 
                      GROUP BY category_name);
Naftali
  • 144,921
  • 39
  • 244
  • 303
  • I'm not sure I want to do that for how ever many thousand unique categories there are – Andy Aug 31 '11 at 16:56
  • @Andy, get some server side code and get the relations from `category_name` to `category_id` and do the updates from there ... – Naftali Aug 31 '11 at 16:57
  • Haven't tried that but I know it won't work. You can't use the u from the main query in the subquery. Thanks anyway – Andy Aug 31 '11 at 17:05
  • @Andy -- whoever told you that was lying. of course you an use it. thats the whole idea of subqueries... http://www.techonthenet.com/sql/update.php – Naftali Aug 31 '11 at 17:06
  • Nobody told me, learnt from my own experience. It will always come back with error #1093 – Andy Aug 31 '11 at 17:10
0
UPDATE table_name SET category_id=1
WHERE category_name="Cat One" AND category_id<>1;

if the category_name is the decide factor

UPDATE table_name, category 
SET table_name.category_id=category.category_id
WHERE table_name.category_name=category.category_name
AND table_name.category_id<>category.category_id;

and your question is getting more fuzzy ...

UPDATE table_name, 
(select min(category_id) as cid, category_name from category 
group by category_name) as alias_table 
SET table_name.category_id=cid
WHERE table_name.category_name=alias_table.category_name;
ajreal
  • 46,720
  • 11
  • 89
  • 119
  • its basically the same as one of ur previous answers: http://stackoverflow.com/questions/4471277/mysql-delete-from-with-subquery-as-condition/4471350#4471350 – Naftali Aug 31 '11 at 17:13
  • I think the question you mentioned only apply for delete, if I remember correctly, it should fix in the later version (5.5) – ajreal Aug 31 '11 at 17:18
0

I think you'll need to use a little PHP or other scripting languange.

Try something along these lines:

$result_set = query("
    SELECT category_id, categeory_name
    FROM table_name
    WHERE 1
    GROUP BY category_name
    ORDER BY category_id ASC
"); //should give you all unique category_names and the lowest category_id for that name

while($row = mysql_fetch_assoc($result_set)) {
    query("UPDATE table_name SET category_id = {$row['category_id']} WHERE category_name = '{$row['category_name']}'");
}
ben
  • 1,946
  • 2
  • 18
  • 26
0

You should ask yourself first if the design you have is truly the one you want.

Specifically I am talking about normalizing your relations. Your tables do not appear to be in second normal form. You might want to consider having a category table, then having a foreign key from the product table to the category table.

Looks like this

product_id | category_id 
     1            1         
     2            1          
     3            1         
     4            2         
     5            2        

category_id | category_name
     1            cat one
     2            cat two

To get to second normal form you would need to migrate the data.

Create the category table

create table categories (category_id int(11), category_name varchar(100));

Then select all of the redundant data into the table.

insert into categories
select 
 category_id,
 category_name
from products

Delete redundant data

delete c2 from categories c1, categories c2
where c1.category_name = c2.category_name and c1.category_id < c2.category_id

Now point the products table at the normalized data

update products, categories
set product.category_id = categories.category_id
where product.category_name = categories.category_name

Then drop the category_name column from products

alter tables products drop column category_name

Finally add a foreign key constraint from products to categories.

The end result is a normalized schema removing the possibility of redundant data.

Edit: I Really mean Second Normal Form, not first.

Another Edit: The Delete Redundant Data step was initially wrong, and would have deleted all categories =)

Randy
  • 66
  • 5
  • Hi Randy, thanks for your detailed reply. I already have a category table, but I just added the name field to my product to category table to simplify the process and try to make it easier to explain on here. I just tried the update method you posted but got this error: #1062 - Duplicate entry '152-80' for key 1 - Any ideas what's going on? – Andy Aug 31 '11 at 18:56
  • Yes. Look at your primary key on the products table. Is it (product_id, category_id)? If so, this means you have redundant data. You will have to create a temporary products table, 'select distinct *' into it, delete from the original, then insert from the temporary back into the original. The end result will be less rows but the same amount of information. I recommend working on backup data. – Randy Aug 31 '11 at 19:52