3

Storing data in comma separated strings wasn't up to me and it isn't something I can change in my database so bear with me. I did quite a bit of searching already online and on stackoverflow but I couldn't find a solution to this, if it's even possible using MySQL.

I am trying to replace all instances of every unique string from table1 with a matching value from table2. I have tried wildcards, replace, update, join, etc and I'm just not sure how to make it work. I know one solution would be replace() for each string but table2 has over 200 rows so that would mean nesting over 200 times.

Here's what I'm trying to accomplish. I have two tables, table1:

+------+-------------+
| Item | Code        |
+------+-------------+
| 1    | 614         |
+------+-------------+
| 2    | 212,614,415 |
+------+-------------+
| 3    | 212,303     |
+------+-------------+
| ...  | ...         |
+------+-------------+

and table2:

+------+-------------------+
| Code | Name              |
+------+-------------------+
| 614  | Columbus, OH      |
+------+-------------------+
| 212  | New York, NY      |
+------+-------------------+
| 415  | San Francisco, CA |
+------+-------------------+
| 303  | Ft. Worth, TX     |
+------+-------------------+
| ...  | ...               |
+------+-------------------+

I want to replace codes from table1 with the corresponding values from table2 to produce this result:

+------+---------------------------------------------+
| Item | Code                                        |
+------+---------------------------------------------+
| 1    | Columbus, OH                                |
+------+---------------------------------------------+
| 2    | New York, NY,Columbus, OH,San Francisco, CA |
+------+---------------------------------------------+
| 3    | New York, NY,Ft. Worth, TX                  |
+------+---------------------------------------------+
| ...  | ...                                         |
+------+---------------------------------------------+

2 Answers2

1

This should do it (see the last query below). I've included the commas in the join so that and id of something like 12 does not match where you have and id of 212 (for example).

drop table if exists table1;

drop table if exists table2;

create table table1(
    item int,
    code varchar(64)
);

create table table2(
    code int,
    name varchar(64)
);

insert into table1 values (1, '614');
insert into table1 values (2, '212,614,415');
insert into table1 values (3, '212,303');

insert into table2 values(212, 'New York, NY');
insert into table2 values(303, 'Ft. Worth, TX');
insert into table2 values(415, 'San Francisco, CA');
insert into table2 values(614, 'Columbus, OH');

select * from table1

+ --------- + --------- +
| item      | code      |
+ --------- + --------- +
| 1         | 614       |
| 2         | 212,614,415 |
| 3         | 212,303   |
+ --------- + --------- +
3 rows

select * from table2

+ --------- + --------- +
| code      | name      |
+ --------- + --------- +
| 212       | New York, NY |
| 303       | Ft. Worth, TX |
| 415       | San Francisco, CA |
| 614       | Columbus, OH |
+ --------- + --------- +
4 rows

select 
    t1.item,
    t2.name
from
    table1 t1 join table2 t2 on (
        t1.code = t2.code
        or t1.code like concat(t2.code, ',%')
        or t1.code like concat('%,', t2.code, ',%')
        or t1.code like concat('%,', t2.code)
    )
order by t1.item

+ --------- + --------- +
| item      | name      |
+ --------- + --------- +
| 1         | Columbus, OH |
| 2         | Columbus, OH |
| 2         | New York, NY |
| 2         | San Francisco, CA |
| 3         | Ft. Worth, TX |
| 3         | New York, NY |
+ --------- + --------- +
6 rows

EDIT: or if you want to keep the data denormalized like this:

select 
    t1.item,
    group_concat(t2.name)
from
    table1 t1 join table2 t2 on (
        t1.code = t2.code
        or t1.code like concat(t2.code, ',%')
        or t1.code like concat('%,', t2.code, ',%')
        or t1.code like concat('%,', t2.code)
    )
group by t1.item
order by t1.item

+ --------- + -------------------------- +
| item      | group_concat(t2.name)      |
+ --------- + -------------------------- +
| 1         | Columbus, OH               |
| 2         | Columbus, OH,New York, NY,San Francisco, CA |
| 3         | Ft. Worth, TX,New York, NY |
+ --------- + -------------------------- +
3 rows
John
  • 3,458
  • 4
  • 33
  • 54
  • 1
    I tried your way and one of the other ways posted here. I got both to work but I can't express how much simpler this way was. Saved a ton of time. Thanks! – Justin Gurtz Jun 16 '15 at 20:51
  • 1
    Nice. I think concat('%,', t2.code, '%') needs to be concat('%,', t2.code, ',%'), rigth? That's adding a comma before the trailing %. – Karl Kieninger Jun 17 '15 at 00:10
  • @Karl Kieninger... you are correct sir. The way it was originally written you would get false hits (e.g. if you were looking for id=12 and had 100,122 you would get that record where you shouldn't). I'll edit the answer to reflect your corrections. – John Jun 17 '15 at 01:53
0

And here we see a perfect example of why using comma-separated lists in DB fields is a bad idea. They are tons harder to manipulate than a proper relational table.

With that in mind I would consider first splitting the code into multiple records, then doing the easy set based replace, and then putting them back together. Essentially:

  1. Using a split function to create a temp table tmp1 with 1 record for each item/code pair.

  2. Then do an UPDATE on the tmp1.code from tmp1 joined to table1.

  3. Finally use GROUP_CONCAT to put the Names back together.

Community
  • 1
  • 1
Karl Kieninger
  • 8,841
  • 2
  • 33
  • 49
  • It took me a LOT of additional research to figure out the split function/stored procedure/cursor part but this is effectively how I got the job done. Thanks! – Justin Gurtz Jun 16 '15 at 20:36