2

I have an old database with a gazillion records (more or less) that have a single tags column (with tags being pipe-delimited) that looks like so:

    Breakfast
    Breakfast|Brunch|Buffet|Burger|Cakes|Crepes|Deli|Dessert|Dim Sum|Fast Food|Fine Wine|Spirits|Kebab|Noodles|Organic|Pizza|Salad|Seafood|Steakhouse|Sushi|Tapas|Vegetarian
    Breakfast|Brunch|Buffet|Burger|Deli|Dessert|Fast Food|Fine Wine|Spirits|Noodles|Pizza|Salad|Seafood|Steakhouse|Vegetarian
    Breakfast|Brunch|Buffet|Cakes|Crepes|Dessert|Fine Wine|Spirits|Salad|Seafood|Steakhouse|Tapas|Teahouse
    Breakfast|Brunch|Burger|Crepes|Salad
    Breakfast|Brunch|Cakes|Dessert|Dim Sum|Noodles|Pizza|Salad|Seafood|Steakhouse|Vegetarian
    Breakfast|Brunch|Cakes|Dessert|Dim Sum|Noodles|Pizza|Salad|Seafood|Vegetarian
    Breakfast|Brunch|Deli|Dessert|Organic|Salad
    Breakfast|Brunch|Dessert|Dim Sum|Hot Pot|Seafood
    Breakfast|Brunch|Dessert|Dim Sum|Seafood
    Breakfast|Brunch|Dessert|Fine Wine|Spirits|Noodles|Pizza|Salad|Seafood
    Breakfast|Brunch|Dessert|Fine Wine|Spirits|Salad|Vegetarian

Is there a way one could retrieve each tag and insert it into a new table tag_id | tag_nm using MySQL only?

Cœur
  • 37,241
  • 25
  • 195
  • 267
Moak
  • 12,596
  • 27
  • 111
  • 166
  • The values are pipe delimited? – OMG Ponies Nov 17 '10 at 04:45
  • Yes, this represents one out of 20 collumns – Moak Nov 17 '10 at 04:46
  • Can't you just fetch the data and, for example, use PHP's str_replace() and make dumpfile? – Wouter Dorgelo Nov 17 '10 at 04:50
  • does the solution need to be purely MySQL..? you would need to query that column, do a split on the strings by the delimiter, add everything to an array (or even just directly to the new table), and for each insert check whether that tag already exists. Might be jumping the gun, but you might want the ID of the records too while you do this if you want to do record <-> tag relationship. – DJ Bouche Nov 17 '10 at 04:50
  • 1
    +1 for trying for doing the right thing and leaving the insanity in the old db design... (plus this is a common problem with a lot of incrementally developed applications) – tobyodavies Nov 17 '10 at 04:55
  • 1
    @Moak `insert ignore` + define the table with unique key on tag name – ajreal Nov 17 '10 at 05:01

2 Answers2

2

Here is my attempt which uses PHP..., I imagine this could be more efficient with a clever MySQL query. I've placed the relationship part of it there too. There's no escaping and error checking.

$rs = mysql_query('SELECT `venue_id`, `tag` FROM `venue` AS a');
while ($row = mysql_fetch_array($rs)) {
    $tag_array = explode('|',$row['tag']);
    $venueid = $row['venue_id'];
    foreach ($tag_array as $tag) {
        $rs2 = mysql_query("SELECT `tag_id` FROM `tag` WHERE tag_nm = '$tag'");
        $tagid = 0;
        while ($row2 = mysql_fetch_array($rs2)) $tagid = $row2['tag_id'];
        if (!$tagid) {
            mysql_execute("INSERT INTO `tag` (`tag_nm`) VALUES ('$tag')");
            $tagid = mysql_insert_id;
        }
        mysql_execute("INSERT INTO `venue_tag_rel` (`venue_id`, `tag_id`) VALUES ($venueid, $tagid)");
    }
}
DJ Bouche
  • 1,154
  • 1
  • 9
  • 17
  • +1 Yep, something like this. I think developing a PHP script (or any other language really) is going to be easier to develop and debug than trying to do this all in SQL code. – Brandon Montgomery Nov 17 '10 at 06:10
  • But that only solves the issue for PHP programmers. I added a MYSQL only answer for future reference. +1 for answering, but havn't tested – Moak Nov 17 '10 at 06:17
1

After finding there is no official split function I've solved the issue using only MySQL like so:

1: I created the function strSplit

CREATE FUNCTION strSplit(x varchar(21845), delim varchar(255), pos int) returns varchar(255)
return replace(
replace(
substring_index(x, delim, pos),
substring_index(x, delim, pos - 1),
''
),
delim,
''
);

Second I inserted the new tags into my new table (real names and collumns changed, to keep it simple)

INSERT IGNORE INTO tag (SELECT null, strSplit(`Tag`,'|',1) AS T FROM `old_venue` GROUP BY T)

Rinse and repeat increasing the pos by one for each collumn (in this case I had a maximum of 8 seperators)

Third to get the relationship

INSERT INTO `venue_tag_rel` 
(Select a.`venue_id`, b.`tag_id` from `old_venue` a, `tag` b 
     WHERE 
     (         
     a.`Tag` LIKE CONCAT('%|',b.`tag_nm`) 
     OR a.`Tag` LIKE CONCAT(b.`tag_nm`,'|%') 
     OR a.`Tag` LIKE CONCAT(CONCAT('%|',b.`tag_nm`),'|%') 
     OR  a.`Tag` LIKE b.`tag_nm`
     ) 
)
Moak
  • 12,596
  • 27
  • 111
  • 166