-1

I currently have a table called "RESOURCES" with a keywords field called "RES_Tags". The "RES_Tags" field contains a comma-delimited list of keywords for each record.

I need to normalize this table/field.

I have already set up the following tables: TAGS, TAGS_TO_RESOURCES.

Please see the schema here: http://sqlfiddle.com/#!9/edac4/1

What is a query that will allow me to parse the keywords in RES_Tags, write them into the TAGS table without creating duplicates and then write a listing in the TAGS_TO_RESOURCES table?

Mike Sherrill 'Cat Recall'
  • 91,602
  • 17
  • 122
  • 185
nbardach
  • 123
  • 1
  • 10
  • 1
    Honestly, I would write a little script in Python to do this. Fetch each row from `tags.res_tags`, split the string into an array, and run a loop to insert multiple rows into the new table, one row per tag. – Bill Karwin Nov 08 '17 at 23:05
  • I honestly thought of doing just that in VBS (my favorite flavor). I can use a FOR EACH x IN RES_Tags, NEXT, LOOP, but I thought maybe it would be better to go from the DB end. Still on the fence.... – nbardach Nov 09 '17 at 03:48
  • 1
    This is a case where it's so awkward to do the task in SQL that it's far better to do it in application code. SQL was meant from its earliest days to be used in the context of application code. – Bill Karwin Nov 09 '17 at 15:47
  • 1
    Also, the SQL version tends to be a lot harder to maintain. My example code would need to be modified (or more complex) to handle cases where was more than six items crammed into the column. – Brian Dewhirst Nov 10 '17 at 00:02
  • 2
    Google '(undo OR reverse OR inverse) group_concat'. This is a faq. – philipxy Nov 10 '17 at 11:08
  • Please [use text for text, don't use links/images](https://meta.stackoverflow.com/a/285557/3404097). Please read [ask] and [mcve] and show what you've researched & tried. – philipxy Nov 25 '17 at 09:27

2 Answers2

1

Please copy your code into the actual posting, and provide the code you've tried to use to solve the problem.

The substring_index function returns a portion of a string with some delimiter (here a comma), and when a negative index is passed it starts searching for matches from the opposite side, so -1 grabs one item from what would otherwise be multi-item lists (for index>=2).

Per our discussion, I've tweaked how I did this and showed an example of using auto-increment. (This is run in the 'build schema' part of fiddle.)

create table TAGS
(`T_ID` int auto_increment primary key, `T_Name` varchar(18))
;

insert ignore into TAGS (T_Name)
  SELECT 
    SUBSTRING_INDEX(RES_Tags, ',', 1) as X
    FROM RESOURCES
;

insert ignore into TAGS (T_Name)
  SELECT 
    SUBSTRING_INDEX(
      SUBSTRING_INDEX(RES_Tags, ',', 2)
      ,',',-1)
  FROM RESOURCES
;

insert ignore into TAGS (T_Name)
  SELECT 
    SUBSTRING_INDEX(
      SUBSTRING_INDEX(RES_Tags, ',', 3)
      ,',',-1)  as X
  FROM RESOURCES
;
insert ignore into TAGS (T_Name)
  SELECT 
    SUBSTRING_INDEX(
      SUBSTRING_INDEX(RES_Tags, ',', 4)
      ,',',-1)  as X
  FROM RESOURCES
  ;

insert ignore into TAGS (T_Name)
  SELECT 
    SUBSTRING_INDEX(
      SUBSTRING_INDEX(RES_Tags, ',', 5)
      ,',',-1)  as X
  FROM RESOURCES
;

insert ignore into TAGS (T_Name)
  SELECT 
    SUBSTRING_INDEX(
      SUBSTRING_INDEX(RES_Tags, ',', 6)
      ,',',-1)  as X
  FROM RESOURCES
;

create table New_TAGS like TAGS;
insert into New_TAGS (T_Name)
  select distinct trim(T_Name)
  from TAGS;

drop table TAGS;
rename table NEW_TAGS to TAGS;

documentation of the substring function Possible duplication of this question

Brian Dewhirst
  • 307
  • 3
  • 7
  • Thx, Brian! How would your query lead to the solution I'm looking for (ie. writing the substring (if unique) to the TAGS table and writing a reference in the TAGS_TO_RESOURCES table)? – nbardach Nov 09 '17 at 03:52
  • 1
    My code returns the 22 unique tags in the resources table. If you expect tags to be re-used within a list (i.e., 'apple' to occur multiple times in fruit), you'd have to add `group by 1` at the bottom. What code have you tried to get your results into your tags table? I know generally what you're asking, but what is your intended logic for TAGS_TO_RESOURCES? – Brian Dewhirst Nov 09 '17 at 13:12
  • Thanks again, Brian! The 22 tags returned are unique b/c I forgot to include duplicates in the tags like. I've updated the SQL Fiddle to include a duplicate ("banana" in Fruits and in Breads). When I run your query, it does return "banana" twice. Do you have a fix for that? The reasoning for the many-to-many TAGS_TO_RESOURCES table is to bring the tags back out alongside the Resources they were formerly associated with. Hope that makes sense. – nbardach Nov 09 '17 at 21:56
  • 1
    I've tweaked my code above, to remove duplicates. You will probably want to tweak how your tables are designed, to use auto increment [link to documentation](https://dev.mysql.com/doc/refman/5.7/en/example-auto-increment.html). – Brian Dewhirst Nov 09 '17 at 23:31
1
  1. based on RESOURCES.RES_tags create set of INSERT ... INTO TAGS ... statements. Prevent duplicating either with UNIQUE constraint in TAGS and ON DUPLICATE KEY ... or using INSERT ... SELECT ... NOT EXISTS():

a) append on the fly some character to the start of RES_tags and different character to the end(say - to start, + to end) - but don't save it back into DB(a,b,c would transform into -a,b,c+)

b) replace on the fly each ',' into ending previous INSERT statement and starting next one; replace '-' with starting only, '+; with ending part only(e.g. - is replaced with insert into tags(tag) values(", + becomes '") and , would be "), (" - but for keeping them unique it will be required to add something mentioned in step #1)

  1. execute SQL generated by #1(e.g. insert into tags(tag) values("a"), ("b"), ("c"))

  2. link entity with tags using:

    INSERT INTO TAGS_TO_RESOURCES(resource_id, tag_id)
    SELECT RESOURCES.id, TAGS.id
    FROM RESOURCES
    INNER JOIN TAGS
    ON INSTR(CONCAT(',', RESOURCES.RES.tags, ','), CONCAT(',', TAGS.tag, ','))> 0   
    
skyboyer
  • 22,209
  • 7
  • 57
  • 64