0

I have a database with an "Instructions" field (mediumtext) where each entry contains a paragraph of instructions in the form of an ordered list. Currently, when viewed, each list item is displayed on a new line by calling it with PHP nl2br function.

Example entry:

  1. Place the flour, baking powder and a pinch of salt in a bowl and combine. Set aside. 2. Place the butter and sugar in a mixer bowl and cream at high speed until light and creamy, using the paddle attachment. 3. Reduce the mixer to a moderate speed and gradually add the egg until well emulsified. 4. Add the flour mixture and mix until it comes together to form a dough. Remove the dough from the mixing bowl and place between 2 sheets of baking parchment. 5. Roll the dough to a thickness of 5mm. 6. Place in the freezer while preheating the oven to 170°C/340°F. 7. Peel off the parchment and bake the dough until golden. 8. Allow to cool, then store in a sealed container until needed.

As you can see there are also numbers within the text.

I want to split this single field into a separate table, where each individual instruction list item would have its own row and id linking it to the current item.

Is there a way I can split my existing fields with MySQL? can a "Number. " be the delimiter.

itsdarrylnorris
  • 630
  • 5
  • 21
HowardHosk
  • 33
  • 4
  • This is a starting point: https://stackoverflow.com/questions/4021507/how-to-extract-the-nth-word-and-count-word-occurrences-in-a-mysql-string – Kamil Gosciminski May 21 '18 at 22:14
  • 1
    Think about normalizing you schema. – sticky bit May 21 '18 at 22:15
  • Splitting on numbers isn't going to be good enough - text like `between 2 sheets` and `170°C/340°F` will break that approach. You're going to need something fancier, like NLP / context awareness. (or employ a human to break out the steps manually) – StuartLC May 21 '18 at 22:17
  • You could try to establish a pattern like number (possibly multiple digit), dot and a space after that. Depending on your data it may require some manual fixing. – Kamil Gosciminski May 21 '18 at 22:21
  • 1
    This kind of parsing is often better done by selecting the data out into a procedural language/client, parsing it on the client side, and then re-inserting the processed result (and deleting the source). It _could_ be done in a stored procedure, but non-SQL languages usually have much better library support for this kind of thing. _(It might be worth nothing MySQL doesn't even have built in support for "split on delimiter character", let alone "some number".)_ – Uueerdo May 21 '18 at 23:28
  • @stickybit Pretty sure that is exactly what they are trying to do. – Uueerdo May 21 '18 at 23:33

1 Answers1

2

You can do this with a stored procedure. This one assumes that steps start at 1, are sequentially numbered and all look like the step number followed by a period, a space and then the step text (which is what your sample data looks like). It should be reasonably easy to modify to work with slightly different formats. I have made the procedure produce a result set of the steps, however you could also change the SELECT into an INSERT to copy the steps into a new table.

DELIMITER //
DROP PROCEDURE IF EXISTS split_recipe //
CREATE PROCEDURE split_recipe(IN recipe VARCHAR(2048))
BEGIN
  DECLARE step INT DEFAULT 1;
  DECLARE next_step INT DEFAULT step+1;
  DECLARE this_step VARCHAR(256);
  WHILE recipe RLIKE CONCAT('^[[:blank:]]*', step, '[[.period.]]') DO
    -- is there a next step?
    IF recipe RLIKE CONCAT('^[[:blank:]]*', step, '[[.period.]] .*', next_step, '[[.period.]]') THEN
      SET this_step = SUBSTRING_INDEX(SUBSTRING_INDEX(recipe, CONCAT(next_step, '. '), 1), CONCAT(step, '. '), -1);
    ELSE
      SET this_step = SUBSTRING_INDEX(recipe, CONCAT(step, '. '), -1);
    END IF;
    -- output this step
    SELECT step, this_step;
    -- remove this step from the recipe
    SET recipe = SUBSTRING_INDEX(recipe, CONCAT(step, '. ', this_step), -1);
    SET step = next_step;
    SET next_step = step + 1;
  END WHILE;
END //

With your sample data:

CALL split_recipe('1. Place the flour, baking powder and a pinch of salt in a bowl and combine. Set aside. 2. Place the butter and sugar in a mixer bowl and cream at high speed until light and creamy, using the paddle attachment. 3. Reduce the mixer to a moderate speed and gradually add the egg until well emulsified. 4. Add the flour mixture and mix until it comes together to form a dough. Remove the dough from the mixing bowl and place between 2 sheets of baking parchment. 5. Roll the dough to a thickness of 5mm. 6. Place in the freezer while preheating the oven to 170°C/340°F. 7. Peel off the parchment and bake the dough until golden. 8. Allow to cool, then store in a sealed container until needed.')

Output:

step    this_step   
1       Place the flour, baking powder and a pinch of salt in a bowl and combine. Set aside. 
2       Place the butter and sugar in a mixer bowl and cream at high speed until light and creamy, using the paddle attachment. 
3       Reduce the mixer to a moderate speed and gradually add the egg until well emulsified. 
4       Add the flour mixture and mix until it comes together to form a dough. Remove the dough from the mixing bowl and place between 2 sheets of baking parchment. 
5       Roll the dough to a thickness of 5mm. 
6       Place in the freezer while preheating the oven to 170°C/340°F. 
7       Peel off the parchment and bake the dough until golden. 
8       Allow to cool, then store in a sealed container until needed.

Note that this procedure produces multiple single row result sets (one for each step - I have combined these for ease of reading above). If only one result set is required, the procedure would need to be modified to store the steps into a temporary table and then fetch all the data from the temporary table at the end. Alternatively, code such as the following (for PHP/PDO/MySQL) could be used in the application:

$result = $link->query("call split_recipe('1. Place the flour...')");
do {
    if ($result->columnCount()) {
        $row = $result->fetch();
        print_r($row);
    }
} while ($result->nextRowset());

Here is a modified version of the procedure which will split recipes from a table recipes (RecipeID INT, Instructions VARCHAR(2048)) into a new table new_recipes (RecipeID INT, step_num INT, Instruction VARCHAR(256)).

DELIMITER //
DROP PROCEDURE IF EXISTS split_recipes //
CREATE PROCEDURE split_recipes()
BEGIN
  DECLARE rid INT;
  DECLARE recipe VARCHAR(2048);
  DECLARE step INT;
  DECLARE next_step INT;
  DECLARE this_step VARCHAR(256);
  DECLARE finished INT DEFAULT 0;
  DECLARE recipe_cursor CURSOR FOR SELECT RecipeID, Instructions FROM recipes;
  DECLARE CONTINUE HANDLER FOR NOT FOUND SET finished = 1;
  DROP TABLE IF EXISTS new_recipes;
  CREATE TABLE new_recipes (RecipeID INT, step_num INT, Instruction VARCHAR(256));
  OPEN recipe_cursor;
  recipe_loop: LOOP
    FETCH recipe_cursor INTO rid, recipe;
    IF finished = 1 THEN
      LEAVE recipe_loop;
    END IF;
    SET step = 1;
    SET next_step = 2;
    WHILE recipe RLIKE CONCAT('^[[:blank:]]*', step, '[[.period.]]') DO
      -- is there a next step?
      IF recipe RLIKE CONCAT('^[[:blank:]]*', step, '[[.period.]] .*', next_step, '[[.period.]]') THEN
        SET this_step = SUBSTRING_INDEX(SUBSTRING_INDEX(recipe, CONCAT(next_step, '. '), 1), CONCAT(step, '. '), -1);
      ELSE
        SET this_step = SUBSTRING_INDEX(recipe, CONCAT(step, '. '), -1);
      END IF;
      -- insert this step into the new table
      INSERT INTO new_recipes VALUES (rid, step, this_step);
      -- remove this step from the recipe
      SET recipe = SUBSTRING_INDEX(recipe, CONCAT(step, '. ', this_step), -1);
      SET step = next_step;
      SET next_step = step + 1;
    END WHILE;
  END LOOP;
END //
Nick
  • 138,499
  • 22
  • 57
  • 95
  • Does this produce one result set with 8 rows, or 8 single row results? – Uueerdo May 21 '18 at 23:36
  • It produces 8 single row results. If a single result set is needed the data would need to go into a temporary table to be selected at the end. – Nick May 21 '18 at 23:41
  • Nick that looks like the perfect solution. I thought I would have to manually edit so many records (i understand I will still have to moderate/check them whatever happens). I am struggling to implement it though. I figured out how to call a stored procedure, but applying it to my database is giving me trouble. Given I have the table `recipes` with the fields `RecipeID` and `Instructions`, how do I modify what you have to split the instruction field to a new table, keeping attached the relevant RecipeID to the new individual instruction rows. – HowardHosk May 22 '18 at 21:00
  • 1
    Hi @HowardHosk I've added a second procedure to the answer which should do what you want. – Nick May 22 '18 at 22:18