On a digitized version of a 1950s automotive parts book, the model entries were designed to be read by human eyes so to save space they were abbreviated. Of course, this doesn't work with my search engine and I know very little about regular expressions and even less about REGEXP so I hope someone here can assist. I did look through the MySQL manual but I'm afraid there was nothing like this. There are other variations of the data too but I want to start with this bit. I am currently doing the parsing using a PHP script that takes hours to run and, even though it needs to be run only when there is updated data, it takes far too long.
Here is the data in a single row field, separated by semicolons (NOTE: these are on one line in the original but I added returns here to make it more easily readable):
2452-62-65-67-72-92-95-98;
2552-62-65-72-77-92-95;
2650-51-52-62-65-72-77-92-95-97;
5450-51-52-62-65-67-72-77-82-85-92-95-97
... and those that need parsing will always have the dashes but I need it to somehow parse them to look like this. It needs to ignore any text since some have things like ALL 24TH; 25TH; 26TH which post separately:
2452 2462 2465 2467 2472 2492 2495 2498;
2552 2562 2565 2572 2577 2592 2595;
2650 2651 2652 2662 2665 2672 2677 2692 2695 2697;
5450 5451 5452 5462 5465 5467 5472 5477 5482 5485 5492 5495 5497
This simple query will pull up a four digit number but I don't how to proceed.
SELECT
Models
FROM
parts_listing
WHERE
Models REGEXP '^[0-9]{4}$'
ORDER BY Models;
Here's a shorter example of a basic row, or grouping of models for a given part:
2206-13-26-33; 2302-06-13-32-33
... but the row may also contain other info (that I'll need to eventually parse too) as shown in these examples. For the first and second we need to parse out the ALL and the ordinal and for the third we need to parse out anything between parenthesis. The fourth is an example that begins and ends with ALL values but has some dash-separated values in between.
ALL 24TH; 25TH; 26TH
2262-65-70-71-72-75-76-77-79-80-82-86-92-93-95; ALL 23RD
5401-11-31 (BODIES 5467-77-97)
ALL 22ND; 2301-02-13-32; ALL 24TH; 25TH; 26TH; 54TH
There also some with LHD (or sometimes RHD) and either at the beginning or in some cases in other places in the row and always has a comma after it:
LHD, 2401 (BODIES 2462-65-92-95-98); 2501; 2601-33; 5400-33
It's possible there are other variations too but for now it's only the basic model information that I'm after.