1

I hope I can ask this question in a way that makes some sense as the data itself is not good. However, as it was written in the 1950s, I have no option but to try to make the best of it in trying to bring it into the current century for those of us who need it. Also, Benchmark speed is not an issue as this is run manually and only rarely, and always on my local PC rather than on the live server. The resulting table is then uploaded.

To outline, it is looking in the Models field in parts_listing to see any with the ordinal numbers 22ND, 23RD etc. If there are any, then it looks in parts_modelno to find all that begin with the cardinal number 22, 23 etc. It takes all that match and populates parts_parsed with the rows in the select statement. It is doing that but it is getting all from parts_modelno, not just those that match.

For the first query below, it works well but needs multiple copies, one for each of the ordinal and cardinal values of 22ND, 23RD, 24TH, 25TH, 26TH, 54TH and 22, 23, 24, 25, 26, 54.

The Models field being queried in parts_listing contains data that varies considerably and another query has already extracted the bulk of it. For example, 2250-51-52-55-70-71 is a shortcut for 2250, 2251, 2252, 2255, 2270, 2271 and there are others such as these below with each example being in its own row. Those have already been parsed into parts_parsed. Those with BODY (and similar ones with MODEL) are not being parsed properly but that's another issue and not important here.

2662-92; 5462-92
ALL 22ND; 2301-02-13-32
LHD, 2401-02-13; 2501-02-13-31; 2601-02-11; 5400-01-02-11
2201 (BODY 2293)

and quite a few other variations. These codes have been parsed out in an earlier operation and are already in the parsed format other than Models in parts_listing which is the original and from which this is trying to find the ordinals so that the data can be completed with those values. Without this, if there is an ALL 22ND or one or more of the others in the data, none of those entries currently appear and that's what I am trying to fix by running an additional query to do so.

INSERT INTO parts_parsed (pageNo, baseGroup, partID, partNo, modelNo) 
SELECT PageNo AS pageNo, BaseGroup AS baseGroup, pl.ID AS partID, PartNo AS partNo, chassisNo AS modelNo 
FROM parts_listing pl, parts_modelno pm 
WHERE Models LIKE '%22ND%' 
AND chassisNo LIKE '22%' 
AND BaseGroup NOT IN (SELECT GroupNo FROM parts_reftype WHERE BodyChassis = 1);

Following is what I tried in order to simplify it but it gives an entry for every instance of 22ND, 23RD, 24TH, 25TH, 26TH, 54TH and 22, 23, 24, 25, 26, 54 rather than just those matching 22ND and 22 and I understand why but I am unsure what to do about it.

INSERT INTO parts_parsed (pageNo, baseGroup, partID, partNo, modelNo) 
SELECT PageNo AS pageNo, BaseGroup AS baseGroup, pl.ID AS partID, PartNo AS partNo, chassisNo AS modelNo 
FROM parts_listing pl, parts_modelno pm
WHERE Models REGEXP '22ND|23RD|24TH|25TH|26TH|54TH'
AND chassisNo REGEXP '22|23|24|25|26|54'
AND BaseGroup NOT IN (SELECT GroupNo FROM parts_reftype WHERE BodyChassis = 1);

The structure for parts_parsed is below, followed by a small sample of the data (there are over 450k rows).

CREATE TABLE IF NOT EXISTS `parts_parsed` (
`ID` int unsigned NOT NULL AUTO_INCREMENT,
`pageNo` int unsigned NOT NULL,
`baseGroup` int unsigned NOT NULL,
`partID` int unsigned NOT NULL,
`partNo` varchar(20) NOT NULL,
`modelNo` smallint unsigned DEFAULT NULL,
`bodyNo` smallint unsigned DEFAULT NULL,
`isRHD` tinyint unsigned NOT NULL DEFAULT '0',
PRIMARY KEY (`ID`)
) ENGINE=INNODB AUTO_INCREMENT=1 DEFAULT CHARSET=UTF8MB4 COLLATE=UTF8MB4_GENERAL_CI;

ID  pageNo  baseGroup   partID  partNo  modelNo bodyNo  isRHD
1   1   0   1   391906  2201        0
2   1   0   1   391906  2202        0
3   1   0   1   391906  2211        0
4   1   0   1   391906  2220        0
5   1   0   1   391906  2222        0
6   1   0   1   391906  2232        0
7   1   0   1   391906  2240        0
8   1   0   1   391906  2301        0
9   1   0   2   391907  2306        0
10  1   0   2   391907  2326        0

The structure of parts_modelno is below, followed by a small sample of data.

DROP TABLE IF EXISTS `parts_modelno`;
CREATE TABLE IF NOT EXISTS `parts_modelno` (
  `ID` smallint unsigned NOT NULL AUTO_INCREMENT,
  `seriesYear` varchar(8) DEFAULT NULL,
  `bodyNo` varchar(8) DEFAULT NULL,
  `chassisNo` varchar(8) DEFAULT NULL,
  `engineNo` varchar(8) DEFAULT NULL,
  `modelDesc` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`ID`)
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=UTF8MB4_GENERAL_CI;;

ID  seriesYear  bodyNo  chassisNo
1   1948-49 2213    2213
2   1948-49 2250    2226
3   1948-49 2251    2226
4   1948-49 2252    2206
5   1948-49 2255    2206
26  1949-50 2365    2301
27  1949-50 2372    2302
28  1949-50 2375    2302
29  1949-50 2379    2332
30  1949-50 2382    2302

And finally as requested, here is the structure of parts_listing followed by a very short sample selected at random.

DROP TABLE IF EXISTS `parts_listing`;
CREATE TABLE IF NOT EXISTS `parts_listing` (
  `ID` int unsigned NOT NULL AUTO_INCREMENT,
  `BaseGroup` int unsigned DEFAULT NULL,
  `GroupNumber` varchar(20) DEFAULT NULL,
  `BaseName` varchar(50) DEFAULT NULL,
  `GroupName` varchar(50) DEFAULT NULL,
  `Name` varchar(100) DEFAULT NULL,
  `PartNo` varchar(30) DEFAULT NULL,
  `Models` varchar(255) DEFAULT NULL,
  `Description` varchar(255) DEFAULT NULL,
  `Quantity` smallint unsigned DEFAULT NULL,
  `PageNo` int DEFAULT NULL,
  `SubPage` varchar(5) DEFAULT NULL,
  `RevDate` int DEFAULT NULL,
  `Edition` varchar(50) DEFAULT NULL,
  PRIMARY KEY (`ID`)
) ENGINE=INNODB AUTO_INCREMENT=1 DEFAULT CHARSET=UTF8MB4 COLLATE=UTF8MB4_GENERAL_CI;

ID  BaseGroup   GroupNumber BaseName    GroupName   Name    PartNo  Models  Description Quantity    PageNo  SubPage RevDate Edition
13570   30  30.46796    BODY    DOORS-DOOR REAR TRIM PANEL ASSY "SPECIFY TRIM SET NO"   453674  5450-51 RIGHT, SET 50-52    1   402     -490665600  48-54
1850    3   3.236   CLUTCH AND TRANSMISSION TRANSMISSION    FLANGE-DRIVING SHAFT UNIVERSAL JOINT    302868  2213-20-22-26; 2313     1   58      -490665600  48-54
16314   30  30.874934   BODY    RADIATOR GRILLE AND SPLASHER    BRACKET-RAD SIDE SPLASHER BRACE FRAME   G120370 2469; 2579; ALL 26TH; 54TH  NUT 7/16-20 2   469     -490665600  48-54
14633   30  30.666983   BODY    ELECTRICAL-ROOF LIGHT   GROMMET-ROOF LIGHT CABLE    403834  2280-86     1   423     -504921600  48-54
12273   30  30.34395    BODY    DOORS-DOOR FRONT    MOULDING-TRIM PANEL 442793  2677-79-97; 5467-97 INTERMEDIATE, LOWER, RIGHT  1   369     -490665600  48-54\

Additional examples of Models can be found here.

DonP
  • 725
  • 1
  • 8
  • 27
  • even with reg ex you must combie evers combination of model and chasis and join them by or – nbk Sep 06 '21 at 00:38
  • The data sample for table `parts_parsed` only filled into 7 out of 8 columns, which column is empty? – FanoFN Sep 06 '21 at 00:39
  • I'm assuming that you're trying to populate `parts_parsed` tables with data from other table but it's incorrect or something? Or are you trying to get an output out of the `parts_parsed` table? – FanoFN Sep 06 '21 at 00:48
  • I'll try to clarify in the original post but to answer here, it is looking in the Models field in parts_listing to see any with the ordinal numbers 22ND, 23RD etc. If there are any, then it looks in parts_modelno to find all that begin with the cardinal number 22, 23 etc. It takes all that match and populates parts_parsed with the rows in the select statement. It is doing that but it is getting all from parts_modelno, not just those that match. – DonP Sep 06 '21 at 01:01
  • The sample data was just data from the table, not data from any query. I posted it just to show what it looks like. – DonP Sep 06 '21 at 01:04
  • Not a problem. This can be done by using a list of pairs of ('22ND', 22) and these pairs can even be generated dynamically. Those who are already commenting are capable of providing this. If they're busy, I'll toss something together, to handle them all with one statement. It won't be long. – Jon Armstrong Sep 06 '21 at 01:14
  • Is this the complete list? `'22ND|23RD|24TH|25TH|26TH|54TH'` or can these values be found dynamically in the database? – Jon Armstrong Sep 06 '21 at 01:22
  • Yes, that is the complete list for this particular parts book but at some point others will be added to the same tables. There is no table containing them but it should be possible to glean them from a DISTINCT query of the first two digits in parts_modelno using the bodyNo or ChassisNo columns. – DonP Sep 06 '21 at 01:31
  • Could you provide (in the question) the `CREATE TABLE` statements for the rest of the tables in the problem? – Jon Armstrong Sep 06 '21 at 01:33
  • @Jon Armstrong Unless I missed one, they should already be there except for parts_listing as it has only a single field being used here. I'll add it though. Jon, I would love to send you the query that does the bulk of the parsing as maybe you'll see a way to incorporate all of this, including the mentioned BODY or BODIES issue into it! Too much and too complicated to explain and post here, though. This was all being done originally in PHP and it worked but it took hours to run! – DonP Sep 06 '21 at 01:38
  • I don't see the following tables in the question: `parts_listing` and `parts_reftype`. These are important tables in the logic. I couldn't tell which columns were from which tables, as not all the column references have a table qualifier. In any case, they should resolve properly. I just couldn't test properly without them. – Jon Armstrong Sep 06 '21 at 01:45
  • Oh, O forgot about parts_reftype as that was a last minute addition. It is simply pulling up a value that is currently 30 so it can be replaced with that value. The reason is, that group and later another, get their values from parts_modelno chassisNo column when the rest get it from bodyNo column. There is a later query that moves it where it needs to be but great if it could be incorporated into the original parsing query which is not part of this question. It is here but modified considerably since then: https://stackoverflow.com/questions/57844393/mysql-field-data-parsing#57849299 – DonP Sep 06 '21 at 01:52
  • Which version of MySQL are you using? `SELECT version();` – Jon Armstrong Sep 06 '21 at 01:55
  • The data isn't unambiguous without proper separators and literal quotes were needed. Best to provide `INSERT` statements. – Jon Armstrong Sep 06 '21 at 02:01
  • With your example in mind, the first 2 digits that exists in `parts_modelno` table are `22,23`. What you're aiming for is to scan `Models` column from `parts_listing` table and see if there's any value matching the first two digits of `22,23`. If in `parts_listing` that have `Models` value of `2469; 2579;`, this won't be returned since none of them matches the two digits from `parts_modelno` and if for this value `ALL 22; 2399; 2469; 2579;`, it will return all of `22,23` matches regardless of the two trailing digits of `99` from `2399`, but won't return the ones not match, is that correct? – FanoFN Sep 06 '21 at 03:04
  • DonP does need to be careful with these patterns. They aren't very specific. A regular expression of `'24'` will match lots of things, like `245`, `124`, `A24B`, since there's no restriction on where that pattern is found in the source string. But once those patterns (for `p1` and `p2`, in the proposed solutions) are nailed down appropriately, he should be in good shape. – Jon Armstrong Sep 06 '21 at 03:09
  • That's what I'm concerned with. Especially with this data structure, if the data looks like `5522; 5523;`, it will also be recognize as valid by regexp isn't it? What I had in mind is trying to separate all those numbers to each of it's own row first. Basically all the work is done only on `parts_listing` table and the other tables are just for additional data purposes, well maybe except for `parts_reftype`. – FanoFN Sep 06 '21 at 03:19
  • 1
    Agreed. The test data wasn't in a usable form, so I mainly had to assume @DonP's assumptions were valid. It's a risk. But without explicit `INSERT`s (unambiguous data) to cover the cases of interest, we're left to guess. – Jon Armstrong Sep 06 '21 at 03:27
  • Don't use SQL. Pull the data out, cleanse it, stick it back into _different_ tables that are better organized, and with no abbreviated numbers. _Then_ SQL will be a lot easier. – Rick James Sep 07 '21 at 16:44
  • That is exactly what the SQL is doing and it is being run in an SQL editor rather than on a webpage so little choice for that. It is all working not as it is. – DonP Sep 08 '21 at 17:11

2 Answers2

1

There are 3 slightly different versions of the solution here, for various versions of MySQL and MariaDB.

MariaDB 10.5 is compatible with all 3 solutions. MySQL 8.0 is compatible with 2, and MySQL 5.7 is compatible with just one of the following.

The pairs derived table or WITH clause term provides the pairs of patterns ('25TH', '25'), etc.

Once we have that, it's just a matter of joining with that list in your original SELECT query expression, used to generate the rows to be inserted.

-- MariaDB 10.5, MySQL 8.0, and MySQL 5.7

INSERT INTO parts_parsed (pageNo, baseGroup, partID, partNo, modelNo) 
SELECT PageNo AS pageNo, BaseGroup AS baseGroup
     , pl.ID AS partID, PartNo AS partNo
     , chassisNo AS modelNo 
  FROM parts_listing pl
     , parts_modelno pm
     , (
         SELECT '22ND' AS p1, '^22' AS p2   UNION
         SELECT '23RD', '^23'               UNION
         SELECT '24TH', '^24'               UNION
         SELECT '25TH', '^25'               UNION
         SELECT '26TH', '^26'               UNION
         SELECT '54TH', '^54'               -- etc
       ) AS pairs
 WHERE Models REGEXP p1
   AND chassisNo REGEXP p2
   AND BaseGroup NOT IN (SELECT GroupNo FROM parts_reftype WHERE BodyChassis = 1)
;

Test case for MySQL 5.7, 8.0 and MariaDB 10.5

-- MariaDB 10.5, and MySQL 8.0 

INSERT INTO parts_parsed (pageNo, baseGroup, partID, partNo, modelNo) 
WITH pairs (p1, p2) AS (
         SELECT '22ND' AS p1, '^22' AS p2   UNION
         SELECT '23RD', '^23'               UNION
         SELECT '24TH', '^24'               UNION
         SELECT '25TH', '^25'               UNION
         SELECT '26TH', '^26'               UNION
         SELECT '54TH', '^54'               -- etc
     )
SELECT PageNo AS pageNo, BaseGroup AS baseGroup
     , pl.ID AS partID, PartNo AS partNo
     , chassisNo AS modelNo 
  FROM parts_listing pl
     , parts_modelno pm
     , pairs
 WHERE Models REGEXP p1
   AND chassisNo REGEXP p2
   AND BaseGroup NOT IN (SELECT GroupNo FROM parts_reftype WHERE BodyChassis = 1)
;

Test case for MySQL 8.0 and MariaDB 10.5 (updated)

-- For MariaDB 10.5:

INSERT INTO parts_parsed (pageNo, baseGroup, partID, partNo, modelNo) 
WITH pairs (p1, p2) AS (
         SELECT * FROM (
                         VALUES ('22ND', '^22')
                              , ('23RD', '^23')
                              , ('24TH', '^24')
                              , ('25TH', '^25')
                              , ('26TH', '^26')
                              , ('54TH', '^54')
                       ) AS x
     )
SELECT PageNo AS pageNo, BaseGroup AS baseGroup
     , pl.ID AS partID, PartNo AS partNo
     , chassisNo AS modelNo 
  FROM parts_listing pl
     , parts_modelno pm
     , pairs
 WHERE Models    REGEXP p1
   AND chassisNo REGEXP p2
   AND BaseGroup NOT IN (SELECT GroupNo FROM parts_reftype WHERE BodyChassis = 1)
;

Test case for MariaDB 10.5

MySQL 8.0 (and 5.7) had problems with the table value constructor FROM (VALUES (), (), ()) AS x. We replaced the table value constructor with a simple UNION list for MySQL 8.0 and 5.7. MySQL 5.7 does not support the WITH clause, so we replaced that with a derived table. The 5.7 version works for all 3 versions (MariaDB 10.5, MySQL 5.7 and 8.0).

Hopefully, I didn't insert any typos while trying to provide the above detail.

I noticed continuing conversation about generating patterns, which wasn't the focus on this question. If you have a question covering this, feel free to mention it. Here's something I worked up a few days ago, while thinking about your general problem. This just focuses on a few kinds of expressions you used that are intended to generate patterns. This shows how some of them might be handled in one expression. The data is self contained in the following query:

WITH RECURSIVE seq (n) AS (
            SELECT 1
             UNION ALL
            SELECT n + 1 FROM seq WHERE n <= 9
     )
   , args (arg) AS (
         SELECT '2213-20-22-26; 2313; 22ND; 23RD; 24TH; 25TH; 26TH; 54TH' AS arg UNION
         SELECT '2210-21-23; 2311; 22ND; 29TH; 51ST'
     )
   , norm (term, arg, n) AS (
            SELECT TRIM(REPLACE(TRIM(LEADING SUBSTRING_INDEX(t1.arg,';',seq.n-1) FROM SUBSTRING_INDEX(t1.arg,';',seq.n)), ';','')) AS term
                 , t1.arg
                 , n
              FROM args AS t1
              JOIN seq
                ON seq.n > 0 AND SUBSTRING_INDEX(t1.arg,';',seq.n-1) <> SUBSTRING_INDEX(t1.arg,';',seq.n)
             ORDER BY seq.n
     )
   , pattern1 (term, arg, n, pat) AS (
            SELECT t.term, t.arg, t.n
                 , CASE WHEN LENGTH(term) = 4
                        THEN
                             CASE WHEN SUBSTRING(term, -1) BETWEEN '0' AND '9'
                                  THEN CONCAT('^', term, '$')
                                  ELSE CONCAT('^', SUBSTRING(term, 1, 2))
                              END
                    END AS pat
             FROM norm AS t
     )
   , norm2 (term, arg, n, pat) AS (
            SELECT t1.term
                 , t1.arg
                 , seq.n
                 , CONCAT('^', TRIM(REPLACE(TRIM(LEADING SUBSTRING_INDEX(t1.term,'-',seq.n-1) FROM SUBSTRING_INDEX(t1.term,'-',seq.n)), '-','')), '$') AS tag
              FROM pattern1 AS t1
              JOIN seq
                ON seq.n = 1
               AND t1.pat IS NULL
             UNION ALL
            SELECT t1.term
                 , t1.arg
                 , seq.n
                 , CONCAT('^', SUBSTRING(t1.term, 1, 2), TRIM(REPLACE(TRIM(LEADING SUBSTRING_INDEX(t1.term,'-',seq.n-1) FROM SUBSTRING_INDEX(t1.term,'-',seq.n)), '-','')), '$') AS tag
              FROM pattern1 AS t1
              JOIN seq
                ON seq.n > 1 AND SUBSTRING_INDEX(t1.term,'-',seq.n-1) <> SUBSTRING_INDEX(t1.term,'-',seq.n)
               AND t1.pat IS NULL
             UNION ALL
            SELECT t1.*
              FROM pattern1 AS t1
             WHERE t1.pat IS NOT NULL
     )
SELECT *
  FROM norm2
 ORDER BY arg, term, n
;

Result containing the generated patterns:

+---------------+---------------------------------------------------------+------+--------+
| term          | arg                                                     | n    | pat    |
+---------------+---------------------------------------------------------+------+--------+
| 2210-21-23    | 2210-21-23; 2311; 22ND; 29TH; 51ST                      |    1 | ^2210$ |
| 2210-21-23    | 2210-21-23; 2311; 22ND; 29TH; 51ST                      |    2 | ^2221$ |
| 2210-21-23    | 2210-21-23; 2311; 22ND; 29TH; 51ST                      |    3 | ^2223$ |
| 22ND          | 2210-21-23; 2311; 22ND; 29TH; 51ST                      |    3 | ^22    |
| 2311          | 2210-21-23; 2311; 22ND; 29TH; 51ST                      |    2 | ^2311$ |
| 29TH          | 2210-21-23; 2311; 22ND; 29TH; 51ST                      |    4 | ^29    |
| 51ST          | 2210-21-23; 2311; 22ND; 29TH; 51ST                      |    5 | ^51    |
| 2213-20-22-26 | 2213-20-22-26; 2313; 22ND; 23RD; 24TH; 25TH; 26TH; 54TH |    1 | ^2213$ |
| 2213-20-22-26 | 2213-20-22-26; 2313; 22ND; 23RD; 24TH; 25TH; 26TH; 54TH |    2 | ^2220$ |
| 2213-20-22-26 | 2213-20-22-26; 2313; 22ND; 23RD; 24TH; 25TH; 26TH; 54TH |    3 | ^2222$ |
| 2213-20-22-26 | 2213-20-22-26; 2313; 22ND; 23RD; 24TH; 25TH; 26TH; 54TH |    4 | ^2226$ |
| 22ND          | 2213-20-22-26; 2313; 22ND; 23RD; 24TH; 25TH; 26TH; 54TH |    3 | ^22    |
| 2313          | 2213-20-22-26; 2313; 22ND; 23RD; 24TH; 25TH; 26TH; 54TH |    2 | ^2313$ |
| 23RD          | 2213-20-22-26; 2313; 22ND; 23RD; 24TH; 25TH; 26TH; 54TH |    4 | ^23    |
| 24TH          | 2213-20-22-26; 2313; 22ND; 23RD; 24TH; 25TH; 26TH; 54TH |    5 | ^24    |
| 25TH          | 2213-20-22-26; 2313; 22ND; 23RD; 24TH; 25TH; 26TH; 54TH |    6 | ^25    |
| 26TH          | 2213-20-22-26; 2313; 22ND; 23RD; 24TH; 25TH; 26TH; 54TH |    7 | ^26    |
| 54TH          | 2213-20-22-26; 2313; 22ND; 23RD; 24TH; 25TH; 26TH; 54TH |    8 | ^54    |
+---------------+---------------------------------------------------------+------+--------+
Jon Armstrong
  • 4,654
  • 2
  • 12
  • 14
  • That looks very promising but how does it fetch the 23RD, 24TH etc? Also, it gives a syntax error on line 3. – DonP Sep 06 '21 at 01:56
  • MySQL apparently updated itself as I see it's running version 8 but the live server is 7. – DonP Sep 06 '21 at 02:01
  • You'll need to use a derived table for MySQL 5.7. This was just the first two pairs in the list. You can add the rest similarly. I can show you that too. I'll first provide a MySQL 5.7 version, so that you don't have the syntax issues of the unsupported `WITH` clause. – Jon Armstrong Sep 06 '21 at 02:03
  • @DonP I added a version for MySQL 5.7, which should be fine with later versions and MariaDB. I added another `UNION` term to provide one more pair in your list, as another example, with a link to the updated fiddle. – Jon Armstrong Sep 06 '21 at 02:08
  • No need for MySQL 5.7 as I'm not running that. I'll see if I can figure out how to add pairs but running it as it is it very quickly generated 48,810 rows of data which seems like a lot for only 22ND! – DonP Sep 06 '21 at 02:09
  • @DonP You said the live server is version 7. MySQL has no version 7. That's why I assumed you meant 5.7. ALSO: Please run the `SELECT` **without the `INSERT`** part of the statement, to see the generated rows that would be inserted. – Jon Armstrong Sep 06 '21 at 02:10
  • When I ran mysql -V it showed an 8 version and I know the hosting company is a version behind which I thought must be 7. Apparently 8.0.26 is the latest and my system here is up to date. Anyway, sorry for the confusion. In any event, I ran the latest query after adding in the other pairs and it gave me 152,199 rows but it will take a little while to actually test it with all the other parsed data in place. I'll post back once that's been done and thank you for the help! – DonP Sep 06 '21 at 02:21
  • @DonP No problem at all. I've updated the `pairs` logic to include all the patterns you provided in the problem description. Happy testing. – Jon Armstrong Sep 06 '21 at 02:58
  • Thank you! I've been testing it on the actual local copy of the site where it's easier to see the validity of the results and it's almost there. Overall, there are 9 queries running to get the parsed data to be reasonably good valid (other than the BODY/BODIES issue) and, after copying this query and modifying to fetch the bodyNo rather than the chassisNo for one section, it all runs reasonably quickly. Both versions along with the other 7 run in succession. However, entries like ALL 24TH; 25TH; 26TH; 54TH are somehow pulling up a few but not all with 22ND and I'm not sure why. – DonP Sep 06 '21 at 03:15
  • @DonP If you can provide any explicit test data (`INSERT` statements) that produce strange results, that would help diagnose the issues. – Jon Armstrong Sep 06 '21 at 03:24
  • There are only three inserts: the initial one that parses all the dashed numbers into something usable and inserts it into parts_parsed, then the one you wrote and the slightly modified copy I tweaked. Is that what you meant? The rest are update queries. Otherwise I'm not sure how to get such output from HeidiSQL which is what I am running. Its log shows only the query, not the data being produced. – DonP Sep 06 '21 at 03:42
  • I see the problem. The query isn't looking at only the first two digits as it must and the four 22ND series entries that it is pulling up have the others in them. It is getting 2226, 2226, 2240 and 2240 (twice each because of two different bodies). It is seeing the 26 in 2226 and the 24 in 2440 and thinking it's a match. – DonP Sep 06 '21 at 03:58
  • @DonP I mentioned that in a prior comment. '24' is a pattern which finds those characters at any position in the source string. If you want them to be at the start of the string, use '^24'. See the last panel in the following fiddle: https://dbfiddle.uk/?rdbms=mariadb_10.5&fiddle=fc10c91a9acb8e4b298b3d891bd1b681 – Jon Armstrong Sep 06 '21 at 04:02
  • @DonP What about the '24TH' pattern? Does that also need to be more specific? Here's the updated fiddle: https://dbfiddle.uk/?rdbms=mariadb_10.5&fiddle=b64655fde5d3c9ab04b9b3e13a4697c2 – Jon Armstrong Sep 06 '21 at 04:04
  • 24TH is okay as it will always be 24TH but within the four-digit numbers there might be other matches as we've found. Thank you for sorting it out for me! – DonP Sep 06 '21 at 04:06
  • @DonP If the answer is acceptable, you can `Accept` it by using the checkmark near the up/down vote controls. – Jon Armstrong Sep 06 '21 at 04:10
  • It is and I will shortly although I can't quite figure out where to place the ^22 etc as the diddle shows SELECT '1234' REGEXP ^22. I'm not sure what that means. There is no comma as there was but trying it either way gives an error. – DonP Sep 06 '21 at 04:19
  • @DonP `SELECT '1234' REGEXP '^22'` ... But note that '1234' does not match because it does not begin with '22'. I updated the answer/fiddle to show the exact syntax. – Jon Armstrong Sep 06 '21 at 04:27
  • Thank you and it appears to work perfectly! – DonP Sep 06 '21 at 04:36
  • Have you checked Jon's updated answer @DonP? – FanoFN Sep 15 '21 at 05:08
  • No, I don't get notices about new posts for days after they are made and none about updates. Thank you for pointing it out although I have no idea how to apply it to my varying data as it's hard coded. It even has non-existing entries such as 29TH; 51ST but even on those that are valid, running it gives no results whatsoever so I'n not sure how to proceed. The only way to get values for 22ST, 22ND etc. is to look them up in the parts_modelno table. – DonP Sep 15 '21 at 06:24
  • @DonP I show test data in these examples that generate what I believe is expected results. If you can provide similar tests that produce problem results, that might help diagnose any issues. – Jon Armstrong Sep 15 '21 at 06:27
  • @DonP Just to be clear, the additional SQL I showed is just about generating various patterns from different kinds of expressions in your data. To apply these patterns to find related `model` data, for instance, is just a simple `JOIN`, much like the SQL in the first part of the answer. I didn't want to confuse the goal of this new SQL. It was just an example of how to build the search patterns from your initial strings containing many pattern terms. – Jon Armstrong Sep 15 '21 at 06:44
0

I go with a different approach, I believe it's achievable with a short query but with layers of functions. I'll try to explain it one by one and hopefully I don't miss anything.

The idea I had was very simple, to separate the values in Models column from the table parts_listing first then present it as each rows and lastly match the first two digits of the extracted result with table parts_modelno. So, here is the query:

INSERT INTO parts_parsed (pageNo, baseGroup, partID, partNo, modelNo) 
/*Part 1*/
WITH RECURSIVE seq AS(
    SELECT 1 sn UNION ALL
    SELECT sn+1 FROM seq WHERE sn+1 <= 100)
/*Part 3*/
SELECT PageNo AS pageNo, BaseGroup , pl.ID AS partID, PartNo, chassisNo AS modelNo 
   FROM
  (/*Part 2*/ 
     SELECT *,
        REGEXP_REPLACE(
            SUBSTRING_INDEX(
                 SUBSTRING_INDEX(Models,';',sn),';',-1),'[a-zA-Z]','') sval
       FROM seq JOIN parts_listing pl
         ON sn <= (LENGTH(Models)-LENGTH(REPLACE(Models,';','')))+1) pl
       JOIN parts_modelno pm 
         ON (LEFT(pl.sval,2)=LEFT(pm.bodyNo,2) OR LEFT(pl.sval,2)=LEFT(pm.chassisNo,2))
GROUP BY PageNo, BaseGroup, partID, PartNo, modelNo
ORDER BY PageNo, BaseGroup, partID, PartNo, modelNo ;

As you can see, there's a lot going on there. I have remarked them with 3 parts and that's how I'll attempt to explain them.

Part 1:
Create a sequence number using WITH RECURSIVE .. function; In MariaDB there's a very convenient way to create a sequence number by using their sequence engine function in which you can just write SELECT seq FROM seq_1_to_100 and you'll get rows of numbering sequence from 1 to 100. Anyway, there are two usage of the generated numbering sequence here, one is to be used in the count part of SUBSTRING_INDEX(str,delim,count) and the other is to be used as the ON filter in JOIN.

Part 2:

REGEXP_REPLACE(
            SUBSTRING_INDEX(
                 SUBSTRING_INDEX(Models,';',sn),';',-1),'[^0-9]','') sval
  1. Assuming that the data in models are all separated with ;, we're going to use it as the delimiter in the SUBSTRING_INDEX(str,delim,count) function.
  2. Using REGEXP_REPLACE to replace all non-numerical characters and return only numbers.

The ON part was also a bit of work as I attempt to get how many values nested in the Models that needed to be separated and return as a row of it's own. Therefore the basic idea is to use LENGTH() of characters on the original Models value then subtract it with the LENGTH() of characters on Models whereby the delimiter (;) is being removed. Now, (;) is just a single character so when we remove that and subtract it's length against the original value, we'll get 1; which actually represents that there are two values separated by it. That is why there's a +1 at the end of the character length subtraction.

ON sn <= (LENGTH(Models)-LENGTH(REPLACE(Models,';','')))+1) pl

Part 3:
Making the query above as sub-query then do a JOIN with parts_modelno table. The ON condition will match between the first two digits of value that was extracted from Models column earlier. This I believe is a straight forward operation.

Demo fiddle

FanoFN
  • 6,815
  • 2
  • 13
  • 33
  • 1
    Thank you! I’m traveling today back home to California from my Oregon shop but once I get back and get my PC there synced with the work done here, I’ll give it a try. Yes, the data chunks are separated by ; but there are also items such as RHD, (with the comma) or (EXCEPT BODY 2313) OR (BODIES 2313-33) which, for this work, are being ignored. These use the same structure as the rest and I wanted to mention them only in case the other characters might be an issue. It would be great to be able to filter for the bodies or chassis being included or excluded but for now, that is not too important. – DonP Sep 08 '21 at 17:22
  • Yes, I'm very curious to know if the query will work for you. I'm also interested in the other type of data you mentioned but it's probably easy for you to filter if you know what you're filtering beforehand and the values are consistent. – FanoFN Sep 09 '21 at 01:09
  • I am too but couldn’t copy and paste on an iPad from the fiddle to try it on my own database against real data so will have to wait until I have a PC available. – DonP Sep 10 '21 at 06:43
  • Yeah it's ok. It's best when you can run on proper machine for optimal tests. Besides the post will still be here unless it's being deleted, right. We can always comeback to comment or do editing in the future. – FanoFN Sep 10 '21 at 08:51
  • On a PC finally and ran the query which seems to run reasonable quickly. I have not yet tested for accuracy other than visually but it appears it is to replace the query answered here: https://stackoverflow.com/questions/57844393/mysql-field-data-parsing#57849299 rather than the one in this question. If that's the case, it's missing a few things. One is the RHD vs LHD. Some have `RHD,` meaning that the part works only on Left Hand or Right Hand drive vehicles. Also, not in either question, two or three BaseGroups need the bodyNo populated rather than the modelNo. BaseGroup 29, 30 and 31. – DonP Sep 11 '21 at 00:45
  • Can I assume that the data in the other question cover the most variation of the data style in your table? Also, what is the condition for the BaseGroup to populate bodyNo instead of modelNo? – FanoFN Sep 11 '21 at 01:07
  • If we're using the sample data from your previous question, the core of the query would probably return something like the `sval` column in [this fiddle](https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=ab69bbdb58333f04eeafd9d93fe48c22). Assuming that this is the first step, what would be the next step/ next condition? – FanoFN Sep 11 '21 at 02:52
  • BaseGroup 29, 30 and 31 (possibly others), the parsed data needs to go into bodyNo rather than modelNo which I have been doing to date with a secondary query. Also, on parts_listing.Models containing things like `2401 (BODIES 2462-65-92-95-98); 2501; 2601-11; 5400-01-11`, 2401 goes into modelNo and with a multiple rows for that with 2462, 2465, 2492, 2595 and 2408 in the bodyNo column. Other examples are `2401 (BODIES 2462-65-92-95-98); 2501; 2601-11; 5400-01-11`, `2240 (BODY 2286)` and `2601 (BODY 2697); 5401 (BODY 5497)`. The isRHD default is 0 but if `LHD` then 1 and `RHD` it is 2. – DonP Sep 11 '21 at 07:21
  • BaseGroup 29, 30 and 31 (possibly others), the parsed data needs to go into bodyNo rather than modelNo which I have been doing to date with a secondary query which has been done using a lookup table parts_reftype to look up the BaseGroups that require it. – DonP Sep 11 '21 at 07:24
  • One other thing missing are the ones that say ALL 22ND, ALL 23RD etc which would have to pull out the matches from the parts_modelno table to get them all. Those too are currently being done with a separate query and are, in fact, what this thread was originally all about. – DonP Sep 11 '21 at 22:48
  • I see, from what I gathered there are at least 9 types of data that we're looking at. I want to try something to confirm my understanding and post it here later. Hopefully I didn't misunderstand anything. – FanoFN Sep 13 '21 at 01:02
  • Could be that many and possibly more! I’m looking forward to seeing what you come up with. – DonP Sep 14 '21 at 17:48
  • Here's a fiddle I made based on what I understand from your requirement @DonP . But it's only covered the data variation partially and I'm hesitant whether at this point am I understanding it correctly or not? then I figured I let you look at it first before I go any further. https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=1be28da1970d0fe36b87af61738a98ad – FanoFN Sep 15 '21 at 06:49
  • Thank you! It looks pretty good but seems to be combining model and body. Also, there are instances in the data where `LHD,` or `RHD,` isn't at the beginning and I am a bit confused by the meaning of those entries too. To me, it means all after are LHD or RHD are those to which it relates. A caveat are entries like RHD, ALL 22ND as not all 22ND were available that way! It means I will have to somehow mark in a new column in parts_modelno some designation for those that are. Anyway, I'll post more examples below, – DonP Sep 15 '21 at 18:01
  • `2401 (BODY 2469); 2531 (BODY 2599); 2631 (BODIES 2678-79); 5431 (BODIES 5478-79)` `LHD, 2401-02-13; 2501-02-13-31; 2601-02-11-13-26-31-33; 5400-01-02-11-13-26-33` `ALL 22ND; 2301-02-13-33; LHD, 5401-02-13; ALL 25TH; 26TH; 54TH` `RHD, 2401; 2501; 2601 (BODY 2697)` `RHD, 2201-02-11-22-32-40; 2301` `RHD, ALL 22ND; 23RD` – DonP Sep 15 '21 at 18:01
  • In case it helps, I created a page that list the content of the tables in question. https://packardimperial.com/parsesamples.php – DonP Sep 16 '21 at 22:44
  • I end up starting from zero and trying to figure out all the possible variations then after performing 3 layers of sub-query with a bunch of `CASE` expression followed with other functions, this is what I come up with https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=1bcb50baf9a3574fdacbc3ad12cc7fce ; which in my opinion is not easy to manage. I've shrunk the data to a selected few to make a smaller examples. – FanoFN Sep 18 '21 at 02:07
  • Thank you! I ran it against the full database and it it runs quickly, giving some very good results but not entirely as you probably supposed. This `2631 (BODY 2678); 5431 (BODY 5478)`, for instance, was good but this `ALL 24TH; 25TH; 26TH` put all the entries into the wrong column. Once the next part number is found, it switches back to the proper column. These were in the first few rows of data so I'm still looking at it to see how it went as it got through it all. – DonP Sep 18 '21 at 22:37
  • I also just found that some of the Description column entries in parts_listing have `NOT USED WITH BODY 2678` or `RIGHT (NOT USED WITH BODIES 2678; 5478)` (for example) which is not a column we were checking but that can probably be ignored unless there is a way to get it to leave out the bodies that are so described. Not really a problem but thought I would mention it. Not sure if it was mentioned, but the BaseGroups marked with 1 in parts_reftype use bodyNo rather than chassisNo in parts_modelno. Hard coded, that would be `BaseGroup IN (29,30,31)`. – DonP Sep 18 '21 at 22:48
  • I see. I think, although the query suggestion I made is not a complete replacement for the multiple series of queries that you're doing but I hope you can make use of it and probably can reduce the whole processes. Personally for me, this is as far as I can go in order for me to keep it managed and adding further functions and/or sub-query might just mess it up. It's a good experiment and learning experience though so I thank you for that. – FanoFN Sep 19 '21 at 05:13
  • You're welcome and thank you for all the help! I really appreciate it. – DonP Sep 19 '21 at 06:12