I've got ID# that are numeric strings with at least 4 digits. They are supposed to be in the ID column, but sometimes they are buried in different spots within free text in a separate Comments column, with 0s or blanks in the ID column. There are also 4-digit years in the Comments, which might trip things up:
Name | ID | Comments |
---|---|---|
Room A | 48625 | |
Room B | 8056 | |
Room C | 887654 | |
Room C | 0 | code is 887012 according to staff in 2021 |
Cellar | ID#5467 | |
Cellar | 00000 | IDcode#5467 - entered by RS 5/1/2021 |
Room D | unknown | updated 2022 |
Hoping to get something like this, where the non-IDs in the ID column are ignored, the possible IDs are extracted from the Comments, and a column added saying whether the ID found in the ID column or mentioned as "ID" or "Code" in the Comments column, to confirm that it's not a year:
Name | ID | MentionedAs |
---|---|---|
Room A | 48625 | ID column |
Room B | 8056 | ID column |
Room C | 887654 | ID column |
Room C | 887012 | code |
Cellar | 5467 | ID |
The beginner basics I have so far is:
select distinct
NAME, ID, coalesce(ID, COMMENTS) as ID,
(case when ID is not null then 'ID column'
when Comments like 'ID%' then 'ID'
when Comments like '%code%' then 'Code'
else null
end) as MentionedAs
from TABLE
WHERE
(not ID='00000' and ID like '%[0-9][0-9][0-9][0-9]%') or
(Comments like '%[^/^0-9][0-9][0-9][0-9]%' and Comments like 'ID%' or
Comments like '%code%)
but this is incomplete and I'm unsure how to pull just the possible IDs strings out of the Comments, or account for all the IDs that might be wrong in the future (ex., someone entering ten 0s), etc.
I found other solutions where functions are used to extract strings, but I'm not allowed to create functions.