If you are using MySQL Version 8+ then below is the built-in function that might help you better.
MySQL Query:
SELECT REGEXP_REPLACE('`w"w\'w. ex%a&m:p l–e.c)o(m`', '[("\'%[:blank:]&:–)]', '');
Almost for all bugging characters-
SELECT REGEXP_REPLACE(column, '[\("\'%[[:blank:]]&:–,#$@!;\\[\\]\)<>\?\*\^]+','')
Real-life scenario.
I had to update all the files name which has been saved in 'demo' with special characters.
SELECT * FROM demo;
| uri |
|------------------------------------------------------------------------------|
| private://webform/applicant_details/129/offers upload winners .png |
| private://webform/applicant_details/129/student : class & teacher data.pdf |
| private://webform/applicant_details/130/tax---user's---data__upload.pdf |
| private://webform/applicant_details/130/Applicant Details _ report_0_2.pdf |
| private://webform/applicant_details/131/india&asia%population huge.pdf |
Test Case -
The table has multiple rows with special characters in the file name.
Advice:
To remove all the special characters from the file name and use a-z, A-Z, 0-9, dot and underscore with a lower file name.
Expected result is:
| uri |
|------------------------------------------------------------------------------|
| private://webform/applicant_details/129/offers_upload_winners_.png |
| private://webform/applicant_details/129/student_class_teacher_data.pdf |
| private://webform/applicant_details/130/tax_user_s_data_upload.pdf |
| private://webform/applicant_details/130/applicant_details_report_0_2.pdf |
| private://webform/applicant_details/131/india_asia_population_huge.pdf |
Okay, let's plan step by step
1st - let's find the file name
2nd - run all the find replace on that file name part only
3rd - replace the new file name with an old one
How can we do this?
Let's break down the whole action in chunks for a better understanding.
Below function will extract the file name only from the full path e.g. "Applicant Details _ report_0_2.pdf"
SELECT -- MySQL SELECT statement
SUBSTRING_INDEX -- MySQL built-in function
( -- Function start Parentheses
uri, -- my table column
'/', -- delimiter (the last / in full path; left to right ->)
-1 -- start from the last and find the 1st one (from right to left <-)
) -- Function end Parentheses
from -- MySQL FROM statement
demo; -- My table name
#1 Query result
| uri |
|------------------------------------|
| offers upload winners .png |
| student : class & teacher data.pdf |
| tax---user's---data__upload.pdf |
| Applicant Details _ report_0_2.pdf |
| india&asia%population huge.pdf |
Now we have to find and replace within the generated file name result.
SELECT
REGEXP_REPLACE( -- MySQL REGEXP_REPLACE built-in function (string, pattern, replace)
SUBSTRING_INDEX(uri, '/', -1), -- File name only
'[^a-zA-Z0-9_.]+', -- Find everything which is not a-z, A-Z, 0-9, . or _.
'_' -- Replace with _
) AS uri -- Give a alias column name for whole result
from
demo;
#2 Query result
| uri |
|------------------------------------|
| offers_upload_winners_.png |
| student_class_teacher_data.pdf |
| tax_user_s_data__upload.pdf |
| Applicant_Details___report_0_2.pdf |
| india_asia_population_huge.pdf |
FYI - Last '+' in the pattern is for repetitive words like ---- or multiple spaces ' ', Notice the result without '+' in the below regex pattern.
SELECT
REGEXP_REPLACE( -- MySQL REGEXP_REPLACE built-in function (string, pattern, replace)
SUBSTRING_INDEX(uri, '/', -1), -- File name only
'[^a-zA-Z0-9_.]', -- Find everything which is not a-z, A-Z, 0-9, . or _.
'_' -- Replace with _
) AS uri -- Give a alias column name for whole result
from
demo;
#3 Query result
| uri |
|------------------------------------|
| offers___upload__winners_.png |
| student___class___teacher_data.pdf |
| tax___user_s___data__upload.pdf |
| Applicant_Details___report_0_2.pdf |
| india_asia_population__huge.pdf |
Now, we have a file name without special characters (. and _ allowed). But the problem is file name still has Capital letters and also has multiple underscores.
Let's lower the file name first.
SELECT
LOWER(
REGEXP_REPLACE(
SUBSTRING_INDEX(uri, '/', -1),
'[^a-zA-Z0-9_.]',
'_'
)
) AS uri
from
demo;
#4 Query result
| uri |
|------------------------------------|
| offers_upload_winners_.png |
| student_class_teacher_data.pdf |
| tax_user_s_data__upload.pdf |
| applicant_details___report_0_2.pdf |
| india_asia_population_huge.pdf |
Now everything is in lower case but underscores are still there. So we will wrap the whole REGEX.. with one more REGEX..
SELECT
LOWER(
REGEXP_REPLACE( -- this wrapper will solve the multiple underscores issue
REGEXP_REPLACE(
SUBSTRING_INDEX(uri, '/', -1),
'[^a-zA-Z0-9_.]+',
'_'
),
'[_]+', -- if 1st regex action has multiple __ then find it
'_' -- and replace them with single _
)
) AS uri
from
demo;
#5 Query result
| uri |
|----------------------------------|
| offers_upload_winners_.png |
| student_class_teacher_data.pdf |
| tax_user_s_data_upload.pdf |
| applicant_details_report_0_2.pdf |
| india_asia_population_huge.pdf |
Congratulations! we have found what we were looking for. Now UPDATE TIME! Yeah!!
UPDATE -- run a MySQL UPDATE statement
demo -- tell MySQL to which table you want to update
SET -- put SET statement to set the updated values in desire column
uri = REPLACE( -- tell MySQL to which column you want to update,
-- I am also putting REPLACE function to replace existing values with new one
-- REPLACE (string, replace, with-this)
uri, -- my column to replace
SUBSTRING_INDEX(uri, '/', -1), -- my file name part "Applicant Details _ report_0_2.pdf"
-- without doing any action
LOWER( -- "applicant_details_report_0_2.pdf"
REGEXP_REPLACE( -- "Applicant_Details_report_0_2.pdf"
REGEXP_REPLACE( -- "Applicant_Details___report_0_2.pdf"
SUBSTRING_INDEX(uri, '/', -1), -- "Applicant Details _ report_0_2.pdf"
'[^a-zA-Z0-9_.]+',
'_'
),
'[_]+',
'_'
)
)
);
And after and UPDATE Query, result would be like this.
| uri |
|--------------------------------------------------------------------------|
| private://webform/applicant_details/152/offers_upload_winners_.png |
| private://webform/applicant_details/153/student_class_teacher_data.pdf |
| private://webform/applicant_details/153/tax_user_s_data_upload.pdf |
| private://webform/applicant_details/154/applicant_details_report_0_2.pdf |
| private://webform/applicant_details/154/india_asia_population_huge.pdf |
Sample data script
DROP TABLE IF EXISTS `demo`;
CREATE TABLE `demo` (
`uri` varchar(255) CHARACTER SET utf8mb3 COLLATE utf8_bin NOT NULL DEFAULT '' COMMENT 'The S3 URI of the file.',
`filesize` bigint unsigned NOT NULL DEFAULT '0' COMMENT 'The size of the file in bytes.',
`timestamp` int unsigned NOT NULL DEFAULT '0' COMMENT 'UNIX timestamp for when the file was added.',
`dir` int NOT NULL DEFAULT '0' COMMENT 'Boolean indicating whether or not this object is a directory.',
`version` varchar(255) CHARACTER SET utf8mb3 COLLATE utf8_bin DEFAULT '' COMMENT 'The S3 VersionId of the object.'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
INSERT INTO `demo` (`uri`, `filesize`, `timestamp`, `dir`, `version`) VALUES
('private://webform/applicant_details/152/offers upload winners .png', 14976905, 1658397516, 0, ''),
('private://webform/applicant_details/153/student : class & teacher data.pdf', 0, 1659525447, 1, ''),
('private://webform/applicant_details/153/tax---user\'s---data__upload.pdf', 98449, 1658397516, 0, ''),
('private://webform/applicant_details/154/Applicant Details _ report_0_2.pdf', 0, 1659525447, 1, ''),
('private://webform/applicant_details/154/india&asia%population huge.pdf', 13301, 1658397517, 0, '');
Big Thanks:
MySQL: SELECT, UPDATE, REPLACE, SUBSTRING_INDEX, LOWER, REGEXP_REPLACE
MySQL Query Formatter: Thanks to CodeBeautify for such an awesome tool.