How to remove special characters in column with MySQL?

Question

I participate in a project and met a bad design by others,we have a table called task and each task has many users,the task table is as below:

+----------------+---------------+------+-----+---------+-------+
| Field          | Type          | Null | Key | Default | Extra |
+----------------+---------------+------+-----+---------+-------+
| id             | varchar(40)   | NO   | PRI | NULL    |       |
| name           | varchar(100)  | YES  |     | NULL    |       |
| task_users     | varchar(1000) | YES  |     | NULL    |       |

and the user is stored like aaa,bbb,ccc in task_users column,which means many user id are put into one column,I know this a very bad design but since it's an old project,I can not modify the table design.

Now I have a problem,if the user is deleted,how can I remove it from the task_users column?

the user id is generated by UUID and it's at fixed length with 32 characters,so each user id is unique,such as 40cf5f01eb2f4d2c954412f27b3bf6eb,but the problem is that the user id may appaer in any position of the task_users column,so I do not know how to remove it

aaa,40cf5f01eb2f4d2c954412f27b3bf6eb,bbb -- in center
40cf5f01eb2f4d2c954412f27b3bf6eb,aaa,bbb -- in head
aaa,bbb,40cf5f01eb2f4d2c954412f27b3bf6eb -- in end

when remove the user id,the updated result is like

   aaa,bbb

I want to know can we use one update sql to remove the specified user id and still keep the same data format?

Note:I am doing it in a MySQL stored procedure,additional variable may be helpful,but I still want to just use one sql to do it,the MySQL version is 5.0

Thanks in advance!

You could use [`REGEXP_REPLACE()`](https://dev.mysql.com/doc/refman/8.0/en/regexp.html#function_regexp-replace) function to replace specified user identificator with empty string. — Alexander, Jan 10 '19 at 04:26
@Alexander Can you add an example? I have considered using regex,but the key point is how to keep the same format,it's not just simple replace it with empty,since this will add duplicate `,` — flyingfox, Jan 10 '19 at 04:29
@Alexander it's doesn't matter,I am very painful with this bad table design — flyingfox, Jan 10 '19 at 04:56

Tim Biegeleisen · Accepted Answer · 2019-01-10T06:13:44.633

4

I think we can do this with a single query:

UPDATE yourTable
SET task_users = SUBSTRING(
    REPLACE(CONCAT(',', task_users, ','), CONCAT(',', uid, ','), ','),
    2,
    LENGTH(task_users) - LENGTH(uid) - 1)
WHERE task_users REGEXP CONCAT('[[:<:]]', uid, '[[:>:]]');

Here is a link to a demo (to be used only for testing purposes):

Demo

This answer uses a trick, by which we append commas to the start and end of the task_users string. Then, we compare a given user ID by also appending commas to its start and end. If a match is found, we replace with just a single comma. But, this leaves the replacement still with its starting and ending commas, so we remove those with a substring operation.

SQL Olympics aside, hopefully you can see by the complexity of these answers that working with CSV data in a SQL database can be a real headache. Maybe you can even use this page as evidence to your colleagues that the table design needs to change.

edited Jan 10 '19 at 06:13

answered Jan 10 '19 at 05:28

Tim Biegeleisen

502,043
27
286
360

Oh, that's clever. Similar to what you see searching HTML class attributes with XPath, ie wrapping the entire string in the delimiter. – Phil Jan 10 '19 at 05:31
1

@Phil I couldn't get anything to work with your SQLFiddle demo (typical behavior), but I did get it working with Rextester. Check the demo link if you are interested. – Tim Biegeleisen Jan 10 '19 at 05:43
1

SQLFiddle is flakey most of the time but here's your answer as a demo there ~ http://sqlfiddle.com/#!9/bd4bb/1 (with UPDATE instead of SELECT) – Phil Jan 10 '19 at 05:52
@TimBiegeleisen That's the answer that I want,thanks a lot! – flyingfox Jan 10 '19 at 08:34

Phil · Answer 2 · 2019-01-10T05:30:11.460

Try this CASE expression where uid is the argument to your stored procedure...

UPDATE `task` SET `task_users` = CASE
  -- at the start
  WHEN `task_users` LIKE CONCAT(uid, ',%')
    THEN REPLACE(`task_users`, CONCAT(uid, ','), '')
  -- at the end
  WHEN `task_users` LIKE CONCAT('%,', uid)
    THEN REPLACE(`task_users`, CONCAT(',', uid), '')
  -- in the middle
  WHEN `task_users` LIKE CONCAT('%,', uid, ',%')
    THEN REPLACE(`task_users`, CONCAT(',', uid, ','), ',')
  -- only that user
  ELSE ''
END
WHERE `task_users` LIKE CONCAT('%', uid, '%');

Demo ~ http://sqlfiddle.com/#!9/1d2baa/1

Original "four queries" answer below

-- only that user
UPDATE `task`
SET `task_users` = ''
WHERE `task_users` = uid;

-- at start
UPDATE `task`
SET `task_users` = REPLACE(`task_users`, CONCAT(uid, ','), '')
WHERE `task_users` LIKE CONCAT(uid, ',%');

-- at end
UPDATE `task`
SET `task_users` = REPLACE(`task_users`, CONCAT(',', uid), '')
WHERE `task_users` LIKE CONCAT('%,', uid);

-- in the middle
UPDATE `task`
SET `task_users` = REPLACE(`task_users`, CONCAT(',', uid, ','), ',')
WHERE `task_users` LIKE CONCAT('%,', uid, ',%');

Demo ~ http://sqlfiddle.com/#!9/8e9b9bb/1

Thanks for your answer,your solution need four separate sql,in my question,I am wonder if we can use just one sql to do it — flyingfox, Jan 10 '19 at 04:34
@lucumt, it is possible to use multiple sql queries within a [custom procedure](https://dev.mysql.com/doc/refman/8.0/en/create-procedure.html). — Alexander, Jan 10 '19 at 05:08
@lucumt I've updated my answer with a potential _single-query"_ solution (untested yet) — Phil, Jan 10 '19 at 05:09

Alexander · Answer 3 · 2019-01-10T16:07:16.750

2

You could use following expression to replace specified user identificator with empty string:

SET @userID = '40cf5f01eb2f4d2c954412f27b3bf6eb';

UPDATE `task`
SET task_users = REGEXP_REPLACE(task_users, CONCAT('(,', @userID, '|', @userID, ',?)'), '');

Or add WHERE clause to filter records for update:

UPDATE `task`
SET task_users = REGEXP_REPLACE(task_users, CONCAT('(,', @userID, '|', @userID, ',?)'), '');
WHERE task_users RLIKE CONCAT('(^|,)', @userID,'(,|$)')

Note, the REGEXP_REPLACE() function was added in the MySQL 8.0.

edited Jan 10 '19 at 16:07

answered Jan 10 '19 at 04:40

Alexander

4,420
7
27
42

Sorry my database version is **5.0** – flyingfox Jan 10 '19 at 05:02
Your `LIKE` version has a potential problem, should a `@userName` happen to also be a substring of another username. Use `REGEXP` with word boundaries instead. But suggesting a regex replacement is a generally good idea here. – Tim Biegeleisen Jan 10 '19 at 05:50
@TimBiegeleisen OP says they're all unique 32 character strings so the chance of a false positive is zero – Phil Jan 10 '19 at 05:53
1

@Phil Thanks for clarifying, though in general for someone else reading this question in the future, it might be a concern for him. – Tim Biegeleisen Jan 10 '19 at 06:06
@TimBiegeleisen sure, that makes a lot of sense and I'm sure many of the other terrible _CSV_ implementations out there don't enjoy the luxury of unique, fixed length values. My answer suffers the same issue as this one – Phil Jan 10 '19 at 06:08
Thank you, @TimBiegeleisen. You are right! I have updated the answer. – Alexander Jan 10 '19 at 07:44

How to remove special characters in column with MySQL?

3 Answers3

Demo