MySQL - Select substring from a column, without catching similar substrings from same column

Question

In a MySQL table I have a VARCHAR column called ShareID.

If the ShareID value for Row #1 contains a string in the form of 1
and the ShareID value for Row #2 contains a string in the form of 10, 1
and the ShareID value for Row #3 contains a string in the form of 111, 12.

I would like to grab all the rows where the ShareID is 1. i.e. ONLY the first and second rows here.

I have tried using the LIKE command, like so:

SELECT * FROM tablename WHERE ShareWithID LIKE '1%';

but this will catch ALL the rows that contain the number 1 in it, i.e. Row #3 which is not what I want.

I would like to run a command that would ONLY return rows #1 and #2 above because they have a ShareID of 1 contained within it.

I've tried a variety of commands, (including REGEXP, and IN) and managed a 'frig' solution where I'd place a comma after EVERY number in the ShareID column, including the last one (i.e. 10, 1,), and then execute this command:

SELECT * FROM tablename WHERE ShareWithID LIKE '%1,%';

But I would rather use a proper solution over a frigged solution.

Any guidance would be most welcome.

Gordon Linoff · Answer 1 · 2019-02-02T16:15:31.293

2

You should not be storing lists of numbers in a comma-delimited string. This is a really bad idea:

Number should be stored as numbers, not strings.
Your numbers appear to be ids. Ids should have explicit foreign keys defined.
SQL -- in general -- has lousy string handling functions.
SQL cannot optimize the queries with string operations.
SQL has a great way of storing lists. It is called a table.

Sometimes, though, we are stuck with other peoples really, really, really, really bad decisions on designing databases. MySQL has a convenient function for this situation:

where find_in_set(1, ShareWithID) > 0

If you have spaces in the string, you need to remove them:

where find_in_set(1, replace(ShareWithID, ' ', '')) > 0

edited Feb 02 '19 at 16:15

answered Feb 02 '19 at 02:37

Gordon Linoff

1,242,037
58
646
786

Yes, I'm aware of MySQL's lack of string handling abilities, but sometimes string manipulation is the better way to go. It saves having to waste time building a 250,000 row 'table' (was it??) to look after a few columns of IDs, when one column in the original 'table' could do the job just as well. So long as the correct 'built-in' MySQL command is used to manipulate the string. So thank you for leading me to the 'find_in_set' command, that I was unaware of. That's why portals like stackoverflow are so good, however, there is one fundamental detail that needs to be addressed - explained below. – Alan N Feb 02 '19 at 15:55

score 1 · Answer 2 · answered Feb 02 '19 at 17:12

1

...the built-in feature is there to be used

FIND_IN_SET() is actually not intended to be used for strings containing comma-separated lists. It's intended to be used with MySQL's SET data type. Hence the name FIND_IN_SET(), not FIND_IN_COMMA_SEPARATED_LIST().

It saves having to waste time building a 250,000 row 'table' (was it??) to look after a few columns of IDs, when one column in the original 'table' could do the job just as well.

250k rows is not a problem for MySQL. I manage databases with billions of rows in a given table. If you do basic query optimization with indexes, most queries on a table of 250k rows are just fine.

Whereas using a comma-separated list, you spoil any chance of optimizing queries. An index does not help searching for substrings that may not be the leftmost prefix of the string, and searching for a number in a comma-separated list is basically searching for a substring.

You're making your queries impossible to optimize by using a comma-separated list. Every query using FIND_IN_SET() will be a table-scan, which will get slower in a linear relationship to the number of rows in your table.

There are other disadvantages to using a comma-separated list besides indexing, which I wrote about in my answer to this old post: Is storing a delimited list in a database column really that bad?

I would rather use a proper solution over a frigged solution.

Then store one id per row. In a relational database, that's the proper solution.

answered Feb 02 '19 at 17:12

Bill Karwin

538,548
86
673
828

Thanks for the explanation Bill and the link to your old post. I do appreciate that storing numbers in a string is generally a big no-no, but in my case it works perfectly for the limited amount of I/O that I intend to use it for. I've never used Indexes with my databases, simply because I don't know how to yet. It seems that the correct way is if all 250,000 rows need associative columns for the SharedWithID values, I'll need a new 250,000 row table with the relevant amount of columns. I now know two ways to crack this nut, although one appears more professionally accepted then the other. – Alan N Feb 03 '19 at 03:02
1

You might like my presentation [How to Design Indexes, Really](https://www.slideshare.net/billkarwin/how-to-design-indexes-really), or the [video](https://www.youtube.com/watch?v=ELR7-RdU9XU). – Bill Karwin Feb 03 '19 at 03:12
Thanks for those presentation links Bill. They are in my MySQL tutorial folder. Also, I have decided to veer away from the 'string' method of storage and gone down the 'integer' route. I'll dynamically add extra columns to the main table as and when I need them. There shouldn't be more than 5 additional columns. However, if that number rises substantially, I'll create the new table. – Alan N Feb 03 '19 at 15:46
So when you need to find a specific number, you'll have to search 5 columns, because you don't know which column the number you search for is stored in. This means you still can't use an index. I'm telling you this is a **solved problem** — create a child table and store one id per row in **one column.** Then you can index that one column and search for any value efficiently. – Bill Karwin Feb 03 '19 at 16:16
I'm reminded of the quote attributed to Winston Churchill: "You can always count on Americans to do the right thing – after they’ve tried everything else." – Bill Karwin Feb 03 '19 at 16:18
Winston Churchill said many stupid things in his time. I have no intention of using indexes to solve this 'particular' problem. It's just good to know that they exist. The problem I had was counting how many records from a large table are linked to 'more' than one ShareWithID. I wanted the best solution and that's why I placed the question on this board, because I didn't know the answer! I've used the 'search multiple columns' method and have written and tested the code and placed it in a Stored Procedure. It works exactly as I want it. Problem solved. Cheers! – Alan N Feb 03 '19 at 17:33
Okay, I apologize if I'm annoying you. I hope this exchange won't make you resistant to considering normalizing your database in the future. Best wishes. – Bill Karwin Feb 03 '19 at 17:35

score 0 · Answer 3 · answered Feb 02 '19 at 15:53

0

The solution to this problem is to use Gordon Linoff's suggestion of the FIND_IN_SET command in conjunction with the correct configuration of the table column in question, like this:

SELECT * FROM tablename WHERE FIND_IN_SET('1', ShareWithID);

However, because the FIND_IN_SET command allows you to find the position of a string within a comma-separated list of strings, you MUST ensure that the contents of the column contains a comma after each item and DOES NOT contain spaces after the comma.

So this column content used in conjunction with the above command will return '0' rows: 111, 1

While this column content will return '1' row: 111,1

As will this one: 33,1

And this one: 44,1,415

answered Feb 02 '19 at 15:53

Alan N

181
2
8

Sounds like `FIND_IN_SET()` isn't a great solution after all. – Bill Karwin Feb 02 '19 at 16:32
Why's that Bill Karwin? It worked fine for me in conjunction with the correct column configuration. It may not be classic coding, but the built-in feature is there to be used. – Alan N Feb 02 '19 at 17:01
Too much to answer in a comment. I'll write an answer. – Bill Karwin Feb 02 '19 at 17:06

MySQL - Select substring from a column, without catching similar substrings from same column

3 Answers3