1

I am working on a project that uses PHP/MYSQL as the backend for an IOS app that makes a lot of use of dictionaries and arrays containing text or strings.

I need to store this text in MYSQL (coming from Arrays of srtrings on phone) and then query to see the text contains (case insensitive) a word or phrase in question.

For example, if the array consists of {Ford, Chevy, Toyota, BMW, Buick}, I might want to query it to see it contains Saab.

I know storing arrays in a field is not MYSQL friendly as it prevents optimization. However, it would be way too complicated to create individual tables for these collections of words which are created by users.

So I'm looking for a reasonable way to store them, perhaps delimited with spaces or with commas that makes possible reasonably efficient searches.

If they are stored separated by spaces, I gather you can do something with regex like:

SELECT 
    *
FROM
    `wordgroups`
WHERE
    wordgroup regexp '(^|[[:space:]])BLA([[:space:]]|$)';

But this seems funky.

Is there a better way to do this? Thanks for any insights

zztop
  • 701
  • 1
  • 7
  • 20

2 Answers2

1

Consider using a FULLTEXT index. And use MATCH(...) AGAINST(... IN NATURAL LANGUAGE MODE).

FULLTEXT is very fast for "words", and IN NATURAL MODE may solve your Saab example.

Rick James
  • 135,179
  • 13
  • 127
  • 222
0

Using regexp can achieve what you want, however, your query will be inefficient, since it cannot rely on any indexes.

If you want to store a list of words and their position within the array does not matter, then you may consider storing them in a single field, space delimited. But instead of using a regexp, use fulltext indexing and searching. This method has a clear advantage over searching with regexp: it uses an index. It has some drawbacks as well: there is a stopword list (these are excluded from searching) and there is a minimum word length as well. The good news is that these parameters are configurable. Also, you get all the drawbacks of storing data in a delimited field, as detailed in Is storing a delimited list in a database column really that bad? question here on SO.

However, if you want to use dictionaries (key - value pairs) or the position within the list may be important, then the above data structure will not do.

In this case, I would consider if mysql is the right choice for storing my data in the first place. If you have multi-dimensional lists, or lists containing lists, then I would definitely choose a different nosql solution.

If you only need simple, two-dimensional lists / dictionaries, then you can store all of them in a single table with a similar structure as below:

list_id - unique identifier of the list, primary key
user_id - id of the user the list belongs to
key - for dictionaries this is the lookup field (indexed), for other lists it may store the position of the element. String data type.
value - the field holding the value (indexed). Data type should be string, so that it could hold different data types as well.

A search to determine if a list holds a certain value would be fast and efficient lookup using the index on either the key or value fields.

Shadow
  • 33,525
  • 10
  • 51
  • 64
  • I realized that some of these words will be phrases so I think I would have to use comma delimited instead of space... for example, GMC Truck or Range Rover. I gather the downside for using comma delimited is about the same as space delimited? – zztop Jun 03 '19 at 14:36
  • Yes, the separator character does not really matter. – Shadow Jun 03 '19 at 14:51
  • Thanks, Shadow. Order is important ideally and I think you are right about going with NOSQL solution but host does not support one currently. I upvoted your answer. – zztop Jun 03 '19 at 15:17
  • If order is important, then fulltext indexes will not help. In this case why did you accept an answer as a solution that only suggested fulltext indexes to be used? Also, if the hosting company does not support the technologies you need, then you better switch hosting providers. – Shadow Jun 03 '19 at 15:22
  • Large legacy project so not feasible to switch. I wish I could accept both answers. – zztop Jun 03 '19 at 19:47