5

Here is the scenario. How to utilize REGEXP to simulate IN operator yet to match all values in left side within right side regardless of the string order in either side. Also ANSI SQL solution can be achieved using left join and sub queries.

Sample table:

Parent table, Child table, Parent_Child. In order to not take more space on the question, I only post here a reulst of a Group_Concat Child query by Parent.

PID     NAME        CHILDREN    
1       dad john    dave,jill,lina
2       mum sandy   maryam,jack 
3       dad frank   henry,jill 
4       mum kate    maryam  
5       mum jean    dave

Expected Result: Select Parent who has their all children participated in something.

PID     NAME        CHILDRENREXGEX
3       dad frank   jill,henry
4       mum kate    maryam
5       mum jean    dave

Here is the REGEXP SQL solution: now the issue here, it doesn't return correct results if left side order/squence not match right side.

Query:

select 
    x.pid, x.name, x.children as childrenRexgex
from
    (select 
         p.pid, p.name, group_concat(c.name) as children
     from 
         parent as p
     inner join 
         parent_child as pc on p.pid = pc.pid
     join 
         child as c on pc.cid = c.cid
     group by 
         p.pid
     order by 
         c.name) as x
where 
    'dave,maryam,jill,henry' REGEXP x.children
;

Hence there are two aspects I would appreciate for the question:

  1. What is the best pattern to match all names in left side to the user defined list of right side regardless of the order?
  2. What could be the performance gained using REGEXP?
marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
bonCodigo
  • 14,268
  • 1
  • 48
  • 91
  • The real problem here is that the children shouldn't be a comma-separated column, but should be a detail table. See this question: [Is storing a delimited list in a database column really that bad?](http://stackoverflow.com/questions/3653462/is-storing-a-delimited-list-in-a-database-column-really-that-bad) – Andy Lester Jan 05 '13 at 07:21

1 Answers1

2

Do you want to use regexp, or another solution is fine anyway? If I understand correctly, this query should give you the correct result:

select p.pid, parent.name, group_concat(child.name)
from
  (select pid
   from 
     parent_child inner join child
     on parent_child.cid = child.cid
   group by pid
   having sum(child.name in ('dave','henry','maryam','jill'))=count(*)) p
  inner join parent on p.pid=parent.pid
  inner join parent_child on p.pid=parent_child.pid
  inner join child on parent_child.cid=child.cid
group by p.pid

that gives:

PID     NAME        CHILDRENREXGEX
3       dad frank   jill,henry
4       mum kate    maryam
5       mum jean    dave

Anyway, to use your solution, I would suggest you to use group_concat with an order:

select
  x.pid,
  x.name,
  x.children as childrenRexgex
from(
  select
    p.pid,
    p.name, 
    group_concat(c.name order by c.name) as children,
    count(c.name) as counts
  from
    parent as p inner join parent_child as pc
    on p.pid = pc.pid
    join child as c
    on pc.cid = c.cid
  group by p.pid) as x
where 'dave,henry,jill,maryam'
  REGEXP x.children

and try to match the names already ordered. This is identical to your query, i only added an order by c.name inside group_concat, and i also ordered the string in the where condition.

Edit: If you really want to use a REGEXP, since MySql support to regular expressions is limited, i would suggest you to try to use LIB_MYSQLUDF_PREG. This is a general solution that doesn't work with standard MySql regular expressions.

If you need to match a string like this:

One,Two,Three,Four

with, for example, this:

Two,Four,Three,One

you have to use a regexp like this one:

"One,Two,Three,Four" REGEXP
"^(?=.*\bTwo\b)(?=.*\bFour\b)(?=.*\bThree\b)(?=.*\bOne\b)"

(check this question) And this is what it does:

  1. \bTwo\b matches full word Two, can be: Two Two, ,Two ,Two,
  2. .* word Two can be found anywhere in the string .*\bTwo\b
  3. (?=.*\bTwo\b) match full word Two, anywhere in the string, but forget the location and start to mach the next term from the beginning
  4. start to match other words

Still missing something? Yes, because if we have a match for "One,Two,Three,Four", also "One,Two,Three,Four,Five" will match. Maybe there's a better regexp for this, but my idea is: if they match, and have the same length, they have to be identical except for the order. So we can add this at the end of our regexp:

  1. .{length}$ remember that after all previous matches, we are still at the beginning, and ^.{length}$ matches a string of given length

So final code would be:

field1="One,Two,Three,Four"
field2="Two,Four,Three,One"

field1 REGEXP CONCAT("^(?=.*\b", 
                     REPLACE(field2, ",", "\b)(?=.*\b"),
                     "\b).{", LENGTH(field1), "}$")

Notice that this regexp is not supported by REGEXP, it should be supported on LIB_MYSQLUDF_PREG but I still haven't tested it. I will let you know. There might be other solutions, but I don't think it's possible to do any better with just REGEXP.

Community
  • 1
  • 1
fthiella
  • 48,073
  • 15
  • 90
  • 106
  • Thanks for the effort. I already have two queries that gives me the answer without `regexp`. Infact one is exactly the same as your first query. So the main reason I post the question to get the answer, solution based on `regexp` :-) perhaps I need to further stress that on my question. I am more curious to know the performance between two different methods. – bonCodigo Dec 21 '12 at 11:45
  • @bonCodigo i was not sure this was was you need... the idea of using just regexp is nice.. i don't have an answer yet, but i'll think about it :) +1 for the question – fthiella Dec 21 '12 at 11:49
  • much appreciated any effective infor towards `regexp` solution :) I infact have already checked one of the previous posts where you had answered. [Reference](http://stackoverflow.com/questions/10480568/comma-separated-values-in-mysql-in-clause/13445491#13445491) – bonCodigo Dec 21 '12 at 13:07
  • I realized that I have forgotten to upvote you :) +1 @fthiella – bonCodigo Jan 03 '13 at 19:57
  • @bonCodigo thanks :) I didn't forget about this question, there's a regexp supposed to match unordered words, but mysql doesn't support all kinds of regexp, at least not natively, and this seems not to work. I'll let you know if I have any news! – fthiella Jan 03 '13 at 20:19
  • @bonCodigo updated the answer, it doesn't work with standard MySql, you have do install an UDF function, I still haven't tested the code with it, I only tested it in perl – fthiella Jan 05 '13 at 00:53