0

I have 2 mysql tables -

candskill - (cis, sid) - where cid = candidate id, and sid = skill id

Data in candskill (size - 257,000) -

c1, s1
c1, s2
c2, s3
c1, s4
c2, s5
...

skills - (sid, name) - where sid = skill id, and name = skill name

Data in skills (size 257,000)-

s1 - oracle
s2 - project management
s3 - oracle
s4 - testing
s5 - testing
...

Now, I want to fetch all the candidates who have skills 'oracle' and 'testing' both. Or I want candidate who have skills either 'oracle' or 'testing'. I want to have any AND/OR combination of skills present, and want to fetch candidates for those skills.

How would I achieve that?

This is what I have so far, which is not working in all scenarios.

select distinct(cs.cid), s.name from candskill cs 
inner join skills s on (cs.sid = s.sid and (s.name = 'oracle' or s.name = 'testing'))

Also, the query execution is taking too much time. approx 120 sec. How do we go about doing that.

I am thinking of writing a query, and passing the skill part of the query via php code, concate the strings, and generate new query each time a user searches for candidates for a particular skill.

Juan Carlos Oropeza
  • 47,252
  • 12
  • 78
  • 118
zookastos
  • 917
  • 10
  • 37

2 Answers2

1

You could use an having clause on count of s.name

  select cs.cid
  from candskill cs 
  inner join skills s on (cs.sid = s.sid and  s.name in (  'oracle' , 'testing'))
  group by cs.cid
  having count(distinct(s.name)) = 2

for 1 or 2

  select cs.cid
  from candskill cs 
  inner join skills s on (cs.sid = s.sid and  s.name in (  'oracle' , 'testing'))
  group by cs.cid
  having count(distinct(s.name)) >= 1
ScaisEdge
  • 131,976
  • 10
  • 91
  • 107
  • 1
    This is for candidates having both skills, if you want `OR+BOTH` you need `having count(distinct(s.name)) >= 1` – Juan Carlos Oropeza Feb 09 '17 at 19:51
  • Well @scais Edge - I executed your query and it took around 64 sec to complete. – zookastos Feb 09 '17 at 19:54
  • YOu have asket for the result not for performance ..,then if my answer is right ..mean that response to yiu question .. (i don't know how many rows you have .. if you have index or not .. and so on .. ) – ScaisEdge Feb 09 '17 at 19:55
  • I am looking for both OR , AND combination. My user will want candidates for 1. oracle and testing both, 2. either oracle or testing. So based on that I will have to generate query based on what user is searching for. So, I want a solution where all the things can be handled via minimum changes in query string. – zookastos Feb 09 '17 at 19:56
  • Yeah yeah @scais Edge. I agree, your solution works. I wasn't criticizing. Was just letting you know that its better than 124 sec which was my query. – zookastos Feb 09 '17 at 19:57
  • @NalinAgrawal answer udpated for 1 and 2 too as suggest by JuanCarlosOropeza – ScaisEdge Feb 09 '17 at 19:58
  • Around more than 250,000 in my local instance. Production will have more than 500,000 and growing. – zookastos Feb 09 '17 at 20:01
  • you have index on skill (name, id) (in reverse order .. ? could an index of this type improve your performance – ScaisEdge Feb 09 '17 at 20:02
  • Just checked. Unfortunately no. – zookastos Feb 09 '17 at 20:03
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/135325/discussion-between-nalin-agrawal-and-scaisedge). – zookastos Feb 09 '17 at 20:06
1

Maybe reducing the set of skills helps with the performance, e.g.

select cs.cid
from (select sid from skills where name in ('oracle', 'testing')) s
join candskills cs on cs.sid = s.sid

Instead of joining 250,000 x 250,000 rows, this will join 2 x 250,000 rows.


Furthermore, adding an index on skills.name and another on skills.sid and candskills.sid might improve the query further.

Olaf Dietsche
  • 72,253
  • 8
  • 102
  • 198