0

I would like to create a variable which stores multiple strings, but nothing I have tried seems to work. I have a block of code (below) which works if I explicitly write out the three strings, but not when I pass it a variable (I am calling it @a) which I think should store the same exact strings:

Here is the block of code in question for reference, this does what I want when I write out the 3 strings ('ENSG00000120860','ENSG00000197894','ENSG00000273425'):

SELECT 
ensp2, COUNT(DISTINCT (ensembl_gene_id))
FROM
(SELECT 
    s . *, g.ensembl_gene_id, g.associated_gene_name
FROM
    blang.string s
JOIN natashal.ensp_to_ensg_and_gene_symbol g ON g.ensembl_protein_id = s.ensp1
ORDER BY ensembl_gene_id) x
WHERE
ensembl_gene_id IN ('ENSG00000120860','ENSG00000197894','ENSG00000273425')
GROUP BY ensp2
ORDER BY COUNT(DISTINCT (ensembl_gene_id)) DESC

But, I would like to set ('ENSG00000120860','ENSG00000197894','ENSG00000273425') equal to a variable since these are going to change. I don't type these by hand, but rather list them from a column of results I get. I tried to do this:

set @a = (SELECT GROUP_CONCAT(QUOTE(ensembl_gene_id) SEPARATOR ',') 
FROM (SELECT DISTINCT(ensembl_gene_id) FROM natashal.ensp_to_ensg_and_gene_symbol ORDER BY RAND() LIMIT 8) x)

And the difficulty is that @a LOOKS like it should store my 3 strings well, but fails to do so:

SELECT @a
>'ENSG00000120860','ENSG00000197894','ENSG00000273425'

And I can even put parentheses around it like this:

SELECT CONCAT('(',@a,')')
>('ENSG00000120860','ENSG00000197894','ENSG00000273425')

Which looks like it works, but then when I try to pass this reference to my block of code above, like this:

SELECT 
ensp2, COUNT(DISTINCT (ensembl_gene_id))
FROM
(SELECT 
    s . *, g.ensembl_gene_id, g.associated_gene_name
FROM
    blang.string s
JOIN natashal.ensp_to_ensg_and_gene_symbol g ON g.ensembl_protein_id = s.ensp1
ORDER BY ensembl_gene_id) x
WHERE
ensembl_gene_id IN (SELECT @a)
GROUP BY ensp2
ORDER BY COUNT(DISTINCT (ensembl_gene_id)) DESC

I don't get the results I want, nothing is returned (and no error message is returned either). MySQL seems to think it's an empty array, or maybe somehow it can't match the strings to values in the table?

Thanks in advance, any help would be appreciated.

Marc B
  • 356,200
  • 43
  • 426
  • 500

2 Answers2

1

Use simply the result of the subquery instead:

instead of

WHERE
ensembl_gene_id IN ('ENSG00000120860','ENSG00000197894','ENSG00000273425')

take:

WHERE
ensembl_gene_id IN (
    SELECT x.id FROM (
        SELECT DISTINCT(ensembl_gene_id) id 
        FROM natashal.ensp_to_ensg_and_gene_symbol 
        ORDER BY RAND() LIMIT 8
    ) x
);

To overcome the limitation that you can't use a LIMIT clause in a subselect you've got to put your subselect into another one.

Have a look at the principle in this fiddle.

VMai
  • 10,156
  • 9
  • 25
  • 34
  • Thanks for your response. Yes, I've come across the limitation of not being allowed row limits in sub-clauses, but I'm unclear on what you meant by putting the subselect into another one? From another answer to this question it looks like simple table joining might solve this – Natasha L Aug 19 '14 at 16:41
0

The problem with

WHERE
ensembl_gene_id IN (SELECT @a)

is that the select @a bit's results will be treated as a single monolithic string. So given your

SELECT @a
'ENSG00000120860','ENSG00000197894','ENSG00000273425'

result, your main query would actually be the functional equivalent of

WHERE
ensembl_gene_id IN ('\'ENSG00000120860\',\'ENSG00000197894\',\'ENSG00000273425\'')

If you want to use the @a value as actual separate values, you'd need to use a stored procedure, build your main query as a string, then exec it, along the lines of this code: https://stackoverflow.com/a/5728155/118068.

The better solution would be to simply embed the query that produced @a's value in your main query, but since you're using a LIMIT, that's not possible. YOu could investigate using a JOIN operation instead, however.

Community
  • 1
  • 1
Marc B
  • 356,200
  • 43
  • 426
  • 500
  • Thanks Marc, this is helpful. I hadn't used random number generation before and blanked on the fact that I could still join other tables onto random rows in the normal way. I agree that the stored procedure route might be excessive here – Natasha L Aug 19 '14 at 16:39