0

I have two tables. I try to select some records from one of them. Then, ID's part of that selection should be used to select some records of the other table. For that, I wrote a statement which takes long time for execution. I even couldn't see the results. Also, that broke my phpmyadmin from localhost.

Here is the code:

SELECT * FROM uniquestructures as uns WHERE uns.ProteinID IN (SELECT unp.* FROM uniqueproteins as unp HAVING LENGTH(unp.PDBASequence) < 20) as T)

To make it clear, first it selects records with all columns which have sequence length less than 20. Later, according to IDs of the selected records, I am searching the records have the same ID (as ProteinID)

Thank you very much for your help

nenana
  • 57
  • 1
  • 10
  • firstly - you might want to read up on JOIN syntax. Secondly, how many rows are in uniquestructures and in uniqueproteins? Also - try replacing unp.* with (guess) unp.ProteinID – Tom Mac Oct 12 '11 at 11:03
  • ah. Just twigged. Is the uns.ProteinId comparable to a concatenation of all of the columns in the uniqueproteins table? – Tom Mac Oct 12 '11 at 11:31
  • I used also LEFT JOIN. Uniquestructures and uniqueproteins have the same size and same ID information--20370 rows. But the rows have sequence length less than 20 are 2228. So, I need to select each id in that 2228 and return the row in uniquestructures which has that id. – nenana Oct 12 '11 at 11:33

1 Answers1

0

I reckon you need to use an INNER JOIN with a DISTINCT here:

SELECT distinct uns.*
FROM uniquestructures as uns 
INNER JOIN uniqueproteins as unp on uns.ProteinID = unp.ProteinId
where LENGTH(unp.PDBASequence) < 20;

Also, you might have some joy if you create a separate column on the uniqueproteins table to hold the length of the PDBASequence column (say PDBASequenceLength). You could then put an index on the PDBASequenceLength column rather than calling LENGTH(PDBASequence) in your query. If the data is not static then create a trigger to populate the PDBASequenceLength column each time a row is inserted or updated into the uniqueproteins table. Hence:

CREATE TRIGGER uniqueproteins_length_insert_trg
AFTER INSERT ON uniqueproteins FOR EACH ROW SET NEW.PDBASequenceLength = length(new.PDBASequence);

CREATE TRIGGER uniqueproteins_length_update_trg
AFTER UPDATE ON uniqueproteins FOR EACH ROW SET NEW.PDBASequenceLength = length(new.PDBASequence);

alter table uniqueproteins add key `uniqueproteinsIdx2` (PDBASequenceLength);

Your query could then be:

SELECT uns.*
FROM uniquestructures as uns 
INNER JOIN uniqueproteins as unp on uns.ProteinID = unp.ProteinId
where unp.PDBASequenceLength < 20;

Good luck!

Tom Mac
  • 9,693
  • 3
  • 25
  • 35
  • Thank you very much, it worked. I checked my code and added distinct and worked for even left join. But I wonder, why does the code I posted not work? Just to learn :) Could you explain it? – nenana Oct 12 '11 at 11:55
  • No problems! The problem is in your sub-select i.e. ...(select unp.*... this needs to be changed to ...(select unp.ProteinId.... I suggested using an inner join since this is usually a better approach than nesting sub-selects. – Tom Mac Oct 12 '11 at 11:58
  • PS If it worked then you may want to indicate this by accepting the answer :) – Tom Mac Oct 12 '11 at 12:07