Matching row values obtained from first table to the column names of second table without loop

Question

I would really appreciate help with the following problem. I have two tables and this requirement:

Required SQL script:

Without using loops, write an efficient SQL script that matches the row values (SampleID) obtained from the first table to the column names (X2, X4, X8, ...) of the second table, and obtains the values for a given Gene_ID (e.g. NFYA).

Expected result:

X1  15.2856
X10 18.2201
X14 13.3406
. .
. .

Tableone (rows × columns: 135 × 32)

Table description (partial rows and columns)

+-------+--------------------+--------+----------+-------+---------+
| Batch | filename_generate  | pcode  | SampleID | check | Diagnos |
+-------+--------------------+--------+----------+-------+---------+
| B     | cufflinks_out_2_B  | 01-111 | X2       | TRUE  | RH      |
| D     | cufflinks_out_4D   | 01-163 | X4       | TRUE  | RH      |
| B     | cufflinks_out_5_B  | 01-166 | X5       | TRUE  | RH      |
| D     | cufflinks_out_6D   | 02-007 | X6       | TRUE  | RH      |
| C     | cufflinks_out_8C   | 02-012 | X8       | TRUE  | RH      |
| C     | cufflinks_out_9C   | 02-014 | X9       | TRUE  | RH      |
| B     | cufflinks_out_10_B | 02-017 | X10      | TRUE  | RH      |
| B     | cufflinks_out_13_B | 02-030 | X13      | TRUE  | ON     |
| D     | cufflinks_out_14D  | 02-031 | X14      | TRUE  | RH      |
| B     | cufflinks_out_15B  | 02-037 | X15      | TRUE  | RH      |
| C     | cufflinks_out_16C  | 02-038 | X16      | TRUE  | IS     |
| B     | cufflinks_out_17_B | 02-041 | X17      | TRUE  | ON     |
| B     | cufflinks_out_19_B | 02-050 | X19      | TRUE  | ON     |
| B     | cufflinks_out_20_B | 02-056 | X20      | TRUE  | RH      |
+-------+--------------------+--------+----------+-------+---------+

Tabletwo (rows × columns: 56000 × 137)

Table description (partial rows and columns)

+-----------------+----------+---------+---------+----------+----------+----------+----------+---------+
|   Ensembl_ID    | Gene_ID  |   X1    |   X10   |   X13    |   X14    |   X15    |   X16    |   X17   |
+-----------------+----------+---------+---------+----------+----------+----------+----------+---------+
| ENSG00000000003 | TSPAN6   | 1.388   | 0.443   | 0.563    | 0.350    | 0.390    | 0.220    | 0.528   | 
| ENSG00000000005 | TNMD     | 0       | 0       | 0        | 0        | 0        | 0        | 0       |
| ENSG00000000419 | DPM1     | 34.309  | 40.2635 | 28.8669  | 21.0556  | 18.1733  | 22.0223  | 25.4352 |
| ENSG00000000457 | SCYL3    | 7.84987 | 9.35551 | 7.45483  | 7.1601   | 6.53686  | 7.26445  | 6.30419 |
| ENSG00000000460 | C1orf112 | 2.36851 | 3.76825 | 3.10324  | 2.0262   | 1.84606  | 3.01185  | 3.02763 |
| ENSG00000000938 | FGR      | 227.024 | 222.578 | 247.124  | 234.995  | 255.226  | 265.288  | 323.6   |
| ENSG00000000971 | CFH      | 3.17952 | 3.60279 | 1.68429  | 3.74301  | 2.10637  | 0.763278 | 1.78278 |
| ENSG00000001036 | FUCA2    | 16.5566 | 19.1703 | 25.5005  | 18.5244  | 20.7771  | 18.353   | 25.2364 |
| ENSG00000001084 | GCLC     | 9.45121 | 16.1362 | 12.6239  | 13.1074  | 10.6472  | 18.9938  | 12.8249 |
| ENSG00000001167 | NFYA     | 15.2856 | 18.2201 | 12.4789  | 13.3406  | 15.0146  | 13.2608  | 11.5385 |
| ENSG00000001460 | C1orf201 | 1.64558 | 1.93322 | 0.7927   | 1.71796  | 2.27997  | 0.938738 | 1.3911  |
| ENSG00000001461 | NIPAL3   | 14.6073 | 14.1772 | 11.1503  | 12.5077  | 15.1269  | 13.6     | 11.227  |
| ENSG00000001497 | LAS1L    | 14.4519 | 15.3965 | 11.8901  | 16.8572  | 16.7174  | 14.6004  | 15.7266 |
+-----------------+----------+---------+---------+----------+----------+----------+----------+---------+
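Since each SampleID value in Tableone names a column in Tabletwo, one way to express the matching in pure SQL (an editor's sketch, not the approach used in the accepted answer) is to unpivot Tabletwo into long (Gene_ID, SampleID, value) rows and then JOIN. The miniature tables below are hypothetical stand-ins holding a few values copied from the tables above, and the view name `tabletwo_long` is made up for illustration.

```python
import sqlite3

# Hypothetical miniature versions of tableone and tabletwo,
# using a few values copied from the question's tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tableone (SampleID TEXT, Diagnos TEXT);
INSERT INTO tableone VALUES ('X10', 'RH'), ('X13', 'ON'), ('X14', 'RH');

CREATE TABLE tabletwo (Gene_ID TEXT, X10 REAL, X13 REAL, X14 REAL);
INSERT INTO tabletwo VALUES ('NFYA', 18.2201, 12.4789, 13.3406),
                            ('DPM1', 40.2635, 28.8669, 21.0556);

-- Unpivot: one (Gene_ID, SampleID, value) row per sample column.
CREATE VIEW tabletwo_long AS
    SELECT Gene_ID, 'X10' AS SampleID, X10 AS value FROM tabletwo
    UNION ALL SELECT Gene_ID, 'X13', X13 FROM tabletwo
    UNION ALL SELECT Gene_ID, 'X14', X14 FROM tabletwo;
""")

# With the long view, matching SampleID values to column names is a plain JOIN.
rows = conn.execute("""
    SELECT t1.SampleID, t2.value
    FROM tableone AS t1
    JOIN tabletwo_long AS t2 ON t2.SampleID = t1.SampleID
    WHERE t1.Diagnos = 'RH' AND t2.Gene_ID = ?
    ORDER BY t1.SampleID
""", ("NFYA",)).fetchall()

for sample_id, value in rows:
    print(sample_id, value)
```

For the real table with 135 sample columns, typing the view by hand would be tedious; its SELECT list could instead be generated once from the column names reported by `PRAGMA table_info(tabletwo)`.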

The script is written in Python and uses SQLite3 (through the apsw module).

Here is the full code:

import apsw

# Open the on-disk database and copy it into an in-memory database
disk_db = apsw.Connection('sampleinfogenotype.db')
memcon = apsw.Connection(":memory:")
with memcon.backup("main", disk_db, "main") as backup:
    backup.step()  # copy the whole database in one go

mdata = memcon.cursor()
for row in memcon.cursor().execute("SELECT tableone.SampleID FROM tableone WHERE tableone.Diagnos == 'RH'"):
    sampleID_row = str(row[0])
    # One query per sample: select the column whose name is the SampleID
    sqlscript = "SELECT " + sampleID_row + " FROM tabletwo WHERE tabletwo.Gene_ID == 'NFYA'"
    data = mdata.execute(sqlscript).fetchall()[0]
    print(sampleID_row, data[0])
memcon.close()
disk_db.close()

It gives the expected result, but it is slow because it runs a separate query against tabletwo for every sample in the loop. Is there an efficient SQL script or method? Any information will be helpful.
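A minimal sketch of cutting the per-sample queries down to two in total: read the matching SampleIDs once, then build a single SELECT that lists all of those columns. It uses the standard `sqlite3` module on hypothetical miniature tables (values copied from the question), and `fetch_gene_values` is a hypothetical helper name. Because column names cannot be bound as parameters, each one is quoted as an SQL identifier; a SampleID with no matching column in tabletwo would make the second query fail.

```python
import sqlite3


def fetch_gene_values(conn, gene_id, diagnosis):
    """Return [(SampleID, value), ...] for one gene with two queries in total."""
    # Query 1: the sample IDs to look up (these are column names in tabletwo).
    sample_ids = [r[0] for r in conn.execute(
        "SELECT SampleID FROM tableone WHERE Diagnos = ?", (diagnosis,))]
    if not sample_ids:
        return []
    # Column names cannot be bound as parameters, so quote each one as an
    # SQL identifier (doubling any embedded double quotes).
    columns = ", ".join('"' + s.replace('"', '""') + '"' for s in sample_ids)
    # Query 2: every requested column of the gene's row at once.
    row = conn.execute(
        "SELECT " + columns + " FROM tabletwo WHERE Gene_ID = ?", (gene_id,)
    ).fetchone()
    if row is None:
        return []
    return list(zip(sample_ids, row))


# Hypothetical miniature tables with values copied from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableone (SampleID TEXT, Diagnos TEXT)")
conn.executemany("INSERT INTO tableone VALUES (?, ?)",
                 [("X10", "RH"), ("X13", "ON"), ("X14", "RH")])
conn.execute("CREATE TABLE tabletwo (Gene_ID TEXT, X10 REAL, X13 REAL, X14 REAL)")
conn.executemany("INSERT INTO tabletwo VALUES (?, ?, ?, ?)",
                 [("NFYA", 18.2201, 12.4789, 13.3406),
                  ("DPM1", 40.2635, 28.8669, 21.0556)])

for sample_id, value in fetch_gene_values(conn, "NFYA", "RH"):
    print(sample_id, value)
```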

Martijn Pieters · Accepted Answer · 2012-08-31T17:28:09.877


You don't need to re-query the second table over and over again; you can retrieve the one row once and use it as a lookup table. If you use the sqlite3.Row row factory, that is as easy as retrieving one row:

import sqlite3

memcon = sqlite3.connect('sampleinfogenotype.db')
memcon.row_factory = sqlite3.Row  # rows can now be indexed by column name
memdata = memcon.cursor()
memdata.execute("select * from tabletwo where tabletwo.Gene_ID == 'NFYA'")
nfya_row = memdata.fetchone()

memdata.execute("SELECT tableone.SampleID from tableone WHERE tableone.Diagnos=='RH'")
for row in memdata:
    sampleID = row['SampleID']
    print(sampleID, nfya_row[sampleID])

The nfya_row value now holds one sqlite3.Row object, whose keys are the column names. When you then loop over the tableone results, you can look up the value of any given column directly.
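To see the lookup-table idea end to end, here is a self-contained sketch using the standard `sqlite3` module and hypothetical miniature tables (values copied from the question). The gene row is fetched once, and each SampleID is then used as a key into that `sqlite3.Row`, so no further queries hit tabletwo.

```python
import sqlite3

# Hypothetical in-memory stand-ins for the question's tables.
conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows support lookup by column name
conn.execute("CREATE TABLE tableone (SampleID TEXT, Diagnos TEXT)")
conn.executemany("INSERT INTO tableone VALUES (?, ?)",
                 [("X10", "RH"), ("X13", "ON"), ("X14", "RH")])
conn.execute("CREATE TABLE tabletwo "
             "(Ensembl_ID TEXT, Gene_ID TEXT, X10 REAL, X13 REAL, X14 REAL)")
conn.executemany("INSERT INTO tabletwo VALUES (?, ?, ?, ?, ?)",
                 [("ENSG00000001167", "NFYA", 18.2201, 12.4789, 13.3406),
                  ("ENSG00000000419", "DPM1", 40.2635, 28.8669, 21.0556)])

# One query for the gene row; it then serves as the lookup table.
nfya_row = conn.execute(
    "SELECT * FROM tabletwo WHERE Gene_ID = ?", ("NFYA",)).fetchone()

results = []
for row in conn.execute("SELECT SampleID FROM tableone WHERE Diagnos = 'RH'"):
    sample_id = row["SampleID"]
    # Column lookup by name on the already-fetched row: no extra query.
    results.append((sample_id, nfya_row[sample_id]))
    print(sample_id, nfya_row[sample_id])
```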

If you are using the apsw module instead, there is no ready-made row factory that returns a dictionary, but you can easily build the dictionary yourself for one row:

memdata = memcon.cursor()
memdata.execute("select * from tabletwo where tabletwo.Gene_ID == 'NFYA'")
# Pair each column name (first item of each description tuple) with its value
nfya_row = {desc[0]: column for desc, column in zip(memdata.getdescription(), next(memdata))}

memdata.execute("SELECT tableone.SampleID from tableone WHERE tableone.Diagnos=='RH'")
for row in memdata:
    sampleID = row[0]  # apsw returns plain tuples
    print(sampleID, nfya_row[sampleID])
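The same dictionary-building trick can be tried without apsw installed. This sketch swaps in the standard `sqlite3` module, whose `cursor.description` also holds one tuple per column with the column name first (like `desc[0]` from apsw's `getdescription()`); the one-row table is a hypothetical stand-in with values copied from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # no row_factory: rows are plain tuples
conn.execute("CREATE TABLE tabletwo (Gene_ID TEXT, X10 REAL, X14 REAL)")
conn.execute("INSERT INTO tabletwo VALUES (?, ?, ?)", ("NFYA", 18.2201, 13.3406))

cur = conn.execute("SELECT * FROM tabletwo WHERE Gene_ID = ?", ("NFYA",))
# description has one tuple per column; its first item is the column name.
nfya_row = {desc[0]: value for desc, value in zip(cur.description, cur.fetchone())}
print(nfya_row)
```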

answered Aug 31 '12 at 15:11 by Martijn Pieters, edited Aug 31 '12 at 17:28

Thanks Martijn, for the quick reply. Since I use the apsw module for the in-memory database, I get the following error: memcon.row_factory = sqlite3.Row raises AttributeError: 'apsw.Connection' object has no attribute 'row_factory'. I have edited my question and added the full code. It would be nice if you could help me here. Thanks in advance! – TJC Aug 31 '12 at 16:08
@TJC: Since APSW handles row factories differently and doesn't include a handy alternative `Row` class like sqlite3 does, I've updated my answer to include an APSW alternative. – Martijn Pieters Aug 31 '12 at 16:20
APSW is still a problem: AttributeError: 'apsw.Cursor' object has no attribute 'fetchone' – TJC Aug 31 '12 at 16:33
@TJC: sorry about that; replaced `fetchone()` with `next()` instead. I've never used APSW myself. – Martijn Pieters Aug 31 '12 at 16:38
Thanks @Martijn Pieters, you saved my day. I am a beginner in SQL. Now it is working as expected. I made a minor correction in the code, changing sampleID = row['SampleID'] to sampleID = row[0], since tuple indices must be integers. – TJC Aug 31 '12 at 17:24
@TJC: sorry, yes, that's a holdover from the row factory in sqlite3. – Martijn Pieters Aug 31 '12 at 17:28
You can emulate row factories by using the row tracer in APSW (see the documentation). This lets you return anything for the row, or skip it completely. – Roger Binns Nov 23 '12 at 07:01
