It will be really helpful for me if someone can solve the following problem. I have two tables and a requirement-
Required SQL script:
Without using loops, an efficient SQL script to match row values (sampleID) obtained from first table to the column names (X2, X4, X8...) Of second table and obtain the values for given Gene_ID (e.g. NFYA).
Expected result:
X1 15.2856
X10 18.2201
X14 13.3406
. .
. .
Tableone (rows X columns: 135 X 32)
Table description (partial rows and columns)
+-------+--------------------+--------+----------+-------+---------+
| Batch | filename_generate | pcode | SampleID | check | Diagnos |
+-------+--------------------+--------+----------+-------+---------+
| B | cufflinks_out_2_B | 01-111 | X2 | TRUE | RH |
| D | cufflinks_out_4D | 01-163 | X4 | TRUE | RH |
| B | cufflinks_out_5_B | 01-166 | X5 | TRUE | RH |
| D | cufflinks_out_6D | 02-007 | X6 | TRUE | RH |
| C | cufflinks_out_8C | 02-012 | X8 | TRUE | RH |
| C | cufflinks_out_9C | 02-014 | X9 | TRUE | RH |
| B | cufflinks_out_10_B | 02-017 | X10 | TRUE | RH |
| B | cufflinks_out_13_B | 02-030 | X13 | TRUE | ON |
| D | cufflinks_out_14D | 02-031 | X14 | TRUE | RH |
| B | cufflinks_out_15B | 02-037 | X15 | TRUE | RH |
| C | cufflinks_out_16C | 02-038 | X16 | TRUE | IS |
| B | cufflinks_out_17_B | 02-041 | X17 | TRUE | ON |
| B | cufflinks_out_19_B | 02-050 | X19 | TRUE | ON |
| B | cufflinks_out_20_B | 02-056 | X20 | TRUE | RH |
+-------+--------------------+--------+----------+-------+---------+
Tabletwo (rows X columns: 56000 X 137)
Table description (partial rows and columns)
+-----------------+----------+---------+---------+----------+----------+----------+----------+---------+
| Ensembl_ID | Gene_ID | X1 | X10 | X13 | X14 | X15 | X16 | X17 |
+-----------------+----------+---------+---------+----------+----------+----------+----------+---------+
| ENSG00000000003 | TSPAN6 | 1.388 | 0.443 | 0.563 | 0.350 | 0.390 | 0.220 | 0.528 |
| ENSG00000000005 | TNMD | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| ENSG00000000419 | DPM1 | 34.309 | 40.2635 | 28.8669 | 21.0556 | 18.1733 | 22.0223 | 25.4352 |
| ENSG00000000457 | SCYL3 | 7.84987 | 9.35551 | 7.45483 | 7.1601 | 6.53686 | 7.26445 | 6.30419 |
| ENSG00000000460 | C1orf112 | 2.36851 | 3.76825 | 3.10324 | 2.0262 | 1.84606 | 3.01185 | 3.02763 |
| ENSG00000000938 | FGR | 227.024 | 222.578 | 247.124 | 234.995 | 255.226 | 265.288 | 323.6 |
| ENSG00000000971 | CFH | 3.17952 | 3.60279 | 1.68429 | 3.74301 | 2.10637 | 0.763278 | 1.78278 |
| ENSG00000001036 | FUCA2 | 16.5566 | 19.1703 | 25.5005 | 18.5244 | 20.7771 | 18.353 | 25.2364 |
| ENSG00000001084 | GCLC | 9.45121 | 16.1362 | 12.6239 | 13.1074 | 10.6472 | 18.9938 | 12.8249 |
| ENSG00000001167 | NFYA | 15.2856 | 18.2201 | 12.4789 | 13.3406 | 15.0146 | 13.2608 | 11.5385 |
| ENSG00000001460 | C1orf201 | 1.64558 | 1.93322 | 0.7927 | 1.71796 | 2.27997 | 0.938738 | 1.3911 |
| ENSG00000001461 | NIPAL3 | 14.6073 | 14.1772 | 11.1503 | 12.5077 | 15.1269 | 13.6 | 11.227 |
| ENSG00000001497 | LAS1L | 14.4519 | 15.3965 | 11.8901 | 16.8572 | 16.7174 | 14.6004 | 15.7266 |
+-----------------+----------+---------+---------+----------+----------+----------+----------+---------+
Script written in Python, module used SQLite3
I am adding the full code.
import os, sys, time
import sqlite3
import apsw
disk_db = apsw.Connection('sampleinfogenotype.db')
memcon=apsw.Connection(":memory:")
with memcon.backup("main",disk_db, "main") as backup:
backup.step() # copy whole database in one go
mdata=memcon.cursor()
for row in memcon.cursor().execute("SELECT tableone.SampleID from tableone WHERE tableone.Diagnos=='RH'"):
sampleID_row=str(row[0])
sqlscript="SELECT "+sampleID_row+ " FROM tabletwo WHERE tabletwo.Gene_ID=='NFYA'"
data=memdata.execute(sqlscript).fetchall()[0]
print sampleID_row,data[0]
memcon.close()
disk_db.close()
It gives the expected result but it is time consuming, since there is a loop! Is there an efficient SQL script or method... any information will be helpful.