
When my Flask application starts up, it needs to load a bunch of data into Postgres via SQLAlchemy. I'm looking for a good way to do this. The data is in TSV format, and I already have a SQLAlchemy `db.Model` schema for it. Right now:

for datafile in datafiles:
    with open(datafile, 'rb') as file:
        # reader = csv.reader(file, delimiter='\t')
        reader = csv.DictReader(file, delimiter='\t', fieldnames=[])
        OCRs = ...  # somehow efficiently convert the rows to a list of dicts
        db.engine.execute(OpenChromatinRegion.__table__.insert(), OCRs)

Is there a better, more direct way? If not, what is the best way of generating `OCRs`?

The solution suggested here seems clunky.
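
For reference, here is a minimal sketch of one direct way to build `OCRs` with `csv.DictReader`, assuming each TSV's first row is a header whose column names match the `OpenChromatinRegion` columns (`datafiles`, `db`, and the model are reused from the snippet above):

import csv

OCRs = []
for datafile in datafiles:
    with open(datafile, 'rb') as file:   # 'rb' as above; on Python 3 use open(datafile, newline='')
        # With no fieldnames argument, DictReader takes the first row as the
        # header and yields one dict per data row, keyed by column name.
        reader = csv.DictReader(file, delimiter='\t')
        OCRs.extend(reader)

# executemany-style insert: one statement, one parameter dict per row
db.engine.execute(OpenChromatinRegion.__table__.insert(), OCRs)

Since `DictReader` already yields dicts, there is no separate conversion step; whether this is any faster than the namedtuple approach in the answer below is worth measuring on the actual files.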


1 Answer

import csv
from collections import namedtuple

# excel_tab dialect already uses a tab delimiter, so don't override it with ','
fh = csv.reader(open(your_file, "rU"), dialect=csv.excel_tab)
headers = next(fh)                  # first row holds the column names
Row = namedtuple('Row', headers)
# one dict per row, keyed by header name, ready for the executemany-style insert
OCRs = [Row._make(row)._asdict() for row in fh]
db.engine.execute(OpenChromatinRegion.__table__.insert(), OCRs)

# plus your loop for multiple files and exception handling of course =)
  • `[CRITICAL] WORKER TIMEOUT` on `db.engine.execute(OpenChromatinRegion.__table__.insert(), OCRs)`. Any ideas? – Alex Lenail May 11 '16 at 23:30
  • Probably unrelated; that message looks like it's coming from a web server. Here is what I would do: enable general logging on the MySQL side: `show global variables like '%general%'; set global general_log = ON;` Then tail the MySQL general log file (you will see its location in the output above) and check how long the query takes to execute; try to optimize it if it's slow. Otherwise it's not a DB or sqla issue and you'll need to play with your web server settings. – MOCKBA May 12 '16 at 17:18
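
As an aside: `[CRITICAL] WORKER TIMEOUT` is the message gunicorn logs when it kills a worker that ran past its time limit, so if gunicorn is the server here, the load is most likely just exceeding the worker timeout during startup. One option, sketched below assuming Flask 0.11+ (which ships the `flask` CLI) and reusing the names from the question, is to run the import as a one-off command instead of at application startup; raising gunicorn's `--timeout` is the other obvious knob.

import csv

@app.cli.command('load-ocrs')        # command name is arbitrary; run once with: flask load-ocrs
def load_ocrs():
    """Bulk-load the TSV files outside the web worker's startup/request cycle."""
    rows = []
    for datafile in datafiles:
        with open(datafile) as file:
            rows.extend(csv.DictReader(file, delimiter='\t'))
    db.engine.execute(OpenChromatinRegion.__table__.insert(), rows)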