I have a Python script that reads a NetCDF file and inserts climatic data into a PostgreSQL table, one row at a time. This of course takes forever, and now I would like to figure out how to optimize the code. I have been thinking about building one huge list and then using the COPY command, but I am unsure how one would work that out. Another way might be to write the data to a CSV file and then load that file into the database with PostgreSQL's COPY command. I guess either would be quicker than inserting one row at a time.
If you have any suggestions on how this could be optimized, I would really appreciate them. The NetCDF file is available here (registration required): http://badc.nerc.ac.uk/browse/badc/cru/data/cru_ts/cru_ts_3.21/data/pre
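To make the COPY idea concrete, this is roughly what I had in mind (a minimal, untested sketch; the table and columns match my script below, and the sample rows are made up): build tab-separated rows in an in-memory buffer, then hand the whole batch to psycopg2's copy_from in one call.

from cStringIO import StringIO
import psycopg2

conn = psycopg2.connect("host=192.168.1.162 dbname=dbname user=username password=password")
cur = conn.cursor()

buf = StringIO()
# One tab-separated line per row, in the column order passed to copy_from.
for row in [(1945, 1, -179.75, -89.75, 0.0), (1945, 1, -179.25, -89.75, 1.2)]:
    buf.write('\t'.join(str(v) for v in row) + '\n')
buf.seek(0)  # rewind so copy_from reads from the beginning

# COPY loads the whole batch in a single round trip, unlike row-by-row INSERTs.
cur.copy_from(buf, 'precip', columns=('year', 'month', 'lon', 'lat', 'pre'))
conn.commit()

My current script: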
# NetCDF to PostGreSQL database
# CRU-TS 3.21 precipitation and temperature data. From NetCDF to database table
# Requires Python 2.6, PostgreSQL, psycopg2, SciPy
# Tested on Windows Vista 64-bit.
# Import modules
import psycopg2, time, datetime
from scipy.io import netcdf
# Establish connection
db1 = psycopg2.connect("host=192.168.1.162 dbname=dbname user=username password=password")
cur = db1.cursor()
### Create Table
print time.ctime() + " Creating precip table."
cur.execute("DROP TABLE IF EXISTS precip;")
cur.execute("CREATE TABLE precip (gid serial PRIMARY KEY not null, year int, month int, lon decimal, lat decimal, pre decimal);")
### Read netcdf file
f = netcdf.netcdf_file('/home/username/output/project_v2/inputdata/precipitation/cru_ts3.21.1901.2012.pre.dat.nc', 'r')
##
### Create lathash: map array index -> latitude value
print time.ctime() + " Reading lat coords."
lathash = dict(enumerate(f.variables['lat'].data.tolist()))
##
### Create lonhash: map array index -> longitude value
print time.ctime() + " Reading lon coords."
lonhash = dict(enumerate(f.variables['lon'].data.tolist()))
##
### Loop through every observation. The file holds 1344 months (1901-2012);
### the first 528 months (1901-1944) are skipped.
for _month in xrange(1344):
    if _month < 528:
        print str(_month)
        print "Not yet"
    else:
        thisyear = _month / 12 + 1901   # integer division: 528/12 + 1901 = 1945
        thismonth = _month % 12 + 1
        thisdate = datetime.date(thisyear, thismonth, 1)
        print str(thisdate)
        for _lon in xrange(720):
            for _lat in xrange(360):
                # Cast the numpy scalar to a plain float so psycopg2 can adapt it.
                pre = float(f.variables['pre'].data[_month, _lat, _lon])
                # Parameterized query: the driver handles quoting, which is
                # safer and less fragile than building the SQL by concatenation.
                cur.execute("INSERT INTO precip (year, month, lon, lat, pre) VALUES (%s, %s, %s, %s, %s);",
                            (thisyear, thismonth, lonhash[_lon], lathash[_lat], pre))
    db1.commit()
cur.execute("CREATE INDEX idx_precip ON precip USING btree(year, month, lon, lat, pre);")
cur.execute("ALTER TABLE precip ADD COLUMN geom geometry;")
cur.execute("UPDATE precip SET geom = ST_SetSRID(ST_Point(lon,lat), 4326);")
cur.execute("CREATE INDEX idx_precip_geom ON precip USING gist(geom);")
db1.commit()
cur.close()
db1.close()
print time.ctime() + " Done!"
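For what it's worth, this is how I imagined the COPY variant would slot into the loop above (an untested sketch reusing f, lonhash, lathash, cur and db1 from the script): accumulate one month's grid in a buffer and issue a single copy_from per month instead of 259,200 INSERTs.

from cStringIO import StringIO

for _month in xrange(528, 1344):
    thisyear = _month / 12 + 1901
    thismonth = _month % 12 + 1
    buf = StringIO()
    for _lon in xrange(720):
        for _lat in xrange(360):
            buf.write('%d\t%d\t%s\t%s\t%s\n' % (thisyear, thismonth,
                      lonhash[_lon], lathash[_lat],
                      f.variables['pre'].data[_month, _lat, _lon]))
    buf.seek(0)
    # One COPY per month: 720 * 360 = 259,200 rows in a single round trip.
    cur.copy_from(buf, 'precip', columns=('year', 'month', 'lon', 'lat', 'pre'))
    db1.commit()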