I need to update every record in a spatial database in which I have a data set of points that overlay data set of polygons. For each point feature I want to assign a key to relate it to the polygon feature that it lies within. So if my point 'New York City' lies within polygon USA and for the USA polygon 'GID = 1' I will assign 'gid_fkey = 1' for my point New York City.
To do this I have created the following query.
procQuery = 'UPDATE city SET gid_fkey = gid FROM country WHERE ST_within((SELECT the_geom FROM city WHERE wp_id = %s), country.the_geom) AND city_id = %s' % (cityID, cityID)
At present I am getting the cityID info from another query that just selects all cityID where gid_fkey is NULL. Essentially I just need to loop through these and run the query shown earlier. As the query only relies on static information in the other table in theory all of these processes can be run at once. I have implemented the threading procedure below but I can't seem to make the migration to multiprocessing
import psycopg2, pprint, threading, time, Queue
queue = Queue.Queue()
pyConn = psycopg2.connect("dbname='geobase_1' host='localhost'")
pyConn.set_isolation_level(0)
pyCursor1 = pyConn.cursor()
getGID = 'SELECT cityID FROM city'
pyCursor1.execute(getGID)
gidList = pyCursor1.fetchall()
class threadClass(threading.Thread):
def __init__(self, queue):
threading.Thread.__init__(self)
self.queue = queue
def run(self):
while True:
gid = self.queue.get()
procQuery = 'UPDATE city SET gid_fkey = gid FROM country WHERE ST_within((SELECT the_geom FROM city WHERE wp_id = %s), country.the_geom) AND city_id = %s' % (cityID, cityID)
pyCursor2 = pyConn.cursor()
pyCursor2.execute(procQuery)
print gid[0]
print 'Done'
def main():
for i in range(4):
t = threadClass(queue)
t.setDaemon(True)
t.start()
for gid in gidList:
queue.put(gid)
queue.join()
main()
I'm not even sure if the multithreading is optimal but it is definitely faster than going through one by one.
The machine I will be using has four cores (Quad Core) and a minimal Linux OS with no GUI, PostgreSQL, PostGIS and Python if that makes a difference.
What do I need to change to get this painfully easy multiprocessing task enabled?