I am creating a shared list in memory from an Oracle table:
import cx_Oracle
from multiprocessing import Manager

manager = Manager()  # assumed here; in my script the manager is created earlier
dsn_tns = cx_Oracle.makedsn('myserver.net', '1521', service_name='PARIELGX')
con = cx_Oracle.connect(user='fred', password='password', dsn=dsn_tns)
# Header row for the column statistics collected later
stats_results = [["OWNER","TABLE","COLUMN_NAME","RECORD_COUNT","DISTINCT_VALUES","MIN_LENGTH","MAX_LENGTH","MIN_VAL","MAX_VAL"]]
sql = "SELECT * FROM ARIEL.MY_TABLE"
cur = con.cursor()
cur.execute(sql)
rs = cur.fetchall()  # list of tuples, one per row
lrs = manager.list(list(rs))  # shared memory list
print("Type of lrs is: " + str(type(lrs)))
print("Address of lrs is: " + hex(id(lrs)))
The key lines are here:
lrs = manager.list(list(rs)) # shared memory list
print("Type of lrs is :" + str(type(lrs)))
print("Address of lrs is: " + hex(id(lrs)))
The address of the list is printed out:
Address of lrs is: 0x7f3c71292ed0
I was expecting that when I pass the list as an argument to the called process, it wouldn't be copied. But when I check the memory address of the argument in the process called by map_async, it is different. Either I am using Manager.list() incorrectly and the entire list is being serialised to the process, or a proxy object is being sent somehow. The performance of my parallel job suggests I shouldn't be passing the list as a parameter to map_async like this:
pool_results = pool.map_async(gs.get_column_stats_rs, [(lrs, col_name, col_names) for col_name in col_names]).get()
The memory address of lrs inside the get_column_stats_rs function is shown below:
SUBPROCESS Address of rs is: 0x7f3c73283690
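To take Oracle out of the picture, here is a minimal sketch of the same pattern. It assumes a small hard-coded list of tuples in place of the query result, and an explicit "fork" start method because I am on Linux. The names describe and run_demo are placeholders for illustration, not my real code. Each worker reports the type and address of the argument it received and reads one row through it. As far as I understand, id() only gives the address of the local object in whichever process calls it, so the two addresses differing may not by itself show whether the underlying data was copied.

```python
import multiprocessing as mp
import os


def describe(args):
    """Worker: report what was received and read one row through it."""
    shared_rows, index = args
    return (
        os.getpid(),                 # which process handled this task
        type(shared_rows).__name__,  # type of the argument as seen by the worker
        hex(id(shared_rows)),        # address of the worker's local object
        shared_rows[index],          # one row fetched through the argument
    )


def run_demo():
    # Assumption: POSIX "fork" start method, matching my Linux setup.
    ctx = mp.get_context("fork")
    with ctx.Manager() as manager:
        rows = [(1, "a"), (2, "b"), (3, "c")]  # stand-in for cur.fetchall()
        lrs = manager.list(rows)
        parent_addr = hex(id(lrs))
        with ctx.Pool(2) as pool:
            tasks = [(lrs, i) for i in range(len(rows))]
            results = pool.map_async(describe, tasks).get()
    return parent_addr, results


if __name__ == "__main__":
    parent_addr, results = run_demo()
    print("PARENT Address of lrs is: " + parent_addr)
    for pid, kind, addr, row in results:
        print(f"SUBPROCESS {pid}: type={kind} address={addr} row={row}")
```

In this sketch the workers still receive lrs as an argument, exactly as in my map_async call above, so it reproduces the behaviour I am asking about rather than fixing it.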
So how do I enable a child process called by map_async to access a manager's shared list for processing, if not by passing it as an argument?
Note the type of the list was confirmed to be: 'multiprocessing.managers.ListProxy'.
I am assuming that using Manager.list() is OK when the list is a list of tuples.