I have written code that writes to a CSV file in parallel in Python. When the program finishes, I see that a few lines are merged instead of being on separate lines. Each line should contain only 3 columns, but instead the output looks like this:

 myname  myage  myvalue 
 myname  myage  myvaluemyname
 myname  myage  myvalue 
 myage

From reading a few other questions, I understood that I need to lock the file to avoid this. So I added locking with the fcntl module, but the file still does not seem to be locked, because the program produces similar output.

My code

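import fcntl
import psycopg2
import psycopg2.extras
from multiprocessing import Pool

def getdata(x):
    try:
        # get data from API (x1 and x2 come from the response)
        c.writefile(x, x1, x2)
    except Exception as err:
        print(err)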

class credits:
    def __init__(self):
        self.d = dict()
        self.details = dict()
        self.filename = "abc.csv"
        self.fileopen = open(self.filename,"w")

    def acquire(self):
        fcntl.flock (self.fileopen, fcntl.LOCK_EX)

    def release(self):
        fcntl.flock(self.fileopen, fcntl.LOCK_UN)

    def __del__(self):
        self.fileopen.close()

    def writefile(self, x, x1, x2):
        # Take the lock, write one 3-column row, and always release the lock.
        try:
            self.acquire()
            self.fileopen.write(str(x) + "," + str(x1) + "," + str(x2) + "\n")
        finally:
            self.release()

if __name__ == '__main__':
    conn = psycopg2.connect()
    curr = conn.cursor(cursor_factory=psycopg2.extras.DictCursor)
    curr.execute("select * from emp")
    rows = curr.fetchall()

    listdata = []
    for each in rows:
        listdata.append(each[0])

    c = credits()
    p = Pool(processes = 5)
    results = p.map(getdata,listdata)
    conn.close()

I had to declare getdata as a top-level function; otherwise I got a "Can't pickle function" error.

Neil

1 Answer

Why not have each process write to its own file and then merge the files afterwards? It may be more computationally expensive, but no two processes ever write to the same file, so lines cannot be interleaved.
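A minimal sketch of that idea, assuming Python 3 and the standard csv module. The names fetch_row, write_part, merge_parts and run are hypothetical and not from the question, and fetch_row only stands in for the real API call. Each worker appends to a file named after its own PID, and the parent concatenates the parts once the pool has finished:

```python
import csv
import glob
import os
import shutil
import tempfile
from multiprocessing import Pool


def fetch_row(x):
    # Hypothetical stand-in for the API call in getdata(); returns 3 columns.
    return (x, "age-%s" % x, "value-%s" % x)


def write_part(args):
    out_dir, x = args
    # One file per worker process, named after its PID, so no file is shared.
    path = os.path.join(out_dir, "part-%d.csv" % os.getpid())
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(fetch_row(x))
    return path


def merge_parts(out_dir, target):
    # Concatenate all per-process files into the final CSV.
    with open(target, "w", newline="") as out:
        for path in sorted(glob.glob(os.path.join(out_dir, "part-*.csv"))):
            with open(path, newline="") as part:
                shutil.copyfileobj(part, out)


def run(items, target, processes=5):
    out_dir = tempfile.mkdtemp()
    pool = Pool(processes=processes)
    try:
        pool.map(write_part, [(out_dir, x) for x in items])
    finally:
        pool.close()
        pool.join()
    # Merge only after every worker has finished writing.
    merge_parts(out_dir, target)
    shutil.rmtree(out_dir)
```

With the question's data this would be called as run(listdata, "abc.csv") under the if __name__ == '__main__': guard. The row order in the merged file follows the per-process files, not the order of listdata, so sort afterwards if order matters.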

andrea-f