I am looking for a way to allow concurrent seeking on a file object.
As a test case of file seeking going awry:
#!/usr/bin/env python2
import time, random, os

s = 'The quick brown fox jumps over the lazy dog'

# create some file, just for testing
f = open('file.txt', 'w')
f.write(s)
f.close()

# the actual code...
f = open('file.txt', 'rb')

def fn():
    out = ''
    for i in xrange(10):
        k = random.randint(0, len(s)-1)
        f.seek(k)
        time.sleep(random.randint(1, 4)/10.)
        out += s[k] + ' ' + f.read(1) + '\n'
    return out

import multiprocessing
p = multiprocessing.Pool()
n = 3
res = [p.apply_async(fn) for _ in xrange(n)]
for r in res:
    print r.get()
f.close()
I have worker processes that seek to a random position in the file, sleep for a bit, then read a single byte. I compare each byte read against the character at the same position in the original string. I accumulate the output instead of printing right away, to avoid concurrency issues with printing. You can see that with n=1 it all goes well, but everything goes astray with n>1, because the worker processes share the file descriptor, and with it the current file offset.
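My understanding is that on Unix the Pool workers inherit the parent's file descriptor through fork(), and descriptors inherited across a fork refer to the same open file description, so all the processes share a single file offset. A minimal sketch of that sharing (Unix-only, reusing file.txt from above), separate from the pool code:

import os

fd = os.open('file.txt', os.O_RDONLY)
pid = os.fork()
if pid == 0:
    # child: move the offset on the inherited descriptor
    os.lseek(fd, 10, os.SEEK_SET)
    os._exit(0)
os.waitpid(pid, 0)
# the parent sees the child's seek: both descriptors point to the
# same open file description, hence the same offset
print os.lseek(fd, 0, os.SEEK_CUR)  # -> 10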
I have tried to duplicate the file descriptor within fn():
def fn():
    fd = os.dup(f.fileno())  # os.dup() takes an integer descriptor, not a file object
    f2 = os.fdopen(fd)
    # ... rest of fn() as above, using f2
And then I use f2 instead of f. But it does not seem to help.
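I suspect the reason is that os.dup() returns a new descriptor number that still refers to the same open file description, so the duplicate shares its offset with the original. A quick check of that assumption:

import os

fd = os.open('file.txt', os.O_RDONLY)
fd2 = os.dup(fd)
os.lseek(fd2, 5, os.SEEK_SET)
# seeking on the duplicate moves the original's offset as well
print os.lseek(fd, 0, os.SEEK_CUR)  # -> 5
os.close(fd2)
os.close(fd)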
How can I do seeking concurrently, i.e. from multiple processes? (In this case I could just open the file within fn(), but this is an MWE; in my actual case it is harder to do that.)
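For reference, the workaround I mean would look roughly like this (a sketch only; each call opens a private handle, so every process gets its own independent offset):

def fn():
    out = ''
    f2 = open('file.txt', 'rb')  # private handle: independent offset per process
    for i in xrange(10):
        k = random.randint(0, len(s)-1)
        f2.seek(k)
        time.sleep(random.randint(1, 4)/10.)
        out += s[k] + ' ' + f2.read(1) + '\n'
    f2.close()
    return out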