
In a script that processes 500k links, checking each one for XML validity and against a RELAX NG schema, I tried to count the different outcomes in myFunc(). Because the counters are global variables, I had to declare them global in myFunc() before I could change them. When I print them inside myFunc(), I can see the values increase to 1, 2, 3, 4 and so on. But when I print them in run(), I do not get the changed values: all three variables are 0, just as they were before myFunc() changed them.

I know that there are much better ways to do this job. But my question is why the changes to the globals are not visible in run(), and whether there is a way to make this work.

Does it have to do with multiprocessing?

valid = 0
excpt = 0
relaxerr = 0

def myFunc(link):
   try:
      global valid
      valid += 1
      print valid
      doc = etree.parse(urllib2.urlopen(link))
   except Exception, e:
      global excpt
      excpt += 1
      print excpt
      with open('log.txt', 'a') as f:
         f.write('%s\n' % e)
      return

   if not RELAXNG.validate(doc):
      global relaxerr
      relaxerr += 1
      print relaxerr
      with open('log.txt', 'a') as f:
         f.write('%s\n' % RELAXNG.error_log)
      return

   ....
   do stuff for valid ....

def run():
   ...
   # Pool has no wait(); wait on the AsyncResult that map_async returns
   result = pool.map_async(myFunc, links, 64)
   result.wait()


   print valid
   print excpt
   print relaxerr
surfi

1 Answer

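The run function runs in the parent process, while the calls to myFunc run in separate worker processes, which do not share the parent's address space. Each worker therefore increments its own copy of the globals, and the parent's copy stays at 0.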
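
The effect can be reproduced in isolation. The following is only a minimal sketch, written in Python 3 syntax rather than the Python 2 of the question; `counter`, `bump` and `demo` are made-up names, not part of the original script.

```python
import multiprocessing

counter = 0  # module-level global; every process has its own independent copy


def bump(_):
    """Increment this process's copy of the global and return the new value."""
    global counter
    counter += 1
    return counter


def demo():
    """Call bump() in worker processes, then report the parent's counter."""
    with multiprocessing.Pool(processes=2) as pool:
        seen_in_workers = pool.map(bump, range(10))
    # The parent process never executed bump(), so its counter is untouched.
    return seen_in_workers, counter


if __name__ == "__main__":
    seen_in_workers, parent_counter = demo()
    print("values seen inside the workers:", seen_in_workers)
    print("counter in the parent:", parent_counter)
```

The values printed for the workers depend on how the tasks happen to be distributed, but the parent's counter is still 0 afterwards.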

What you are doing would work with threads (probably with some locks...), since threads do share the address space.

If you want to use multiprocessing, you have to use some explicit communication between the processes. For example, you could use a pipe, a queue or a manager (see the multiprocessing documentation).
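
One possible way to do this, not necessarily the only or best one, is to let each call return its outcome and do the counting in the parent; the pool then carries the results back over its own internal channels. The sketch below assumes Python 3, and `classify` is a hypothetical stand-in for myFunc whose rules are invented purely for illustration.

```python
import multiprocessing
from collections import Counter


def classify(link):
    """Hypothetical stand-in for myFunc: return a label instead of
    modifying a global. The rules below are made up for the example."""
    if not link.startswith("http"):
        return "excpt"
    if link.endswith(".bad"):
        return "relaxerr"
    return "valid"


def run(links):
    """Classify links in worker processes and tally the outcomes in the parent."""
    with multiprocessing.Pool(processes=2) as pool:
        outcomes = pool.map(classify, links, chunksize=64)
    return Counter(outcomes)


if __name__ == "__main__":
    sample = ["http://a/x.xml", "http://a/y.bad", "not-a-url"]
    counts = run(sample)
    print(counts["valid"], counts["excpt"], counts["relaxerr"])
```

With this layout no state has to be shared at all: the workers only return values, and only the parent keeps counters.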

Bakuriu
  • I was thinking of different scopes. But different processes, that makes sense. Thank you very much. – surfi Mar 31 '13 at 10:34