I’ve been having trouble speeding up Pywikibot. I’ve seen related questions here on StackOverflow, but they only partially apply to my problem:
- I set throttle=False wherever I could, but the bot is still very slow.
- I can’t use the PreloadingPageGenerator as proposed here, because I am not using the bot to access Wikipedia but Wikidata.

In my case, the requests look something like this:

from pywikibot.data import api

request_parameters = {
    'action': 'wbsearchentities',
    'format': 'json',
    'language': language,
    'type': 'item',
    'search': name,
    'throttle': False
}
request = api.Request(site=self.wikidata_site, use_get=True, **request_parameters)
response = request.submit()
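For completeness, self.wikidata_site in the snippet above is just the Wikidata repository. My real code creates it once and stores it on a class, but the construction is more or less the standard call (a sketch, nothing special):

import pywikibot

# The Wikidata repository that self.wikidata_site refers to above;
# in my actual code this is created once and kept on the class.
wikidata_site = pywikibot.Site('wikidata', 'wikidata')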
I now tried to use multiprocessing so that multiple requests can be sent to the API at once, which removes the need to wait for one response before sending the next request. My attempt looks like this:
while not queue.empty():  # Queue holding data for requests
    job_data = [queue.get() for i in range(number_of_processes)]
    jobs = [
        multiprocessing.Process(
            target=search_for_entity,
            args=(name, language)
        )
        for name, language in job_data
    ]
    for job in jobs:
        job.start()
    for job in jobs:
        job.join()
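For comparison, the same idea with a thread pool instead of separate processes would look roughly like this (a sketch I have not tested; concurrent.futures is from the standard library, and queue, number_of_processes and search_for_entity are the same names as above):

import concurrent.futures

# Same idea, but with worker threads instead of processes:
# drain the queue and let the pool issue the API requests concurrently.
with concurrent.futures.ThreadPoolExecutor(max_workers=number_of_processes) as pool:
    futures = []
    while not queue.empty():
        name, language = queue.get()
        futures.append(pool.submit(search_for_entity, name, language))
    results = [future.result() for future in futures]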
But the moment I run the program, it doesn’t even finish the first request; it just gets stuck. I followed the bug down to submit() in pywikibot/data/api.py:1500:
rawdata = http.request(
    site=self.site, uri=uri, method='GET' if use_get else 'POST',
    body=body, headers=headers)
through fetch() in pywikibot/comms/http.py:361:
request = _enqueue(uri, method, body, headers, **kwargs)
request._join()
to _join() in pywikibot/comms/threadedhttp.py:359, where an acquired lock never seems to get released:
def _join(self):
    """Block until response has arrived."""
    self.lock.acquire(True)
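Putting the pieces together, the smallest standalone version of what I am doing looks roughly like this (the search term 'Douglas Adams' is just a placeholder; in my full program, the equivalent join() at the end is where everything hangs):

import multiprocessing

import pywikibot
from pywikibot.data import api

def search_for_entity(name, language):
    site = pywikibot.Site('wikidata', 'wikidata')
    request = api.Request(site=site, use_get=True,
                          action='wbsearchentities', format='json',
                          language=language, type='item',
                          search=name, throttle=False)
    return request.submit()

if __name__ == '__main__':
    # Calling the function directly works (just slowly).
    search_for_entity('Douglas Adams', 'en')
    # Running it in a separate process is where my full program gets stuck.
    job = multiprocessing.Process(target=search_for_entity,
                                  args=('Douglas Adams', 'en'))
    job.start()
    job.join()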
My question now is: Is this a bug in pywikibot? Have I applied multiprocessing to this problem in the wrong way? Are there any other solutions in my specific situation to speed up pywikibot?