I’ve been having problems with speeding up Pywikibot. I’ve seen related questions here on StackOverflow, but they only partially apply to my problem:

  • I set throttle=False wherever I could, but the bot is still very slow.
  • I can’t use the PreloadingPageGenerator as proposed here, because I am using the bot to access Wikidata rather than Wikipedia. In my case, the requests look something like this:

    from pywikibot.data import api
    
    request_parameters = {
         'action': 'wbsearchentities',
         'format': 'json',
         'language': language,
         'type': 'item',
         'search': name,
         'throttle': False
    }
    request = api.Request(site=self.wikidata_site, use_get=True, **request_parameters)
    response = request.submit()
    

I then tried multiprocessing so that several requests can be sent to the API at once, which removes the need to wait for one response before sending the next request. It looks like this:

while not queue.empty():  # Queue holding data for requests
    job_data = [queue.get() for i in range(number_of_processes)]

    jobs = [
        multiprocessing.Process(
                target=search_for_entity,
                args=(name, language)
            )
        for name, language in job_data
    ]

    for job in jobs:
        job.start()

    for job in jobs:
        job.join()

But the moment I run the program, it gets stuck and never finishes even the first request. I traced the hang from pywikibot/data/api.py:1500 submit():

rawdata = http.request(
    site=self.site, uri=uri, method='GET' if use_get else 'POST',
    body=body, headers=headers)

through pywikibot/comms/http.py:361 fetch():

request = _enqueue(uri, method, body, headers, **kwargs)
request._join()

to pywikibot/comms/threadedhttp.py:359 _join(), where an acquired lock never seems to get released:

def _join(self):
    """Block until response has arrived."""
    self.lock.acquire(True)

My question now is: Is this a bug in pywikibot? Have I applied multiprocessing to this problem in the wrong way? Are there any other solutions in my specific situation to speed up pywikibot?

Kaleidophon

1 Answer

This part of the code is very outdated: the threadedhttp module you have shown was dropped years ago. I suggest updating your Pywikibot.
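
After updating, a thread pool may be a simpler way than separate processes to overlap several requests, since the work is mostly waiting on the network. Below is a minimal sketch of that pattern, not something taken from Pywikibot itself. `search_all` and `fake_search` are hypothetical names: `fake_search` stands in for your own `search_for_entity(name, language)`, and I am assuming (without having verified it) that your Pywikibot version allows such calls to be issued from several threads.

```python
from concurrent.futures import ThreadPoolExecutor


def search_all(search_fn, jobs, max_workers=4):
    """Call search_fn(name, language) for every job on a thread pool.

    Results are returned in the same order as `jobs`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order and re-raises any exception
        # from a worker when its result is consumed.
        return list(pool.map(lambda job: search_fn(*job), jobs))


def fake_search(name, language):
    # Hypothetical stand-in for the real search_for_entity function,
    # which would build and submit the wbsearchentities request.
    return {"search": name, "language": language}


if __name__ == "__main__":
    jobs = [("Douglas Adams", "en"), ("Berlin", "de")]
    for result in search_all(fake_search, jobs):
        print(result)
```

To try it with the real function, pass `search_for_entity` instead of `fake_search` and keep `max_workers` small so the API is not flooded.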

xqt