
I have a conceptual question.

I currently have a program that runs inside a never-ending loop.

def mycode():
    # log in to the server and retrieve cookies etc.
    while True:
        # perform a URL request (with custom headers, cookies etc.)
        # process the reply
        # perform further URL requests depending on the values in the reply
        # process those replies
        pass

I am happy for this to continue as it is, since the URLs must be called one after the other.

Now, the server limits a single account to a limited number of operations, so it would be useful to be able to run this function with two (or more) different accounts.

My question is: is this possible to do? I have done a reasonable amount of reading on queues and multithreading; if you nice people could suggest a method with a good (easy-to-understand) example I would be most appreciative.
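To make the idea concrete, the rough sketch below is the sort of thing I am picturing - one thread per account, each with its own login and cookies. The login, fetch, build_url, urls_from and process helpers are just placeholders for my real code, and mycode has been adapted to take the account as a parameter:

import threading

def mycode(account):
    session = login(account)          # placeholder: logs in and returns whatever holds the cookies
    while True:
        reply = fetch(session, build_url(account))   # placeholder request with custom headers/cookies
        process(reply)                                # placeholder processing
        for url in urls_from(reply):                  # follow-up requests depend on values in the reply
            process(fetch(session, url))

accounts = ['account_a', 'account_b']
threads = [threading.Thread(target=mycode, args=(account,)) for account in accounts]
for t in threads:
    t.start()
for t in threads:
    t.join()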

  • Are you fetching the same URL at step #1 each time? In effect polling the page, then acting on its results differently? – sotapme Jan 31 '13 at 01:25
  • Not exactly the same. I would be fetching www.foo.com/type=car and then maybe changing the params slightly for a different account – user958551 Jan 31 '13 at 01:48

1 Answer


Gevent is a performant green-threads implementation that has examples like the one below.

I'm unsure whether, by doing this for different accounts against the same server, you mean having different worker functions handling the URL processing - in effect running def mycode() once per account. Perhaps you could expand on the detail.

>>> import gevent
>>> from gevent import socket
>>> urls = ['www.google.com', 'www.example.com', 'www.python.org']
>>> jobs = [gevent.spawn(socket.gethostbyname, url) for url in urls]
>>> gevent.joinall(jobs, timeout=2)
>>> [job.value for job in jobs]
['74.125.79.106', '208.77.188.166', '82.94.164.162']
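To apply that to the account loop itself, a rough sketch (assuming the blocking HTTP calls go through the standard library, so gevent's monkey-patching can make them cooperative, and assuming mycode is adapted to take the account as a parameter) would be to spawn one greenlet per account:

from gevent import monkey
monkey.patch_all()      # patch socket/urllib so blocking calls yield to other greenlets

import gevent

accounts = ['account_a', 'account_b']
jobs = [gevent.spawn(mycode, account) for account in accounts]   # one greenlet per account
gevent.joinall(jobs)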

In addition, you could break the problem up by using something like beanstalkd, which would allow you to run your main process n times, once per account, and put the results on a beanstalk queue for processing by another process. That saves having to deal with threading, which is always a good thing in non-trivial applications.
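A rough sketch of that split, assuming a beanstalkd server on localhost and the beanstalkc client library (the 'replies' tube name and the process function are just examples):

import beanstalkc

# producer side: run one of these per account
queue = beanstalkc.Connection(host='localhost', port=11300)
queue.use('replies')                  # tube name is arbitrary
queue.put('result-from-account-a')    # in practice, a serialised reply

# consumer side: a separate process that does the actual processing
queue = beanstalkc.Connection(host='localhost', port=11300)
queue.watch('replies')
while True:
    job = queue.reserve()             # blocks until a job is available
    process(job.body)                 # placeholder for your processing code
    job.delete()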

sotapme
  • I don't mean worker functions. Only a single account can download each link. It is more about having one account working on one section (e.g. type=car) whilst another works on type=horse. – user958551 Jan 31 '13 at 01:51
  • This is similar perhaps ? http://stackoverflow.com/questions/6905800/multiprocessing-useless-with-urllib2 – sotapme Jan 31 '13 at 02:06
  • Possibly, I did look at that example. What I am not sure about is how to handle the different cookies being passed. – user958551 Jan 31 '13 at 02:14
  • Well, I can only suggest you get a function that does what you want first and then worry about making it multithreaded. If you find that it's too difficult to do multithreaded because of the cookies, then look into making it multiprocess, perhaps using beanstalkd to communicate between processes. Also, this question looks into cookies: http://stackoverflow.com/questions/189555/how-to-use-python-to-login-to-a-webpage-and-retrieve-cookies-for-later-usage - outside my comfort zone. :D (see the per-account cookie sketch after these comments) – sotapme Jan 31 '13 at 02:25
  • Thank you very much for the help, I do have the function performing correctly. It works just fine with a single account, which is why I am now looking into multithreading. I'll carry on reading and see what I can figure out. – user958551 Jan 31 '13 at 08:53
  • http://pythonquirks.blogspot.co.uk/2009/12/asynchronous-http-request.html has a useful example, as with most examples they never cover all that you might want to do. I think it gives a flavour of it though. – sotapme Jan 31 '13 at 10:08
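Regarding the per-account cookies discussed in the comments above, a minimal sketch using the standard library's cookielib and urllib2 (the login URLs and account names are placeholders) is to give each account its own cookie jar and opener, so the accounts never share cookies:

import cookielib
import urllib2

def make_opener():
    # one CookieJar per account, so cookies are never shared between accounts
    jar = cookielib.CookieJar()
    return urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))

# placeholder login requests - one opener per account
opener_a = make_opener()
opener_a.open('http://www.foo.com/login?user=account_a')   # login sets cookies in jar A

opener_b = make_opener()
opener_b.open('http://www.foo.com/login?user=account_b')   # login sets cookies in jar B

# later requests through each opener automatically send that account's cookies
reply = opener_a.open('http://www.foo.com/type=car').read()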