7

I'm using the Python requests library to send a POST request. The part of the program that produces the POST data can write into an arbitrary file-like object (output stream).

How can I make these two parts fit?

I would have expected requests to provide a streaming interface for this use case, but it seems it doesn't. It only accepts, as its data argument, a file-like object from which it reads. It doesn't provide a file-like object into which I can write.

Is this a fundamental issue with the Python HTTP libraries?

Ideas so far:

It seems that the simplest solution is to fork() and let the requests library communicate with the POST data producer through a pipe.
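
A rough sketch of that idea, using a thread rather than fork() for brevity; the producer entry point produce_xml(fileobj) is hypothetical:

import os
import threading
import requests

r_fd, w_fd = os.pipe()
w = os.fdopen(w_fd, "wb")

def run_producer():
    try:
        produce_xml(w)   # hypothetical: writes the POST body into the pipe
    finally:
        w.close()        # EOF so the reading side can finish

threading.Thread(target=run_producer).start()

# Wrap the read end in an iterator; an iterator without a known length
# makes requests use chunked transfer encoding.
chunks = iter(lambda: os.read(r_fd, 8192), b"")
response = requests.post("http://httpbin.org/post", data=chunks)
os.close(r_fd)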

Is there a better way?

Alternatively, I could try to complicate the POST data producer. However, that one is parsing one XML stream (from stdin) and producing a new XML stream to be used as POST data. Then I have the same problem in reverse: the XML serializer libraries want to write into a file-like object, and I'm not aware of any XML serializer that provides a file-like object from which others can read.

I'm also aware that the cleanest, classic solution to this is coroutines, which are somewhat available in Python through generators (yield). Instead of going through a file-like object, the POST data could be streamed through yield, using a pull parser.

However, is it possible to make requests accept an iterator for POST data? And is there an XML serializer that can readily be used in combination with yield?
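
For the parsing half, at least the standard library's pull parser combines with yield directly. A sketch, where the copy-every-item transformation is just a made-up stand-in for the real one:

import sys
import xml.etree.ElementTree as ET

def transformed_chunks(source=sys.stdin.buffer):
    # Pull-parse the input stream and yield serialized output
    # fragments; requests can consume this generator directly.
    yield b"<root>"
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "item":   # stand-in for the real transformation
            yield ET.tostring(elem)
            elem.clear()
    yield b"</root>"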

Or, are there any wrapper objects that turn writes to a file-like object into a generator, and/or provide a file-like object that wraps an iterator?
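
The second wrapping direction at least has a straightforward hand-rolled version, sketched here with the standard io machinery (the class name IterStream is made up):

import io

class IterStream(io.RawIOBase):
    """Read-only file-like view over an iterator of byte chunks."""

    def __init__(self, iterable):
        self._iter = iter(iterable)
        self._buf = b""

    def readable(self):
        return True

    def readinto(self, b):
        while not self._buf:       # skip empty chunks
            try:
                self._buf = next(self._iter)
            except StopIteration:
                return 0           # EOF
        n = min(len(b), len(self._buf))
        b[:n] = self._buf[:n]
        self._buf = self._buf[n:]
        return n

# io.BufferedReader(IterStream(some_generator)) then supports .read()
# like an ordinary binary file.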

vog
  • Why should _`requests`_ be obliged to provide _"a file-like object into which one can write"?_ It's designed to work in foreground rather than background mode, so it needs to read rather than provide a descriptor and passively wait for input. If _you_ need it, you can provide it yourself as easily as: `r,w=(os.fdopen(f,mode) for f,mode in zip(os.pipe(),("rb","wb")))` - then run the two parts in separate threads. – ivan_pozdeev Oct 14 '16 at 09:47

2 Answers

12

requests does accept an iterator or generator as the data argument; the details are described in Chunk-Encoded Requests. The transfer encoding needs to be chunked in this case because the data size is not known beforehand.
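
In its minimal form that mechanism looks like this (a sketch, reusing the same test endpoint as below):

import requests

def body_chunks():
    yield b"first chunk"
    yield b"second chunk"

# No length is known for a generator, so requests sends the body
# with Transfer-Encoding: chunked.
r = requests.post("http://httpbin.org/post", data=body_chunks())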

Here is a very simple example that uses a queue.Queue and can be used as a file-like object for writing:

import requests
import queue
import threading

class WriteableQueue(queue.Queue):

    def write(self, data):
        # An empty string would be interpreted as EOF by the receiving server
        if data:
            self.put(data)

    def __iter__(self):
        return iter(self.get, None)

    def close(self):
        self.put(None)

# The queue size can be limited in case producing is faster than streaming
q = WriteableQueue(100)

def post_request(iterable):
    r = requests.post("http://httpbin.org/post", data=iterable)
    print(r.text)

threading.Thread(target=post_request, args=(q,)).start()

# pass the queue to the serializer that writes to it ...    
q.write(b'1...')
q.write(b'2...')

# closing ends the request
q.close()
mata
  • I don't think that `if data:` is really needed. Why would empty data "be interpreted as EOF"? The iterator sentinel is `None`, not the empty byte string `b''`. So a `q.write(b'')` would never be interpreted as EOF anyway, would it? – vog Oct 14 '16 at 06:45
  • 1
    @vog It's not the sentinel that interprets it as EOF; requests terminates streaming the request when it encounters a `b''`. And when using an external library (e.g. an XML serializer) to write to this file-like object, you can't be sure that it won't write an empty string. It may somewhere do something like `out.write(b''.join(some_maybe_empty_list))` unconditionally. Writing an empty string should not "close" the file (it's not really closed, it only terminates the current iterator; adding a `closed` state wouldn't be too hard). – mata Oct 14 '16 at 07:25
  • Oh, that's good to know. I adjusted the comment in your code to reflect this issue more clearly. (Feel free to fix the comment if I misunderstood something.) – vog Oct 14 '16 at 08:26
  • 1
    Actually, I had to edit again because I checked it again, and it's not requests but the server on the other end that interprets it as EOF. requests just sends a chunk with a size of 0, which the server correctly sees as end of stream. – mata Oct 14 '16 at 09:49
-1

The only way of connecting a data producer that requires a push interface for its data sink with a data consumer that requires a pull interface for its data source is through an intermediate buffer. Such a system can only be operated by running the producer and the consumer in "parallel": the producer fills the buffer and the consumer drains it, each of them being suspended as necessary.

Such parallelism can be simulated with cooperative multitasking, where the producer yields control to the consumer when the buffer is full, and the consumer returns control to the producer when the buffer runs empty. By taking the generator approach you will be building a custom-tailored cooperative multitasking solution for your case, which will hardly end up simpler than the easy pipe-based approach, where the responsibility for scheduling the producer and the consumer lies entirely with the OS.
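
To make the comparison concrete, the hand-rolled cooperative version looks roughly like this (a sketch): control bounces between producer and consumer at every yield, and every scheduling decision is yours rather than the OS's.

def produce():
    for i in range(3):
        # Control passes to the consumer at each yield.
        yield ("chunk %d" % i).encode()

def consume(chunks):
    for chunk in chunks:   # each pull resumes the producer
        print(chunk)

consume(produce())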

Leon