
I have a list of strings and would like to pass it to an API that accepts only a file-like object, without having to concatenate/flatten the list in order to use something like StringIO.

The strings are utf-8, don't necessarily end in newlines, and if naively concatenated could be used directly in StringIO.

The preferred solution would be within the standard library (Python 3.8). The shape of the data is naturally similar to a file (~identical to the output of readlines()) and the memory access pattern would be efficient, so I have a feeling I'm just failing to DuckDuckGo correctly. If nothing like that exists, any "streaming" (no data concatenation) solution would suffice.


[Update, based on @JonSG's links]

Both RawIOBase and TextIOBase look to provide an API that decouples arbitrarily sized "chunks"/fragments (in my case: strings in a list) from a file-like read, which can specify its own read chunk size, while streaming the data itself (the memory cost increases by only some window at any given time, dependent of course on the behavior of your source & sink).

RawIOBase.readinto looks especially promising because it provides the buffer returned to client reads directly, allowing much simpler code - but this appears to come at the cost of one full copy (into that buffer).
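To make the `RawIOBase.readinto` idea concrete, here is a minimal sketch of what I have in mind. `ChunkedRawReader` is a hypothetical name of my own, not something from the standard library, and the code assumes the strings should be encoded as utf-8 one fragment at a time. It is untuned and only meant to show the shape of the approach:

```python
import io


class ChunkedRawReader(io.RawIOBase):
    """Hypothetical adapter: serve an iterable of str as a raw byte stream."""

    def __init__(self, strings, encoding="utf-8"):
        super().__init__()
        # Encode lazily, one fragment at a time; the list is never joined.
        self._chunks = (s.encode(encoding) for s in strings)
        self._leftover = memoryview(b"")

    def readable(self):
        return True

    def readinto(self, buffer):
        # Skip empty fragments so that returning 0 only ever signals EOF.
        while not self._leftover:
            try:
                self._leftover = memoryview(next(self._chunks))
            except StopIteration:
                return 0
        n = min(len(buffer), len(self._leftover))
        # This is the one copy: bytes go into the caller-supplied buffer.
        buffer[:n] = self._leftover[:n]
        self._leftover = self._leftover[n:]
        return n


# Usage: layer the standard wrappers on top to get a text file-like object.
raw = ChunkedRawReader(["héllo ", "wor", "", "ld\n", "bye"])
stream = io.TextIOWrapper(io.BufferedReader(raw), encoding="utf-8")
print(stream.read(5))
```

Because each call to `readinto` hands over at most one fragment, a bare `raw.read(n)` may return fewer than `n` bytes; `io.BufferedReader` is what smooths that over for the consumer.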

TextIOBase.read() has its own cost for its operation solving the same subproblem, which is concatenating k (k much smaller than N) chunks together.
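For comparison, a rough sketch of the `TextIOBase` variant, again under my own assumptions: `ChunkedTextReader` is a hypothetical name, and only `read()` is implemented (no `readline`). The `"".join(parts)` at the end is the "concatenate k chunks" cost mentioned above, paid per read rather than once over the whole list:

```python
import io


class ChunkedTextReader(io.TextIOBase):
    """Hypothetical adapter: serve an iterable of str as a text stream."""

    def __init__(self, strings):
        super().__init__()
        self._chunks = iter(strings)
        self._leftover = ""

    def readable(self):
        return True

    def read(self, size=-1):
        if size is None or size < 0:
            # Read-everything request: this does concatenate what remains.
            rest = self._leftover + "".join(self._chunks)
            self._leftover = ""
            return rest
        parts = []
        need = size
        while need > 0:
            if not self._leftover:
                try:
                    self._leftover = next(self._chunks)
                except StopIteration:
                    break  # source exhausted
                continue  # re-check, the new fragment may be empty
            piece = self._leftover[:need]
            self._leftover = self._leftover[need:]
            parts.append(piece)
            need -= len(piece)
        # Join only the k fragments that overlap this read window.
        return "".join(parts)


# Usage: bounded reads never touch more than the fragments they need.
reader = ChunkedTextReader(["ab", "", "cde", "f"])
print(reader.read(3))
```

Note that a consumer calling `read()` with no size still forces the full concatenation, so this only helps if the API on the other end reads in bounded chunks.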

I'll investigate both of these.

some bits flipped
  • Why the preference not to use StringIO? – JonSG Jun 14 '21 at 16:37
  • @JonSG - `StringIO` is fine - but as I understand the docs, there is no provision to use anything but a single string under the hood: what i can't do is concatenate the list of strings, which is large. Even using the size (which i don't have prior knowledge of, but probably wouldn't increase the big-O of the following) and something like a `MutableString`, the memory size doubles and an extra traversal of the (full, large) string is required. If there is some way to couple something similar to ~`itertools.chain` with `StringIO`, that would be fine, too. – some bits flipped Jun 14 '21 at 16:57
  • Ah, I see. Check out: https://stackoverflow.com/questions/12593576/adapt-an-iterator-to-behave-like-a-file-like-object-in-python and/or https://stackoverflow.com/questions/6657820/how-to-convert-an-iterable-to-a-stream/20260030#20260030 – JonSG Jun 14 '21 at 17:03
  • @somebitsflipped Can you add a link to 'What is exactly a file-like object in Python' --> https://stackoverflow.com/questions/4359495/what-is-exactly-a-file-like-object-in-python, so that guys like me could try to follow your question/answer (seems interesting to me) – pippo1980 Jun 14 '21 at 18:17
  • is this relevant to the problem : Reworking StringIO concatenation in Python (https://lwn.net/Articles/816415/) ? – pippo1980 Jun 14 '21 at 18:26
  • @pippo1980 - fair question. See the official [glossary](https://docs.python.org/3/glossary.html) entry for `file object`: """An object exposing a file-oriented API (with methods such as read() or write()) to an underlying resource [...] also called _file-like objects_ or _streams_""". It is an example of `duck typing`, and there are some nuances depending on exact use. Also, the link you sent has some similarities to this problem, but it's focused on making it easier to avoid the `x = "foo" ; x += "bar"` anti-pattern. – some bits flipped Jun 14 '21 at 19:26
  • @JonSG - [1-of-2] excellent references, thank you. From your first link, [JLR's answer](https://stackoverflow.com/a/12593675/309433) shows a ~naive `itertools.chain` style approach; the comments back up my fears there about the perf implications of the resulting char-by-char iteration. [MJ's answer](https://stackoverflow.com/a/12604375/309433) with `TextIOBase` however seems highly relevant. – some bits flipped Jun 14 '21 at 19:33
  • @JonSG [2-of-2] and in your second link, [MS's answer](https://stackoverflow.com/a/20260030/309433) using `RawIOBase.readinto` looks to be the same concept as [MJ's](https://stackoverflow.com/a/12604375/309433) `TextIOBase` but using an API with a lot less friction for my use. I will have to use & benchmark these two. – some bits flipped Jun 14 '21 at 19:37
