1

I'm doing some computation that results in strings containing byte data (the strings serve as byte arrays). This data now needs to be sent to another program that expects it all concatenated. From what one can read here, the best way to concatenate appears to be dumping the pieces into a list and then doing ''.join(lst), but it seems to me that creating that list might incur a memory overhead.

Is there any way to enjoy the benefits of ''.join(lst) without creating a long list?

It is not hard to approximate how big the complete string is going to be. Is there a way to allocate that space up front and just pour the data in, for instance with something like numpy, and then convert it into one huge string?
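For illustration, here is a minimal sketch of the two approaches in question — collecting the pieces for a single join versus preallocating a buffer of roughly the right size and copying the chunks in. The produce_chunks helper is hypothetical, standing in for the real computation:

def produce_chunks():
    # Hypothetical stand-in for the computation that yields byte strings.
    yield b"chunk-one"
    yield b"chunk-two"

# Approach 1: join (builds a temporary sequence of the pieces internally).
joined = b"".join(produce_chunks())

# Approach 2: preallocate an approximately-sized buffer, "pour" the
# data in, then snapshot it as an immutable bytes object.
estimated_size = 18  # assumed to be computable up front
buf = bytearray(estimated_size)
pos = 0
for chunk in produce_chunks():
    buf[pos:pos + len(chunk)] = chunk
    pos += len(chunk)
result = bytes(buf[:pos])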

AturSams
  • `lst` can be a generator expression rather than an actual list. – martineau Jul 08 '15 at 16:39
  • @martineau, but Python will construct a list anyway. If you pass a generator, Python will first construct a list, as it has to do two passes over the data. – Padraic Cunningham Jul 08 '15 at 16:40
  • @Padraic: Better to let python do it internally. – martineau Jul 08 '15 at 16:42
  • How is the data being sent to another program? Via a socket? If so, perhaps you could send the total size and then the pieces through the socket as they are generated. – unutbu Jul 08 '15 at 16:47
  • What about using `io.StringIO`, "an in-memory stream for text I/O"? Use `write()` to append each string, then `getvalue()` to get the finished product (a sketch of this idea follows these comments). Disclaimer: I don't actually have a clue whether this is a good idea. https://docs.python.org/3/library/io.html#io.StringIO – Sam Jul 08 '15 at 16:48
  • Actually, this answer http://stackoverflow.com/a/19926932/4618331 shows that `str.join` is still the best way to go. The Python docs say so too. – Sam Jul 08 '15 at 16:53
  • @zehelvion, is memory your greatest concern? – Padraic Cunningham Jul 08 '15 at 17:04
  • @HappyLeapSecond The data is saved to a huge file which is uploaded later to the cloud – AturSams Jul 09 '15 at 16:55
  • @PadraicCunningham It is a viable concern because the strings aren't huge, but there will be an unknown, growing number of them. I could do this in C, but I'd much rather do the computation in Python, where code is imho flexible, maintainable and readable, among other benefits. – AturSams Jul 09 '15 at 16:58
  • @zehelvion: If the big string is going to be written to a file, couldn't you then just write the smaller strings to the file sequentially without joining them first? – unutbu Jul 09 '15 at 17:06
  • @HappyLeapSecond That is a really great point but currently the functionality is invoked by existing architecture that expects one large string as the return value. That being said, perhaps the external code could expect many mini strings and just write them one by one into a single file. This is a very good idea. – AturSams Jul 09 '15 at 17:18
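Following up on Sam's `io.StringIO` suggestion and unutbu's point about writing the pieces out one by one, here is a hedged sketch of both ideas. Since the strings hold byte data, io.BytesIO is used instead of io.StringIO, and produce_chunks is again a hypothetical stand-in for the real computation:

import io

def produce_chunks():
    # Hypothetical stand-in for the real computation.
    yield b"first"
    yield b"second"

# Option A: accumulate in an in-memory stream, then take the value once.
buf = io.BytesIO()
for chunk in produce_chunks():
    buf.write(chunk)
payload = buf.getvalue()

# Option B: skip the big string entirely and write each chunk to the
# output file as it is produced.
with open("output.bin", "wb") as f:
    for chunk in produce_chunks():
        f.write(chunk)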

1 Answer

-1

`str.join()` actually does not need a list to join; it accepts any kind of iterable. Therefore you could work with generators, serving string after string:

def calculate_something():
    # do something
    data = "Foobar"  # str, to match the ''.join below
    yield from data
    # do something else
    yield from other_function_returning_string_data()

final_results = ''.join(calculate_something())

The `yield from` syntax is new as of Python 3.3; if you are using something below 3.3, `for c in data: yield c` should work as well.

Finwood
  • Doing it like this (`yield from`) is effectively breaking each of the strings up into individual characters and then yielding each character of each one separately. – martineau Jul 08 '15 at 16:59
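A hedged sketch of the fix martineau's comment points toward — yield each string whole instead of delegating into it, so join sees a few large pieces rather than many single characters (other_chunks is a hypothetical second producer):

def other_chunks():
    # Hypothetical generator yielding whole strings.
    yield "spam"
    yield "eggs"

def calculate_something():
    data = "Foobar"
    yield data                 # the whole string, not its characters
    yield from other_chunks()  # delegates to a generator of whole strings

final_results = ''.join(calculate_something())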