Foreword: this is similar to "Create lines of text, '\n'.join(my_list) is missing trailing newline :-(", except here it's a generator, not a list.

I need to produce a text file from a generator function yielding individual string lines which are not line-terminated.

I believe the recommended approach for building up such a string is (assuming g is the generator object):

'\n'.join(g)

This will however miss the trailing newline.

Here's an example using ',' instead of '\n':

>>> g=(str(i) for i in range(0,10))
>>> ','.join(g)
'0,1,2,3,4,5,6,7,8,9'

Of course I can manually add '\n' at the end, but I believe this could get expensive.

I tried using itertools.chain() appending an empty string, but this gave surprising results:

>>> import itertools
>>> g=itertools.chain((str(i) for i in range(0,10)),'')
>>> ','.join(g)
'0,1,2,3,4,5,6,7,8,9'

How can I actually do it? Would + '\n' really be that expensive?

iurly

1 Answer

You might be surprised to hear this, but converting the generator into a list, appending an empty string ("") and calling str.join on it will be your fastest way of doing it.
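Here is a minimal sketch of that approach next to two alternatives. The `lines()` generator is a hypothetical stand-in for yours. Note that the `itertools.chain()` attempt in the question had no effect because `''` is itself an empty iterable (a string with zero characters), so it contributes no items; wrapping it in a list, `['']`, adds one empty item and therefore one more separator.

```python
import itertools


def lines():
    # Hypothetical generator yielding lines without a terminator.
    for i in range(3):
        yield f"line {i}"


# Option 1: materialize the generator, append an empty string, then join.
items = list(lines())
items.append("")
text_from_list = "\n".join(items)

# Option 2: chain with a one-element list; [''] adds an item, '' would not.
text_from_chain = "\n".join(itertools.chain(lines(), [""]))

# Option 3: join first, then concatenate a single trailing newline.
text_plus_newline = "\n".join(lines()) + "\n"

print(repr(text_from_list))
```

All three produce the same string when the generator yields at least one line. For an empty generator they differ: options 1 and 2 give an empty string, while option 3 gives a lone newline.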

I like your thinking: you want to be more efficient by using generators. However, "".join (in CPython) actually converts your genexp into a list internally before joining. It does so because it needs to know the length of the final string in order to allocate memory up front, which takes two passes over the items: one to measure them and one to copy them. A generator can only be consumed once, so join first builds a list to hold the values temporarily.

py -3 -m timeit "''.join([str(i) for i in range(100000)])"
10 loops, best of 5: 29.6 msec per loop

py -3 -m timeit "''.join((str(i) for i in range(100000)))"
10 loops, best of 5: 32.3 msec per loop

Both take about the same memory as well.
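If you want to reproduce the comparison yourself, the following is a rough sketch using the `timeit` module rather than the command line. The function names are made up for illustration, and the third case (explicitly converting the generator into a list before joining) is the one suggested in the comments. Timings depend on your machine and Python version, so none are shown here.

```python
import timeit

N = 100_000  # same number of items as in the timings above


def join_list():
    # List comprehension: join receives a ready-made list.
    return "".join([str(i) for i in range(N)])


def join_generator():
    # Generator expression: join has to materialize it itself.
    return "".join(str(i) for i in range(N))


def join_generator_via_list():
    # Generator explicitly converted into a list before joining.
    g = (str(i) for i in range(N))
    return "".join(list(g))


LOOPS = 5
for fn in (join_list, join_generator, join_generator_via_list):
    # Take the best of 3 repeats, as the timeit command line does.
    best = min(timeit.repeat(fn, number=LOOPS, repeat=3)) / LOOPS
    print(f"{fn.__name__}: {best * 1000:.1f} msec per loop")
```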

Bharel
  • Thanks a bunch for the insight and the explanation! For the sake of completeness though, shouldn't your second example above build a generator first, and **then** convert it into a list? – iurly Sep 09 '18 at 14:08
  • Nope. Join does the conversion implicitly and that's what I'm trying to show. What I can do is link to the exact place where the conversion happens in the C code – Bharel Sep 09 '18 at 15:38
  • I tried it myself, here's what I found: `python3 -m timeit "l=[str(i) for i in range(100000)]; ''.join(l)"` `10 loops, best of 3: 25.5 msec per loop` `python3 -m timeit "g=(str(i) for i in range(100000)); ''.join(g)"` `10 loops, best of 3: 28.5 msec per loop` `python3 -m timeit "g=(str(i) for i in range(100000)); l=list(g); ''.join(l)"` `10 loops, best of 3: 28.9 msec per loop` So that just confirms your statement. Could you please edit your answer adding this third case? It really makes your point indeed. – iurly Sep 10 '18 at 20:11