
I have a Python subprocess call that runs an executable and pipes the output to my subprocess stdout.

In cases where the stdout data is relatively small (~2k lines), the performance of reading line by line and reading as one chunk (stdout.read()) is comparable, with stdout.read() being slightly faster.

Once the data gets to be larger (say 30k+ lines), the performance for reading line by line is significantly better.

This is my comparison script:

import subprocess
import time

tmp=[]  # collected lines
proc=subprocess.Popen(executable,stdout=subprocess.PIPE)  # executable is defined elsewhere
tic=time.clock()
for line in iter(proc.stdout.readline,b''):
    tmp.append(line)
print("line by line = %.2f"%(time.clock()-tic))

proc=subprocess.Popen(executable,stdout=subprocess.PIPE)
tic=time.clock()
fullFile=proc.stdout.read()
print("slurped = %.2f"%(time.clock()-tic))

And these are the results for a read of ~96k lines (roughly 50 MB on disk):

line by line = 5.48
slurped = 153.03

I am unclear why the performance difference is so extreme. My expectation is that the read() version should be faster than storing the results line by line. Of course, I was expecting line-by-line to win in the practical case where there is significant per-line processing that can be done during the read.

Can anyone give me insight into the read() performance cost?

PaulD
  • Does the subprocess always take the same time to execute? (E.g. there are no effects of caching on repeated runs, etc.) – NPE Jan 27 '14 at 16:54
  • There are no observed significant gains on repeated runs. – PaulD Jan 27 '14 at 16:56
  • Cannot reproduce with `seq 30000` (http://linux.die.net/man/1/seq). I think we are going to need an SSCCE (http://sscce.org/). – NPE Jan 27 '14 at 16:57
  • I would suspect memory pressure. Have you compared the patterns of memory usage in the two cases? – Alp Jan 27 '14 at 17:37
  • @NPE: This is a Windows system, in case there are any differences between Python execution on Windows and on Linux. Also, I'm not sure seq 30000 would force the behavior I'm seeing, as I think it is a volume-of-data issue, not a number-of-lines issue. Each of my lines has ~400 characters. – PaulD Jan 27 '14 at 17:46
  • @Alp...I'm not sure what memory pressure is. I do notice that the process memory in the .read() case will move up and down in increments of 20-30mb with a definite upward drift. – PaulD Jan 27 '14 at 17:48
  • I meant that in your 30k line case, you might well be exhausting physical memory and spending a lot of time swapping. – Alp Jan 28 '14 at 03:22
  • Use `timeit.default_timer` instead of `time.clock`, which does different things on different systems; my guess is that the numbers are a benchmark artifact (not real). Also, move the code into separate scripts to avoid interference. – jfs Jan 30 '14 at 03:34
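A reworked version of the benchmark along the lines of jfs's comment, using `timeit.default_timer` and an explicit `bufsize`, might look like the sketch below; `your_command` is only a placeholder for the question's actual executable, and the two timings can also be split into separate scripts as suggested:

import subprocess
from timeit import default_timer as timer

executable = ['your_command']  # placeholder for the command from the question

# line-by-line read
proc = subprocess.Popen(executable, bufsize=-1, stdout=subprocess.PIPE)
tic = timer()
lines = list(iter(proc.stdout.readline, b''))
proc.wait()
print("line by line = %.2f" % (timer() - tic))

# single read()
proc = subprocess.Popen(executable, bufsize=-1, stdout=subprocess.PIPE)
tic = timer()
data = proc.stdout.read()
proc.wait()
print("slurped = %.2f" % (timer() - tic))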

4 Answers

This is not just Python: reading by characters without buffering is always slower than reading in lines or big chunks.

Consider these two simple C programs:

[readchars.c]

#include <stdlib.h>
#include <stdio.h>
#include <errno.h>

int main(void) {
        FILE* fh = fopen("largefile.txt", "r");
        if (fh == NULL) {
                perror("Failed to open file largefile.txt");
                exit(1);
        }

        /* Read the file one character at a time: one fgetc() call per byte. */
        int c;
        while ((c = fgetc(fh)) != EOF) {
                /* discard the character */
        }

        fclose(fh);
        return 0;
}

[readlines.c]

#include <stdlib.h>
#include <stdio.h>
#include <errno.h>

int main(void) {
        FILE* fh = fopen("largefile.txt", "r");
        if (fh == NULL) {
                perror("Failed to open file largefile.txt");
                exit(1);
        }

        /* Read the file one line (up to 120 bytes) at a time. */
        char* s = (char*) malloc(120);
        while (fgets(s, 120, fh) != NULL) {
                /* discard the line */
        }

        free(s);
        fclose(fh);

        return 0;
}

And their results (YMMV; largefile.txt was a ~200 MB text file):

$ gcc readchars.c -o readchars
$ time ./readchars            
./readchars  1.32s user 0.03s system 99% cpu 1.350 total
$ gcc readlines.c -o readlines
$ time ./readlines            
./readlines  0.27s user 0.03s system 99% cpu 0.300 total
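
The same effect can be reproduced in Python itself; a quick sketch, assuming the same largefile.txt as above, comparing unbuffered single-byte reads with buffered line reads:

import time

# Unbuffered file object: every read(1) is a separate call down to the OS.
tic = time.time()
fh = open('largefile.txt', 'rb', 0)
while fh.read(1):
    pass
fh.close()
print("byte by byte = %.2f" % (time.time() - tic))

# Default buffered file object, read line by line.
tic = time.time()
fh = open('largefile.txt', 'rb')
for _ in fh:
    pass
fh.close()
print("line by line = %.2f" % (time.time() - tic))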
Dan Keder

Try adding a bufsize option to your Popen call and see if it makes a difference:

proc=subprocess.Popen(executable, bufsize=-1, stdout=subprocess.PIPE)

Popen includes an option to set the buffer size for reading input. In Python 2, bufsize defaults to 0, which means unbuffered input; 1 means line buffered, any other positive value means a buffer of approximately that size, and a negative value means the system default, which usually means fully buffered input. (Recent Python 3 versions already default to -1.)

The Python docs include this note:

Note: if you experience performance issues, it is recommended that you try to enable buffering by setting bufsize to either -1 or a large enough positive value (such as 4096).
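
If a single read() still performs badly even with buffering, a middle ground is to drain the pipe in fixed-size chunks; a sketch, where the 64 KiB chunk size is an arbitrary choice and `executable` is the command from the question:

proc = subprocess.Popen(executable, bufsize=-1, stdout=subprocess.PIPE)

chunks = []
while True:
    chunk = proc.stdout.read(64 * 1024)  # read at most 64 KiB per call
    if not chunk:                        # empty result means EOF
        break
    chunks.append(chunk)
data = b''.join(chunks)
proc.wait()

Each read(64 * 1024) costs one call rather than one call per line, while still avoiding whatever pathology a single bare read() hits on a large pipe.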

Kevin

I don't get that behaviour at all.

import subprocess
import time


executable = ["cat", "data"]

proc=subprocess.Popen(executable,stdout=subprocess.PIPE)
tic=time.clock()
tmp = []
for line in (iter(proc.stdout.readline,b'')):
    tmp.append(line)
print("line by line = %.2f"%(time.clock()-tic))

proc=subprocess.Popen(executable,stdout=subprocess.PIPE)
tic=time.clock()
fullFile=proc.stdout.read()
print("slurped = %.2f"%(time.clock()-tic))

Data is text.

pts/0$ ll data
-rw-r--r-- 1 javier users 18M feb 21 20:53 data

pts/0$ wc -l data
169866 data

Result:

pts/0$ python3 a.py 
line by line = 0.08
slurped = 0.01

Python 2 is much slower than Python 3!

pts/0$ python2 a.py 
line by line = 4.45
slurped = 0.01

Perhaps it depends on the sub-process?

Javier
  • `bufsize` has different defaults in Python 2 and 3. If you set it explicitly, you should get similar results in both versions. Use `timeit.default_timer` instead of `time.clock()` for portability. – jfs Feb 22 '14 at 09:29

I've had spotty results with `bufsize`. I run a continuous ping script that logs replies, and I need it to run non-stop; it will hang every few days, so my solution was to write a separate script that watches the task list and kills any ping task that has been running for more than 10 seconds. See below:

import subprocess
import time

CREATE_NO_WINDOW = 0x08000000  # keep the console window hidden on Windows
previous_id = ''

while True:
    # List running tasks; the output is split on 'Ko' (the memory-size unit
    # in this locale's tasklist output) to get one entry per process.
    command = subprocess.Popen(['tasklist'], stdout=subprocess.PIPE,
                               shell=False, creationflags=CREATE_NO_WINDOW)
    reply = str(command.communicate()[0]).split('Ko')
    for item in reply:
        if 'PING.EXE' in item:
            print(item.split(' ')[0][4:] + ' ' + item.split(' ')[22])
            if item.split(' ')[22] != previous_id:
                previous_id = item.split(' ')[22]
                print('New ping detected, system is healthy')
            else:
                # Same PID seen on two passes 10 seconds apart: assume it hung.
                print('Same ping active for 10 seconds, killing')
                command = subprocess.Popen(['taskkill', '/f', '/im', 'PING.EXE'],
                                           stdout=subprocess.PIPE, shell=False,
                                           creationflags=CREATE_NO_WINDOW)
                err_log = open('errors.txt', 'a')
                err_log.write('Killed hung PING.EXE\n')
                err_log.close()
    time.sleep(10)

This runs in parallel, and there is very little chance of both processes hanging at the same time. All you need to do is catch any errors resulting from the loss of the pipe in your main script.
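
On Python 3.3 and later, a similar watchdog can also be built into the main script itself with the `timeout` argument of `communicate()` instead of a second process; a minimal sketch, with a hypothetical `ping` command used only for illustration:

import subprocess

cmd = ['ping', '-n', '4', 'example.com']  # hypothetical command, illustration only

proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
try:
    out, _ = proc.communicate(timeout=10)  # give the child 10 seconds
except subprocess.TimeoutExpired:
    proc.kill()                            # same effect as taskkill /f
    out, _ = proc.communicate()            # collect whatever was produced
print(out.decode(errors='replace'))

This keeps the kill logic in the same process that owns the pipe, at the cost of requiring Python 3.3 or later.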

vencaslac