import subprocess
from pathlib import Path

def check_file_wc_count(path: Path, regex: str):
    try:
        # zgrep (like grep) exits non-zero when there are no matches,
        # which check=True turns into CalledProcessError
        zgrep = subprocess.run(['zgrep', regex, path], check=True, stdout=subprocess.PIPE)
    except subprocess.CalledProcessError:
        return 0
    # count the matching lines with wc -l
    output = subprocess.run(['wc', '-l'], input=zgrep.stdout, capture_output=True, check=True)
    return int(output.stdout.decode('utf-8').strip())
When reading large files (which are gzipped, hence the zgrep), I observe large memory usage, something that (I think) does not normally occur when using the Linux utilities on their own. My guess is that it's because of how I am using subprocess.PIPE: it stores the stdout of the zgrep call in a buffer until it is read as the input of the wc call.

Is this assumption correct, and is there a way to avoid this in Python?
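For reference, this is the sort of direct Popen chaining I was considering as an alternative (a rough, untested sketch; the function name is just for illustration), where zgrep's stdout is connected to wc's stdin through an OS-level pipe so that Python never holds the full output:

import subprocess
from pathlib import Path

def check_file_wc_count_streamed(path: Path, regex: str) -> int:
    # Connect the two processes with an OS pipe instead of buffering
    # zgrep's entire output inside the Python process.
    zgrep = subprocess.Popen(['zgrep', regex, str(path)], stdout=subprocess.PIPE)
    wc = subprocess.Popen(['wc', '-l'], stdin=zgrep.stdout, stdout=subprocess.PIPE)
    zgrep.stdout.close()  # let zgrep receive SIGPIPE if wc exits early
    out, _ = wc.communicate()
    zgrep.wait()
    # zgrep/grep exit with 1 when there are simply no matches
    if zgrep.returncode not in (0, 1):
        raise subprocess.CalledProcessError(zgrep.returncode, 'zgrep')
    return int(out.decode('utf-8').strip())

I haven't verified that this actually keeps memory usage down, which is part of what I'm asking.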