Please scroll down to the end of this answer for the solution I recommend for your specific problem. There's a bit of background here for context and/or future visitors grappling with other "argument list too long" errors.
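The exec() system call has a size limit: you cannot pass more than ARG_MAX bytes of arguments to a new process, and POSIX counts the environment against the same limit. On modern systems you can usually query the value with the getconf ARG_MAX command. If you need to run an external command over more files than fit, split the list into batches that each stay under the limit: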
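import glob
import os
import struct
import subprocess

# Combined limit on arguments plus environment for exec()
arg_max = int(subprocess.run(['getconf', 'ARG_MAX'],
                             text=True, check=True, capture_output=True
                             ).stdout.strip())
ptr = struct.calcsize('P')  # each argv/envp entry also costs one pointer

def cost(s):
    # Encoded string, its terminating NUL byte, and its pointer
    return len(os.fsencode(s)) + 1 + ptr

# The environment shares the same budget, so reserve room for it
limit = arg_max - sum(cost(f'{k}={v}') for k, v in os.environ.items())

cmd = ['sed', '-i', '-e', 's/#/pau/g']
files = glob.glob('label_POS/label_phone_align/dump/*')
while files:
    size = sum(cost(x) for x in cmd)
    count = 0
    for name in files:
        size += cost(name)
        if size > limit:
            break
        count += 1
    if count == 0:
        # A name that cannot fit on its own would otherwise never be consumed
        raise RuntimeError(f'argument too long: {files[0]!r}')
    subprocess.run(cmd + files[:count], check=True)
    files = files[count:]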
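The size bookkeeping is easier to test when it is pulled out of the loop into a pure function that only decides how to split the list. Here is a minimal sketch of that idea; `batch_args` is a hypothetical helper name, not part of any library. It counts each argument as its encoded length plus a NUL terminator and ignores the per-argument pointer, so pass a `limit` with some headroom. It also shows `os.sysconf('SC_ARG_MAX')`, which should report the same value as `getconf ARG_MAX` on typical Unix systems without starting a subprocess.

```python
import os


def batch_args(cmd, files, limit):
    """Yield lists of file names such that cmd + batch fits in `limit` bytes.

    Each argument is counted as its encoded length plus one NUL byte; the
    per-argument pointer is ignored, so pass a `limit` with some headroom.
    """
    fixed = sum(len(os.fsencode(a)) + 1 for a in cmd)
    batch, size = [], fixed
    for name in files:
        cost = len(os.fsencode(name)) + 1
        if fixed + cost > limit:
            raise ValueError(f'argument too long on its own: {name!r}')
        if size + cost > limit:
            yield batch  # current batch is full; start a new one
            batch, size = [], fixed
        batch.append(name)
        size += cost
    if batch:
        yield batch


cmd = ['sed', '-i', '-e', 's/#/pau/g']
print(list(batch_args(cmd, ['a', 'bb', 'ccc'], limit=25)))
# → [['a', 'bb'], ['ccc']]

# Unix-only: the exec() argument limit, queried without a subprocess
arg_max = os.sysconf('SC_ARG_MAX')
```

With the 20 bytes taken by `cmd`, a limit of 25 leaves room for `a` and `bb` but not `ccc`, so it spills into a second batch. In real use you would loop over the batches and call `subprocess.run(cmd + batch, check=True)` for each.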
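Of course, the xargs command already does exactly this for you: it reads the file names from standard input and runs sed as many times as necessary to stay under the limit.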
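import glob
import subprocess

files = glob.glob('label_POS/label_phone_align/dump/*')
subprocess.run(
    # -0: names on stdin are NUL-terminated, so spaces and newlines are safe
    # -r (GNU extension): don't run sed at all if there are no names
    ['xargs', '-r', '-0', 'sed', '-i', '-e', 's/#/pau/g'],
    input=b''.join(name.encode() + b'\0' for name in files),
    check=True)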
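To watch xargs split the work without needing a huge file list, you can force tiny batches with `-n 2` (at most two names per invocation) and use `echo` in place of sed. This is only a demonstration: in real use you would leave out `-n` and let xargs choose batch sizes that respect the system limit. It assumes an xargs that supports `-0`, such as GNU or BSD xargs.

```python
import subprocess

names = ['a', 'b', 'c', 'd', 'e']
# -n 2 caps each echo invocation at two names, so xargs runs echo three times
result = subprocess.run(
    ['xargs', '-0', '-n', '2', 'echo'],
    input=b''.join(n.encode() + b'\0' for n in names),
    capture_output=True, check=True)
print(result.stdout.decode(), end='')
# a b
# c d
# e
```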
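Simply removing the long path might be enough in your case, though. You are repeating label_POS/label_phone_align/dump/ in front of every file name in the argument array; if you run sed from inside that directory (with cwd=), each argument shrinks to just the bare file name.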
import glob
import subprocess
import os
path = 'label_POS/label_phone_align/dump'
files = [os.path.basename(file)
         for file in glob.glob(os.path.join(path, '*'))]
subprocess.run(
    ['sed', '-i', '-e', 's/#/pau/g', *files],
    cwd=path, check=True)
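To get a feel for how much the prefix costs, here is the arithmetic for 10,000 made-up names of the form `utt00000.lab`. The names are hypothetical, purely for illustration; substitute your real file list to see your own numbers.

```python
import os

prefix = 'label_POS/label_phone_align/dump/'
# Hypothetical file names, for illustration only
names = [f'utt{i:05d}.lab' for i in range(10000)]


def arg_bytes(args):
    # Each argument costs its encoded length plus a terminating NUL byte
    return sum(len(os.fsencode(a)) + 1 for a in args)


with_prefix = arg_bytes(prefix + n for n in names)
without_prefix = arg_bytes(names)
print(with_prefix, without_prefix)  # → 460000 130000
```

Dropping the 33-byte prefix removes 330,000 bytes from the argument list in this example; whether that brings you under ARG_MAX depends on your system's limit and your actual file count.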
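Ultimately, though, you should probably prefer a pure Python solution; since it never runs an external command, there is no argument list to overflow.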
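import glob
import fileinput
files = glob.glob('label_POS/label_phone_align/dump/*')
if files:  # fileinput reads stdin when given an empty list
    with fileinput.input(files, inplace=True) as f:
        for line in f:  # with inplace=True, print() writes back into the file
            print(line.replace('#', 'pau'), end='')  # line keeps its own newline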
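If you would rather not rely on fileinput redirecting stdout into the file, a plain read-modify-write loop does the same job. This is a sketch: `replace_in_files` is a hypothetical helper, it reads each file entirely into memory, and it uses the platform's default text encoding. Unlike `glob.glob('*')`, `Path.iterdir()` also picks up dot files.

```python
import pathlib
import tempfile


def replace_in_files(directory, old, new):
    """Replace old with new in every regular file directly inside directory."""
    for path in pathlib.Path(directory).iterdir():
        if path.is_file():
            # Read the whole file, then write the edited text back in place
            path.write_text(path.read_text().replace(old, new))


# Demo on a throwaway directory instead of the real dump/ folder
with tempfile.TemporaryDirectory() as tmp:
    sample = pathlib.Path(tmp, 'x.lab')
    sample.write_text('# sil\nfoo # bar\n')
    replace_in_files(tmp, '#', 'pau')
    result = sample.read_text()

print(result, end='')  # prints "pau sil" and "foo pau bar" on separate lines
```

For your case you would call `replace_in_files('label_POS/label_phone_align/dump', '#', 'pau')`.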