
I'm trying to invoke the tar command via a subprocess call from Python. The challenge is that a lot of files get passed to tar, which causes the command to fail with the error Argument list too long: '/bin/sh'

The command I'm running is below:

subprocess.call(f"ulimit -s 99999999; tar -cz -f {output_file} {file_list}", cwd=source_dir, shell=True)

To try to overcome the error, I added ulimit, which doesn't seem to help. The OS I am running this on is Ubuntu 20.04 and the Python version is 3.8.

Please could I get help to solve this problem.

usert4jju7
  • Try a level of indirection and put the list in a file? https://stackoverflow.com/questions/8033857/tar-archiving-that-takes-input-from-a-list-of-files (maybe called `mylist.txt`, then add `-T mylist.txt` to the tar command) – doctorlove Nov 30 '21 at 11:16
  • You can't control the maximum length of the command line, unless you recompile the sources. BTW, `ulimit -s` sets the maximum stack size. – user1934428 Nov 30 '21 at 11:21
  • Aside from the clever idea of doctorlove: Can you maybe copy the files to some temporary directory (which you erase afterwards)? Then you just pass to `tar` the directory name. – user1934428 Nov 30 '21 at 11:24
  • Please do not multi-post across stacks (https://unix.stackexchange.com/q/679620/117549) – Jeff Schaller Nov 30 '21 at 15:13

1 Answer


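ulimit does nothing to lift the kernel limit on argument size (ARG_MAX and, on Linux, the per-argument cap MAX_ARG_STRLEN), which is what you are bumping into here. The ulimit in your command string never even runs, because the error occurs when Python tries to start /bin/sh with the whole command as a single argument. Typically, the only way to raise that limit is to recompile your kernel.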
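To check in advance whether a list of names is likely to hit the limit, something like the following rough sketch may help. The helper names `arg_max_bytes` and `estimated_argv_size` are made up for illustration; the estimate ignores the environment, which shares the same space, and any per-argument cap.

```python
import os


def arg_max_bytes():
    """Return the argument-size limit reported by the OS, or None if unknown."""
    try:
        return os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        # os.sysconf is missing or the name is not supported on this platform
        return None


def estimated_argv_size(args):
    """Sum the encoded length of each argument plus its terminating NUL byte."""
    return sum(len(os.fsencode(arg)) + 1 for arg in args)
```

If `estimated_argv_size(['tar', '-cz', '-f', output_file] + file_list)` comes anywhere near `arg_max_bytes()`, passing the names on the command line is not going to work reliably.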

If your tar supports --files-from -, use that.

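import subprocess

# check_call() does not accept input=, so use run() with check=True instead
subprocess.run(
    ['tar', '-cz', '-f', output_file, '--files-from', '-'],
    input='\n'.join(file_list), text=True, cwd=source_dir, check=True)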

I obviously made assumptions about the contents of file_list: it must be a list of file names, and this will break if any name contains a newline character. Notice also how I avoid shell=True by passing in the command as a list of strings.
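If names containing newlines are a real possibility, one option is to send the names NUL-terminated instead. This is only a sketch: it assumes your tar understands `--null` (GNU tar and bsdtar do), and `nul_delimited` and `create_archive` are hypothetical helper names, not part of any library.

```python
import os
import subprocess


def nul_delimited(names):
    """Encode file names as NUL-terminated bytes for `tar --null --files-from -`."""
    return b"".join(os.fsencode(name) + b"\0" for name in names)


def create_archive(output_file, names, source_dir):
    # --null must come before --files-from so that tar reads NUL-terminated names
    subprocess.run(
        ["tar", "-cz", "-f", output_file, "--null", "--files-from", "-"],
        input=nul_delimited(names), cwd=source_dir, check=True)
```

A NUL byte cannot appear in a file name, so this separator is unambiguous where a newline is not.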

Of course, a much better solution for this use case is to use the Python tarfile module to create the tar file; this entirely avoids the need to transmit the list of file names across a process boundary.

import tarfile

with tarfile.open(output_file, "x:gz") as tar:
    for name in file_list:
        tar.add(name)

The "x:gz" mode of creation triggers an exception if the file already exists (use "w:gz" to simply overwrite).
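The tarfile snippet resolves names against the current working directory, whereas the original command ran with cwd=source_dir. A minimal sketch of one way to bridge that, assuming file_list holds paths relative to source_dir (`create_archive` is a hypothetical helper name):

```python
import os
import tarfile


def create_archive(output_file, file_list, source_dir):
    """Archive the files in file_list, whose paths are relative to source_dir."""
    with tarfile.open(output_file, "w:gz") as tar:
        for name in file_list:
            # Read the file from source_dir, but store it under its relative name
            tar.add(os.path.join(source_dir, name), arcname=name)
```

Unlike the subprocess version, a relative output_file here is resolved against the Python process's working directory, not source_dir.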

tripleee