Arguments file

The file args.txt contains a long series of arguments for different calls of an executable. Each line contains the arguments for a single call of the executable. One line of args.txt looks like

2 1 -A 7 -B true -C 0.0035 -D /path/to/somewhere ....

The line starts with 2 1 because the first two arguments given to the executable are "unnamed" (they do not come with a flag).

First try

I first tried

i=5
./myexec `sed "${i}q;d" args.txt`

It works most of the time. However, for some lines the arguments are too long and I receive Error: Command Line Too long because I am exceeding getconf ARG_MAX. Note that the software does not allow specifying arguments other than through the command line.

Second try

So I tried

sed "${i}q;d" args.txt | xargs ./myexec

This second try causes the executable to return nothing.

Questions

  • Am I doing something wrong with sed "${i}q;d" args.txt | xargs ./myexec?
  • Once I fix the second try, will I encounter the same issue (Command Line Too long) as with the first try?
  • Could there be a quoting issue that causes ./myexec to treat the long string as a single argument, or something similar?
  • Would you suggest another way of feeding the arguments to myexec?

FYI

I am on Mac OS X 10.11.3 with Terminal 2.6.1

Remi.b
  • If it's too long to fit into `ARG_MAX`, it'll still be too long when you're using `xargs` -- xargs will just split it into multiple invocations of your executable, each of them given only a subset of the arguments you want (and thus, neither being correct). – Charles Duffy Sep 22 '16 at 21:00
  • @CharlesDuffy Oh...ok. Is there a way around this problem? – Remi.b Sep 22 '16 at 21:01
  • Audit your set of defined environment variables, and unset any you can do without -- they use the same pool of space used for command-line arguments. Also adding an answer. – Charles Duffy Sep 22 '16 at 21:03
  • ...otherwise, it's a matter of looking at whether the command you're running allows a configuration file or similar alternate source of options/arguments/&c. to be specified. – Charles Duffy Sep 22 '16 at 21:06
  • BTW, using `sed` here is quite inefficient. Is there a reason you aren't following the practices given in [BashFAQ #1](http://mywiki.wooledge.org/BashFAQ/001)? – Charles Duffy Sep 22 '16 at 21:09
  • ...also, you're going to hit the bugs discussed in [BashFAQ #50](http://mywiki.wooledge.org/BashFAQ/050) if you're dealing with arguments containing whitespace, quotes, etc. See http://stackoverflow.com/questions/39585662/bash-shell-expand-arguments-with-spaces-from-variable – Charles Duffy Sep 22 '16 at 21:09
  • @CharlesDuffy yes, I am using `sed` because the process will be happening in parallel. There will be a file of commands of the kind `sed "${i}q;d" args.txt | xargs ./exec` that will be given to GNU parallel (parallel :::: commands.sh). – Remi.b Sep 22 '16 at 21:13
  • I don't see how doing things in parallel justifies the inefficiency of running `sed` once per line. `while IFS= read -r line; do something_with "$line" & done` will run every line's command in parallel while following BashFAQ #1. – Charles Duffy Sep 22 '16 at 21:16
  • Similarly, GNU parallel itself can run a subprocess for each line of an input file (not that I advise its use, ever, for anything -- but then, that's largely me having an allergy to perl). – Charles Duffy Sep 22 '16 at 21:17
  • Sure, that's right... I am just used to parallel and I don't know how to limit the memory used or the number of cores used with the standard `&`. A solution probably exists, though. The time for the execution of `sed` is quite negligible in comparison to the time that `exec` will take. Thanks for the advice. – Remi.b Sep 22 '16 at 21:24
  • Honestly, there are things parallel does that aren't easy to do without it -- I'm just deeply suspicious because it's very dense code, and I don't like trusting things to be correct when they have a lot of potentially-conflicting options without the interactions between them clearly specified. If it did fewer things, I might trust it more. :) – Charles Duffy Sep 22 '16 at 21:26
  • How long is the argument list? `ARG_MAX` is 262K on OS X. What are you doing that needs more arguments than this? – Barmar Sep 22 '16 at 21:27
  • @Remi.b, ...btw, since `exec` is a shell builtin, you might find a different name to act as a standin for your unnamed command, just to reduce potential reader confusion. – Charles Duffy Sep 22 '16 at 21:28
  • @Barmar I am performing population genetics simulations and need to specify details of the genetic map for the simulation. The argument list is really that long; it is not a mistake such as an unwanted brace expansion that is causing this error. – Remi.b Sep 22 '16 at 21:31
  • @Remi.b, ...and this popular software, written with the expectation of really long command lines, doesn't let you specify arguments in any way not subject to such a limitation? I'd call that a bug. – Charles Duffy Sep 22 '16 at 21:32
  • Is there any way you can change the application to get the information from a file or standard input instead of from command line arguments? – Barmar Sep 22 '16 at 21:32
  • @CharlesDuffy Thanks I fixed the confusing use of the term `exec`. – Remi.b Sep 22 '16 at 21:34
  • No, the software does not allow for another way to specify the arguments unfortunately. – Remi.b Sep 22 '16 at 21:34
  • Really, the approach I would take at this point is to build a patch (allowing an alternate input source) and file it with the maintainer. – Charles Duffy Sep 22 '16 at 21:34
  • @CharlesDuffy Can I ask you to explain what `build a patch and file it with the maintainer` means in simple terms? – Remi.b Sep 22 '16 at 21:36
  • @Remi.b, roughly: figure out how the genetics program you're running needs to be changed to allow an alternate configuration specification, make that change, and tell the people who wrote that software how you did so. – Charles Duffy Sep 22 '16 at 21:37
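
The line-by-line reading suggested in the comments (the BashFAQ #1 pattern) could be sketched roughly as follows. This is only an illustration: `process_line` is a hypothetical stand-in for whatever is done with each line, and the sample file is created just so the sketch runs on its own.

```shell
# Hypothetical stand-in for the real per-line work (e.g. calling ./myexec).
process_line() {
  printf 'would run ./myexec with: %s\n' "$1"
}

# Small sample file so the sketch is self-contained; the real input
# would be args.txt.
args_file=$(mktemp)
printf '%s\n' '2 1 -A 7 -B true' '3 4 -A 9 -B false' > "$args_file"

# Read the file once, one line per iteration, instead of running sed
# once per line number.
count=0
while IFS= read -r line; do
  process_line "$line"
  count=$((count + 1))
done < "$args_file"

rm -f "$args_file"
```

The file is opened once and each line is handed over intact, without a separate `sed` process per line.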

1 Answer

ARG_MAX is an operating-system constraint on the combined size of command-line arguments and environment variables.
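
As a rough way to see how much of that budget the environment already consumes, something like the following could be used. It is a sketch only: the byte count from `env | wc -c` is an approximation that ignores any per-entry overhead the kernel may add.

```shell
# Limit on the combined size of arguments and environment, in bytes.
arg_max=$(getconf ARG_MAX)

# Approximate number of bytes currently used by environment variables
# (each NAME=value entry plus one terminating byte).
env_bytes=$(env | wc -c | tr -d ' ')

echo "ARG_MAX:     $arg_max"
echo "environment: $env_bytes"
echo "remaining:   $((arg_max - env_bytes))"
```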

Because it's an operating-system-enforced constraint, you can't bypass it without operating-system-level changes. xargs will split your argument list across multiple invocations, but each invocation is given only a subset of the arguments desired.
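
The splitting behavior can be observed on a small scale. In this sketch, `-n 2` artificially caps each invocation at two arguments (standing in for the size limit), and `echo` is placed in front of the command so the invocations are printed rather than executed:

```shell
# Each output line corresponds to one separate invocation that xargs
# would have made; no single invocation sees the full argument list.
out=$(printf '%s\n' 2 1 -A 7 -B true | xargs -n 2 echo ./myexec)
printf '%s\n' "$out"
```

Six arguments end up spread over three separate invocations, which is why a program expecting all of its options in one call cannot work this way.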

What you can do then, if you can't decrease the length of your argument list (and the program you're running can't read configuration by any means other than the command line), is unset any environment variables you don't need.
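
One way to do that for a single call, without touching the current shell's variables, is `env -i`, which starts the command with an empty environment plus whatever is listed explicitly. The sketch below only compares variable counts; keeping just `PATH` is an assumption, and the real program may need other variables as well.

```shell
# Number of variables a child process would normally inherit.
inherited=$(env | wc -l | tr -d ' ')

# Number of variables a child sees when started through `env -i`
# with only PATH passed along (an assumed minimal set).
scrubbed=$(env -i PATH="$PATH" env | wc -l | tr -d ' ')

echo "inherited: $inherited, scrubbed: $scrubbed"

# The real call would then take a form such as:
#   env -i PATH="$PATH" ./yourprog "${args[@]}"
```

This only helps when the argument list overshoots the limit by less than the space the environment was occupying.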


If you'd rather fail outright when your command-line argument list is too long to correctly execute, rather than (as xargs does) run two or more invocations each with a subset of the arguments given, I would suggest the following code be used for each line:

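# Split $line into words the way xargs does (honoring quotes and backslashes),
# emitting each word NUL-terminated so it can be read back safely.
args=( )
while IFS= read -r -d '' arg; do
  args+=( "$arg" )
done < <(xargs printf '%s\0' <<<"$line")

# A single invocation: fails outright if the list exceeds ARG_MAX.
./yourprog "${args[@]}"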
Charles Duffy