Inside a bash script, I set an environment variable to contain a string of 1 million characters. I do so like this:

export LG=XXXXXXX # ... 1 million X's

Immediately after this, I am able to echo it back without a problem, i.e.

echo $LG

However, any other unrelated commands that I attempt to run after this inside the script fail with the "Argument list too long" error. For example:

cat randomfile.txt
/bin/cat: Argument list too long

I have read other posts that suggest using xargs to resolve this kind of issue, but I have not been successful. Any command other than echo fails with the "Argument list too long" error, even if I never reference the $LG variable after setting it. I do eventually want to use $LG, but the error occurs regardless.

Any tips would be greatly appreciated, thanks!

Edit:

The overall problem I am trying to solve is something like this:

I have a text file that I need to keep as small as possible (i.e. a few MBs). This text file contains a set of messages that are encapsulated inside a specific network protocol (i.e. header, length of message, the message itself). The message itself can be a string of 1 million or more characters. To keep the original file small, instead of storing multiple copies of the large message inside the file, I use a mapping. For example, if I see the letter A in the message field, I use sed to find and replace A with 1 million X's. Like this:

cat file.txt | sed "s/A/$LG/g"  # Replace A with 1 million X's

I will eventually be running this inside a (very slow) simulator, so I need this operation to complete in as few cycles as possible. In other words, a utility like awk that uses a loop with a trip count of 1 million to dynamically generate 1 million X's would be too slow. That is why I thought the environment variable solution would be best.

Ivan Stalev
  • What's the purpose of this variable? It's noteworthy that only **environment** variables use up this limited pool of space; if you don't need to export this variable to the environment, and can keep it local to your shell, then this problem quickly becomes moot. – Charles Duffy Mar 04 '15 at 21:39
  • (...though, on that point -- if you don't need to export this to the environment, you should pick a different name for it; all-uppercase names are reserved by convention for environment variables and shell builtins). – Charles Duffy Mar 04 '15 at 21:40
  • I added more details in my original question and explained why I believe I will need to use an environment variable. Thank you for the feedback! – Ivan Stalev Mar 04 '15 at 21:58
  • The explanation does not make it clear to me why you would need an environment variable for this. `sed "s/A/$LG/g"` does not look in the environment for the value of `LG`; it just substitutes a regular shell variable; you can set `LG=...` without any `export` and still use it in this way. (Mind you, you might get in trouble for that sed command being too large to store on the command line too, but that's a problem you can solve by passing the script to run to sed via a file descriptor). – Charles Duffy Mar 04 '15 at 22:01
  • I've updated my answer adding a section showing a suggested method for avoiding relying on either the command line _or_ the environment for this value. – Charles Duffy Mar 04 '15 at 22:05
  • Ah, I see what you mean now! That partially fixed the problem. Now I get the "Argument list too long" error when piping the file to sed. I.e. `/bin/sed: Argument list too long`. I suppose this is now where I should use xargs? Thank you! – Ivan Stalev Mar 04 '15 at 22:05
  • `xargs` works by splitting a long list of arguments down into many invocations. That works when you have lots of small arguments; it does you no good whatsoever when you have one huge one. – Charles Duffy Mar 04 '15 at 22:07
  • (BTW, this is the part where I complain about the prevalence of cargo-cult programming in shell -- in this case, copying `export` without knowing what it does and when and where to use it -- and, just as importantly, where and when not to). – Charles Duffy Mar 04 '15 at 22:09
  • Thank you for pointing that out! That was a rookie mistake on my part. – Ivan Stalev Mar 04 '15 at 22:27

1 Answer

Command-line arguments and environment variables both come out of the same limited pool of space. Make your environment variables too large, and you no longer have room for command-line arguments -- and even xargs, which splits a long argument list across several invocations so that each one fits inside the pool, can't operate when the environment alone has filled that pool.

So: Don't do that. For instance, you might store your data in a file, and export the path to that file in the environment.
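
As a rough sketch of that suggestion, the large value can live in a temporary file while only its short path is exported. The names `lg_file` and `LG_FILE` are made up for illustration and do not come from the question; the payload is generated with `printf` and `tr` simply to keep the example self-contained.

```shell
#!/bin/bash
# Hypothetical names: lg_file and LG_FILE are illustrative only.
lg_file=$(mktemp) || exit 1

# Write 1,000,000 X's to the file: printf pads an empty string to the
# requested width with spaces, and tr turns each space into an X.
printf '%*s' 1000000 '' | tr ' ' 'X' > "$lg_file"

# Export only the short path, so the environment stays small.
export LG_FILE=$lg_file

# External commands still start normally, and any child process can
# read the payload through the exported path.
lg_size=$(( $(wc -c < "$LG_FILE") ))
echo "payload bytes: $lg_size"

rm -f "$lg_file"
```

Child processes then pay only for the length of the path in their environment, not for the payload itself.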


By the way -- the reason echo works is that it's built into your shell. Thus,

echo "$LG"

...doesn't need to start an external process, so the limits on argument list length and environment size at process startup time don't apply.

On the other hand, if you ran

/bin/echo "$LG"

...then you'd see the problem again.
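
The difference can be observed with a small experiment along these lines. The variable names (`big`, `builtin_result`, `external_result`) are hypothetical, and the outcome of the external call depends on the platform's limits, so the sketch records the result instead of assuming it.

```shell
#!/bin/bash
# Build a 1,000,000-character value in a plain (non-exported) shell variable.
big=$(printf '%*s' 1000000 '' | tr ' ' 'X')
big_len=${#big}

# Builtin echo: runs inside the shell itself, so no new process is started
# and the argument-size limits are never consulted.
if echo "$big" > /dev/null; then
  builtin_result=ok
else
  builtin_result=failed
fi

# External echo: the shell must start a new process and pass the value as
# an argument, which is where "Argument list too long" can occur.
if /bin/echo "$big" > /dev/null 2>&1; then
  external_result=ok
else
  external_result=failed
fi

echo "length=$big_len builtin=$builtin_result external=$external_result"
```

Note that `big` is never exported here, so the environment stays small; only the explicit attempt to pass it as an argument to an external program is at risk.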


Given the explanation edited into the question as to what you're actually trying to accomplish, let me suggest an approach which requires neither environment space nor command-line space:

#!/bin/bash
#      ^-- also consider ksh; faster than bash, but also supports <()
#          /bin/sh is not usable here, as POSIX sh does not specify <().

lg=... ## DO NOT USE export HERE!
sed -f <(printf '%s\n' "s/A/$lg/g")
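
For an end-to-end illustration, here is a minimal sketch that applies the same idea to a named input file. The temporary file and its two sample lines are invented for the example (the question does not show the real message format), and the 1,000,000 X's are generated on the fly only to keep the sketch self-contained.

```shell
#!/bin/bash
# Hypothetical input: a tiny file in which the letter A stands for the payload.
msg_file=$(mktemp) || exit 1
printf 'hdr 7 A\nhdr 3 abc\n' > "$msg_file"

# Plain shell variable, deliberately NOT exported.
lg=$(printf '%*s' 1000000 '' | tr ' ' 'X')

# printf is a shell builtin, so the huge sed script is written straight into
# the process substitution; sed reads it via -f, and the large value never
# appears in sed's argument list or environment.
expanded_bytes=$(( $(sed -f <(printf '%s\n' "s/A/$lg/g") "$msg_file" | wc -c) ))
echo "expanded bytes: $expanded_bytes"

rm -f "$msg_file"
```

If the input arrives on stdin instead, dropping the file argument and piping into the same `sed -f <(...)` invocation should behave the same way.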
Charles Duffy
  • I see now. I think you meant `sed -f <(printf '%s\n' "s/A/$lg/g") fileOfMessages.txt` ? I think that seems to do the trick. Thank you so much! – Ivan Stalev Mar 04 '15 at 22:24
  • Glad to hear this worked! Feel free to mark the answer accepted. (As for `fileOfMessages.txt`, you didn't specify in your question whether you were feeding input via a stream on stdin or a named argument, so I intentionally kept the answer agnostic). – Charles Duffy Mar 04 '15 at 22:45
  • Is it possible to know how large the limit is? – lindhe Jan 09 '20 at 09:14
  • @lindhe, `getconf ARG_MAX` shows the total pool size. – Charles Duffy Jan 09 '20 at 12:17
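
A quick way to compare the pool size with what the environment already consumes might look like the sketch below. The variable names are hypothetical, and `env | wc -c` is only an approximation of the kernel's accounting. On Linux there is also a separate cap on each individual argument or environment string (`MAX_ARG_STRLEN`, 128 KiB with 4 KiB pages), which a 1,000,000-character variable would exceed on its own regardless of the total.

```shell
#!/bin/bash
# Total space available for arguments plus environment, as reported by the system.
arg_max=$(getconf ARG_MAX)

# Approximate size of the current environment (names, values, and separators).
env_bytes=$(( $(env | wc -c) ))

echo "ARG_MAX: $arg_max bytes"
echo "environment currently uses about $env_bytes bytes"
```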