I have configuration files where each line contains assignments separated by semicolons. Something like this, which mimics normal shell assignments:

VAR1="1"  ;  VAR2="2"
VAR1="3"  ;  VAR2="4"

Each line contains the same variables, and is intended to be processed individually. These configuration files are all under the system administrator's control, so using eval to perform the assignment is not too bad for now. But I would like to extend this to per-user config files, and I am looking for better ideas.

I am able to parse a line, split it into chunks using ; as a separator (in a way that unfortunately does not allow an escaped ; inside the values, but I can live with that), identify the assignment (a valid variable name followed by an = sign), and extract the right-hand side of the assignment (in raw form, with quoting and spacing as part of the value). But then I have a problem.

Say I have a variable, value, which, after the parsing, contains what would result from a "manual" assignment like this:

value="\"Arbitrary value \\\" containing escaped quote inside quotes\""

In other words, the value is this (if I echo "$value") :

"Arbitrary value \" containing escaped quote inside quotes"

I want to transform that value without using eval or another method that could cause arbitrary code execution (and therefore code injection risks) so that it becomes this:

Arbitrary value " containing escaped quote inside quotes

I could, I guess, just look for and remove leading and trailing quotes, but this does not handle all cases of valid shell quoting. If there is a way to retain safe expansions while preventing code execution, that is a plus, but I am not getting my hopes up with this one. I would also prefer a Bash-only solution (no external program called), but this is a preference, not a hard requirement.

If I solve that issue, I know how to perform the indirect assignment safely, and I do not need detailed code on how to read files, perform regex matching, etc. It is only this critical step I am missing, and I hope there is a way that does not involve writing a parser.

Fred
  • Tangential, but worth pointing out: shell assignments do not need to be separated by a semicolon. – kojiro Feb 19 '17 at 13:47
  • @kojiro This is true, but I think allowing that in my config files would make parsing more difficult: due to the kind of "dumb" parsing I am doing (which ignores quotes), I cannot differentiate between whitespace inside a quoted value and trailing whitespace. For the same reason, I cannot, with my current approach, embed semicolons in the values. – Fred Feb 19 '17 at 15:28

2 Answers

One very easy solution is to use jq. Since "foo is a string \" that contains a quote" is valid JSON, jq handles it natively:

$ value="\"Arbitrary value \\\" containing escaped quote inside quotes\""
$ jq -r . <<< "$value"
Arbitrary value " containing escaped quote inside quotes

Yes, it's not native sh or bash, but it's a quick and easy solution. Furthermore, jq has methods to output the result back to a format that can be read in by another shell:

$ jq -r '.|@sh' <<< "$value"
'Arbitrary value " containing escaped quote inside quotes'
kojiro
  • Very simple and readable (not having tested it nor knowing `jq`, I cannot vouch for its correctness), but it requires a process/pipe for each and every variable assignment, which I would be willing to trade for a reasonable amount of additional complexity with a Bash-only solution (this is "library" code I will reuse, so it does not need to be a one-liner). `jq` is a utility that (I think) is not part of many Linux/UNIX systems by default (not my systems, at least). Given a choice, I would prefer a `sed` or `awk` solution for that reason, even if not as pleasantly readable as yours. – Fred Feb 19 '17 at 15:08
  • I understand. Your later arguments aside, it's worth noting that you could construct your solution with a single jq command to output all the variable assignments in the `@sh` form at once, to avoid having a command or process substitution for every assignment. – kojiro Feb 19 '17 at 15:21
  • That would be better, but I would still need to do it once per line... Unless I use a solution which takes the whole file as input, and spits out an array (one element being all the assignments for one line of the config file). That would be a little more complex, but better in the long run. Good idea. – Fred Feb 19 '17 at 15:25

To complement kojiro's helpful jq solution with a pure bash solution (a POSIX-compliant implementation is also possible):

# Sample value, resulting in the following value, *including* the double quotes:
#     "Arbitrary value \" containing escaped quote inside quotes"
# Note: This is effectively the same assignment as in the question, except
#       with single quotes, which makes it easier to parse visually.
value='"Arbitrary value \" containing escaped quote inside quotes"'    

# Strip enclosing " instances, if present.
[[ $value =~ ^\"(.*)\"$ ]] && value=${BASH_REMATCH[1]}

# Use `read` - without -r - to perform interpretation of \-prefixed
# escape sequences, and save the result back to $value.
IFS= read value <<<"$value"

Running printf '%s\n' "$value" afterward yields:

Arbitrary value " containing escaped quote inside quotes

Note:

  • If $value contained a \ followed by an actual newline (probably not a concern with configuration-file entries), that newline would be removed.

  • For any other \-prefixed character - not just \" - (only) the \ is removed.

  • No expansions of any kind are performed, and the other quoting forms that the shell supports aren't handled (such as the automatic concatenation of adjacent strings, where "ab""cd" yields abcd).

    • See this answer of mine for a safe templating solution that restricts expansions to embedded variable references (prevents command substitutions).
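
The two steps above (strip the enclosing quotes, then let read drop the backslashes) can be wrapped in a small reusable function. This is only a sketch: the function name unquote_value and its two-argument interface are made up for illustration, and storing the result with printf -v is one possible way to do the indirect assignment, not necessarily the one the question has in mind. The same limitations listed in the notes apply.

```shell
#!/usr/bin/env bash

# unquote_value VARNAME RAW
# Hypothetical helper (name and interface are assumptions, not from the answer):
# strips one pair of enclosing double quotes from RAW, removes the backslash
# from each \<char> pair, and stores the result in the variable named VARNAME.
unquote_value() {
  local __name=$1 __raw=$2 __out

  # Refuse anything that is not a plain variable name.
  [[ $__name =~ ^[A-Za-z_][A-Za-z0-9_]*$ ]] || return 1

  # Strip enclosing " instances, if present.
  if [[ $__raw =~ ^\"(.*)\"$ ]]; then
    __raw=${BASH_REMATCH[1]}
  fi

  # read without -r removes the backslash from \<char> pairs.
  IFS= read __out <<<"$__raw"

  # printf -v stores the text as data; it is never evaluated as code.
  printf -v "$__name" '%s' "$__out"
}

unquote_value VAR1 '"Arbitrary value \" containing escaped quote inside quotes"'
printf '%s\n' "$VAR1"
# prints: Arbitrary value " containing escaped quote inside quotes
```

Because the value is only ever treated as data, something like '"$(echo pwned)"' ends up stored literally as $(echo pwned) rather than being executed.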

Optional background information

read - without the -r option - interprets \-based sequences only in the sense that, with the exception discussed below, it removes the \ before a \<char> sequence; it does not perform expansion of control-character escape sequences such as \n.

The only expansion of sorts that read does perform is when a \ is followed by an actual newline (LF character), in which case the newline is removed too. This points to the main purpose of \-escaping for read: line continuation.
From the POSIX spec:

By default, unless the -r option is specified, <backslash> shall act as an escape character. An unescaped <backslash> shall preserve the literal value of the following character, with the exception of a <newline>. If a <newline> follows the <backslash>, the read utility shall interpret this as line continuation. The <backslash> and <newline> shall be removed before splitting the input into fields. All other unescaped <backslash> characters shall be removed after splitting the input into fields.

The -r option turns interpretation of \ sequences off, which is the desired behavior in the vast majority of cases.
Therefore, it is advisable to use -r routinely, unless you explicitly need processing of \ sequences.
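
To make the difference concrete, here is a small side-by-side sketch. The sample string and the variable names cooked and raw are arbitrary choices for illustration.

```shell
#!/usr/bin/env bash

# Sample input containing three kinds of backslash sequences: \t, \" and \\
input='a\tb \" c\\d'

# Without -r: the backslash is dropped from each \<char> pair.
# Note that \t becomes a plain t (not a tab), and \\ becomes a single \.
IFS= read cooked <<<"$input"

# With -r: backslashes are left untouched.
IFS= read -r raw <<<"$input"

printf 'without -r: %s\n' "$cooked"   # without -r: atb " c\d
printf 'with -r:    %s\n' "$raw"      # with -r:    a\tb \" c\\d
```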

mklement0
  • Thanks. This kind of quote stripping is the one I am afraid I will have to fall back on (and am hoping to work around). It is not perfect (e.g. if the string to be interpreted is `"first part""second part"`, the result will not be what normal shell processing would provide), but it is better than launching an external command for each assignment (in the sense that I can live with the tradeoff in expressivity given my purpose). I had not thought about the `read` trick; I always use `-r`, so I will have to play with it to see whether I prefer it with or without. Thanks! – Fred Feb 19 '17 at 15:18
  • What I would really want is some kind of `safeval`, which does all safe expansions (most notably nothing with `$()`), but performs simple variable substitution and `${}` expansions (that do not recursively involve `$()` or any other construct with potential side-effects). – Fred Feb 19 '17 at 15:22
  • @Fred: Please see my update - I've added a link to an answer that may provide what you're looking for. Generally, though, I wonder whether it's a good idea to store shell-specific constructs in configuration files. Yes, using `-r` almost always makes sense - truthfully, this is the first real-world use case where _not_ using it is helpful. – mklement0 Feb 19 '17 at 15:42
  • Could you explicitly do the eval in rbash on a readonly or tmp fs? – kojiro Feb 19 '17 at 15:42
  • @mklement0 Thanks for the reference, this may be the beginning of something nice for me. Let me rephrase the idea to make sure I understand : first, replace any `(`, `[`, and single quote with bytes that are not usually found in text files, then perform the `eval` (which is then safe), and finally revert the replacements. Is there any reason this would not work using shell `${...}` replacements instead of `tr`? Could the replacement be performed only when `$` is the preceding character, so that for instance assigning an array would remain possible? – Fred Feb 19 '17 at 19:10
  • @Fred: Your paraphrasing of the approach used in the `expandVars` function in my linked answer sounds correct. In principle you should be able to use parameter expansion in lieu of `tr`, but it'll be cumbersome. I'm not sure what you mean re arrays, but the current function does expand references to arrays as a whole. If you have more questions, I suggest you create a _new question_ (but feel free to ping me). – mklement0 Feb 19 '17 at 19:31
  • @mklement0 I will whip up something and make it work, then probably post another question to validate that the solution is sound. I will probably go with expansions despite a few more lines of code, as I expect avoiding the call to an external program to be worth it, both to avoid a dependency and to gain a bit of speed (code I will write once and reuse a lot). Thanks! – Fred Feb 19 '17 at 19:37