
This code (it’s part of a shell function) works perfectly:

    output=$(\
            cat "${vim_file}" | \
            sed -rne "${EXTRACT_ENTITIES}" | \
            sed -re "${CLEAR_LEADING_QUOTES}" | \
            sed -re "${NORMALIZE_NAMES}" \
    )

But when I try to insert the word “local” before the assignment…

    local output=$(\
            cat "${vim_file}" | \
            sed -rne "${EXTRACT_ENTITIES}" | \
            sed -re "${CLEAR_LEADING_QUOTES}" | \
            sed -re "${NORMALIZE_NAMES}" \
    )

…I get a strange error:

    local: commands.: bad variable name

There are no stray invisible characters in the code: only tabs for indentation and spaces everywhere else. The script begins with “#!/bin/sh”. Adding “local” before the other variables in the function causes no problems. Replacing “output” (the variable name) with another arbitrary name changes nothing. The OS is Linux.

oneastok
  • Did you try this piece of code outside the function, and then with local? – Jatin Mehrotra Apr 13 '21 at 01:55
  • No, I didn’t. It won’t work outside the function because the other variables the code uses are also declared as local. But it works fine right here without the word “local”. I could leave it as is, but I can’t understand what’s happening or how to avoid it in the future. – oneastok Apr 13 '21 at 02:05
  • I’ve discovered that when the declaration is split across two lines, it works fine: local output↲ output=$(cat … – oneastok Apr 13 '21 at 02:47
  • I’ve solved the issue; I’ll post the answer in a few minutes. Thank you all! – oneastok Apr 13 '21 at 03:01
  • As an aside, your chain of `sed` scripts should probably be combined. There are situations where this isn't correct, but often, anything which looks like `sed a | sed b | sed c` can be combined into `sed 'a;b;c'`. You'll want to lose the [useless `cat`](https://stackoverflow.com/questions/11710552/useless-use-of-cat), too. – tripleee Apr 13 '21 at 03:32
  • Why all the backslashes? You don't need to explicitly add them in sane shells (seashells are not sane in this respect). – Jonathan Leffler Apr 13 '21 at 04:03
  • @tripleee, I used a sequence of seds because I wanted to keep the text-processing logic (which is quite complex: more than 20 lines of sed script in total, with multiple branches) separate from the test cases. You’re also right about the redundant “cat”; I used it for consistency and readability, which is perhaps not the best idea. – oneastok Apr 13 '21 at 04:18
  • @Jonathan Leffler, thanks a lot! I didn’t know that inside the braces the backslashes are unnecessary. I’ve removed them and everything still works as expected. – oneastok Apr 13 '21 at 04:25
  • It's more a question, @oneastok, that when the shell sees an (unescaped, unquoted) `|` at the end of a line it knows there must be another command to follow it, so it continues to look for a command. You were using backslashes inside `$(…)` command substitution, and those are (round) brackets or parentheses `()`, rather than curly brackets or braces `{}` (and those are both distinct from both (square) brackets `[]`, and 'angle brackets' `<>`). – Jonathan Leffler Apr 13 '21 at 04:54
  • @oneastok It's not the braces that make backslashes unnecessary at the end of lines, it's the pipes (see [this explanation](https://stackoverflow.com/questions/3871332/how-to-tell-bash-that-the-line-continues-on-the-next-line/35931689#35931689)). Also, concerning `sed`, you can also use `sed -re "$command1" -e "$command2" -e "$command3" ...` (although you *will* need continuation backslashes to break this up) (and it may not work with things like the first `sed` command, where you're filtering lines). – Gordon Davisson Apr 13 '21 at 04:59
  • @JonathanLeffler, I misused the word “braces” (I actually meant round brackets); I apologize. You’ve shown me another interesting thing: backslashes are unnecessary in some cases. Is it POSIX-compatible to put the bar at the end of a line, or is it a GNU or Bash extension? – oneastok Apr 13 '21 at 05:21
  • A pipe at the end of the line continues automatically in all the shells derived from the Bourhe shell, which means Korn shell, POSIX shell, Vash, Dash, `zsh`, etc. Only the seashells (meaning shells derived from the C shell) do not do this. – Jonathan Leffler Apr 13 '21 at 05:31
  • @JonathanLeffler, thanks a lot. I’ve learned many interesting things today. – oneastok Apr 13 '21 at 05:37
  • I see I had typos from typing too fast on a phone and not reading before submitting. It is the Bourne shell, and I meant 'Bash', not 'Vash' (which doesn't exist AFAIK). – Jonathan Leffler Apr 13 '21 at 06:16
  • @JonathanLeffler, I had already assumed that Vash was just another “Vash again shell” I didn’t know yet :) – oneastok Apr 13 '21 at 06:52
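
Two suggestions from the comments can be sketched together. A line ending in `|` continues onto the next line without a backslash, and `sed a | sed b` can often be collapsed into a single `sed -e a -e b`. This is a minimal sketch: the input text and the two sed scripts are hypothetical stand-ins, not the asker's real ones.

```shell
#!/bin/sh
# Hypothetical input: leading spaces and a leading double quote on line 1.
input='  "alpha
beta'

# A trailing | continues the pipeline on the next line; no backslash needed.
piped=$(printf '%s\n' "$input" |
    sed -e 's/^ *//' |
    sed -e 's/^"//')

# The same two edits done by one sed invocation with several -e scripts.
combined=$(printf '%s\n' "$input" | sed -e 's/^ *//' -e 's/^"//')

printf '%s\n' "$piped"
[ "$piped" = "$combined" ] && echo "same output"
```

As Gordon Davisson notes, merging doesn't carry over directly to a filtering stage like the question's first `sed -rne ... p`: inside one script, `p` prints at the moment of the substitution, before any later `-e` commands have run.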

3 Answers


Really short answer: Use more quotes!

    local output="$(\
            cat "${vim_file}" | \
            sed -rne "${EXTRACT_ENTITIES}" | \
            sed -re "${CLEAR_LEADING_QUOTES}" | \
            sed -re "${NORMALIZE_NAMES}" \
    )"

Longer answer: It's almost always a good idea to double-quote variable references and command substitutions. Double-quoting prevents them from being subject to word splitting and filename wildcard expansion, which is rarely something you want, and can cause confusing problems.

There are situations where it's safe to leave the double-quotes off, but the rules are confusing and hard to remember, and easy to get wrong. This is one of those confusing cases. One of the situations where word splitting and wildcard expansion don't happen (and therefore it's safe to leave the double-quotes off) is on the right-hand side of an assignment:

    var=$othervar            # safe to omit double-quotes
    var2=$(somecommand)      # also safe
    var="$othervar"          # this also works fine
    var2="$(somecommand)"    # so does this
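
You can check this in any POSIX shell: a plain assignment keeps a run of spaces intact, while the same unquoted command substitution used as a command argument is split into separate words. A sketch, with `somecommand` replaced by a hypothetical function that prints a fixed string:

```shell
#!/bin/sh
# Hypothetical stand-in for "somecommand": prints a string with a run of spaces.
somecommand() { printf '%s\n' 'a   b'; }

var2=$(somecommand)         # plain assignment: no word splitting
printf '[%s]\n' "$var2"     # prints [a   b]

# Unquoted as a command argument, the same text is split into words.
printf '[%s]\n' $(somecommand)   # prints [a] then [b]
```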

Some shells extend this to assignments that are part of a command, like local or export:

    export var=$othervar         # *Maybe* ok, depending on the shell
    local var2=$(somecommand)    # also *maybe* ok

bash treats these as a type of assignment, so it doesn't do the split-expand thing with the values. But dash treats this more like a regular command (where the arguments do get split-expanded), so if your script is running under dash it can have problems like this.

For example, suppose somecommand prints "export and local are shell commands." Then in dash, local var2=$(somecommand) would expand to:

    local var2=export and local are shell commands.

...which would declare the local variables var2 (which gets set to "export"), and, local, are, and shell. It would also try to declare commands. as a local variable, but fail because it's not a legal variable name.
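
The splitting dash performs here can be reproduced in any POSIX shell with `set --`. Because `set` is an ordinary builtin (not a declaration command), its unquoted arguments are split the same way dash splits the arguments of `local`. A sketch using the hypothetical output string from the example above:

```shell
#!/bin/sh
out='export and local are shell commands.'

# Unquoted: split the same way dash splits  local var2=$(somecommand)
set -- local var2=$out
echo "$#"             # prints 7
printf '<%s>\n' "$@"  # one line per word; the last is <commands.>

# Quoted: the whole string stays a single argument.
set -- local var2="$out"
echo "$#"             # prints 2
```

The seventh word, `commands.`, is the one dash rejects as a bad variable name.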

Therefore, use more quotes!

    export var="$othervar"         # Safe in all shells
    local var2="$(somecommand)"    # also safe

Or separate the declarations (or both!):

    export var
    var=$othervar          # Safe in all shells, with or without quotes
    local var2
    var2=$(somecommand)    # also safe, with or without quotes
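
Both fixes inside a function, as a sketch. `somecommand` is a hypothetical stand-in whose output contains spaces and a word that is not a valid variable name. Note that `local` is not part of POSIX, although dash, bash, and most other sh implementations support it.

```shell
#!/bin/sh
# Hypothetical stand-in whose output would break an unquoted "local" in dash.
somecommand() { printf '%s\n' 'export and local are shell commands.'; }

quoted_form() {
    local var2="$(somecommand)"   # a single argument to local, even in dash
    printf '%s\n' "$var2"
}

separate_form() {
    local var2                    # declare first...
    var2=$(somecommand)           # ...then a plain assignment, never split
    printf '%s\n' "$var2"
}

quoted_form
separate_form
```

The separate form has one more advantage. After `local var2=$(somecommand)`, `$?` holds the exit status of `local` (normally 0), not that of `somecommand`. Splitting the declaration lets you check the command's status.
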
Gordon Davisson
  • This is a really brilliant answer. Thank you for your time and for the reminder of using quotes. (I read your answer just after publishing my own.) – oneastok Apr 13 '21 at 04:04

The answer was found here: Advanced Bash-Scripting Guide. Chapter 24. Functions

This is a quotation from there:

As Evgeniy Ivanov points out, when declaring and setting a local variable in a single command, apparently the order of operations is to first set the variable, and only afterwards restrict it to local scope.

This means that if the value assigned to a local variable contains spaces, the shell, when executing the local command, uses only the first word for the assignment. The rest of the string is interpreted depending on its content.

How exactly the shell interprets the rest of the content is still a puzzle to me. In my case it tried to perform assignments using arbitrary parts of the files being read. For example, the “commands.” string in the error message was the end of a sentence in one of the files the cat command operated on.

So, there are two ways to solve the problem.

The first one is to split the assignment. I.e. instead of…

    local output=$(cat ...

…it must be:

    local output
    output=$(cat ...

The second approach has been taken from the comments under the question — using surrounding quotes for the entire expression:

    local output="$(cat...)"
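
Combining the separate-declaration fix with suggestions from the comments under the question (pipes at line ends instead of backslashes, no `cat`), the function might look like the sketch below. The three sed scripts are hypothetical placeholders because the real ones are not shown. `-E` is used in place of GNU's `-r`; both select extended regular expressions, and `-E` is also accepted by BSD sed.

```shell
#!/bin/sh
# Hypothetical placeholders for the real sed scripts.
EXTRACT_ENTITIES='s/^entity: *(.*)$/\1/p'   # keep only "entity:" lines
CLEAR_LEADING_QUOTES='s/^"+//'              # drop leading double quotes
NORMALIZE_NAMES='s/ +/_/g'                  # runs of spaces -> underscore

extract_names() {
    local vim_file="$1"
    local output                  # declare separately...
    output=$(                     # ...so this is a plain assignment
        sed -nE "${EXTRACT_ENTITIES}" < "${vim_file}" |
        sed -E "${CLEAR_LEADING_QUOTES}" |
        sed -E "${NORMALIZE_NAMES}"
    )
    printf '%s\n' "$output"
}
```

Usage: `extract_names some_file.txt` prints one normalized name per matching line.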

In summary: when writing shell scripts, we must always keep the insidious splitting at spaces in mind.

P.S. Read the brilliant explanation from Gordon Davisson.

oneastok
  • Thanks for supplying an answer. However, we are a bit wary of recommending the ABS; perhaps someone can come up with a more canonical and reliable reference. – tripleee Apr 13 '21 at 04:32
  • @tripleee, I took the first link that explained the cause. What’s wrong with the link? Is the site untrusted, or is it Mendel Cooper’s book? (I learned Bash from that book.) – oneastok Apr 13 '21 at 04:45
  • I haven't reviewed the link in any detail. The ABS is popular but dubious, sort of like the W3schools of Bash. – tripleee Apr 13 '21 at 05:07

Look at the error message: you've provided an invalid variable name:

    $ sh
    $ foo () { local commands.; commands=5; echo "${commands}"; }
    $ foo
    sh: local: `commands.': not a valid identifier
    5
glenn jackman