
Extending this question and answer, I'd like some help exploring ways to make using source to bring a config file into a Bash script "safer." I say "safer" because I recognize that 100% safety may be impossible.

I want to use a config file to set variables and arrays and have some comments throughout. Everything else should be disallowed.

The above Q&A suggested starting with a regex that checks each line for what we want (an allowlist), rather than for what we don't want, before passing the file to source.

For example, the regex could be:

(^\s*#|^\s*$|^\s*[a-z_][^[:space:]]*=[^;&\(\`]*$|[a-z_][^[:space:]]*\+?=\([^;&\(\`]*\)$)
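To see what that pattern actually accepts, you can pipe sample lines through `grep -v` and look at what it prints. This is a quick sketch that assumes GNU grep (for `\s`). The names `allowed` and `rejected_lines` are made up for this illustration, and the sample lines come from the lists below:

```bash
#!/bin/bash
# The pattern from above, verbatim, in single quotes so the shell changes nothing.
allowed='(^\s*#|^\s*$|^\s*[a-z_][^[:space:]]*=[^;&\(\`]*$|[a-z_][^[:space:]]*\+?=\([^;&\(\`]*\)$)'

# Print every stdin line the pattern does NOT match,
# i.e. the lines the script below would flag as unsafe.
rejected_lines() {
    grep -E -i -v -- "$allowed"
}

printf '%s\n' \
    'someNumber=1' \
    "remote='this@123.123.123.123'" \
    'var="foo bar"; unleash_virus' \
    'var=`unleash_virus`' \
    'var="foo bar $(unleash_virus)"' \
    'var=notquoted unleash_virus' \
    "unleash_virus; x=('a')" |
    rejected_lines
```

With this input, only the `;`, backtick and `$(` lines are printed (rejected). Two lines get through:

- `var=notquoted unleash_virus` passes because the value part bans only a few characters, not spaces.
- `unleash_virus; x=('a')` passes because the array alternative lacks the `^\s*` anchor that the other alternatives have, so it can match at the end of any line.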

But I'm looking for help both with refactoring that regex and with other ways to get what we're after in the Bash script below, especially since I'm starting to wonder whether this approach is futile in the first place.

Example

This is what the desired config file would look like:

#!/bin/bash
 disks=([0-UUID]=1234567890123 [0-MountPoint]='/some/path/')
disks+=([1-UUID]=4567890123456 [1-MountPoint]='/some/other/path')
# ...
someNumber=1
rsyncExclude=('.*/' '/dev/' '/proc/' '/sys/' '/tmp/' '/mnt/' '/media/' '/lost+found' '.Trash-*/' '[$]RECYCLE.BIN/' '/System Volume Information/' 'pagefile.sys' '/temp/' '/Temp/' '/Adobe/')
remote='this@123.123.123.123'
# there should be nothing in the config more complicated than above

And this is a simplified version of the Bash script the config will be loaded into. It uses the example from @Erman in the Q&A linked above to do the checking:

#!/bin/bash
configFile='/blah/blah/config.file'

if [[ -f "${configFile}" ]]; then
    # check if the config file contains any commands because that is unexpected and unsafe
    # (despite its name, the regex describes the *allowed* lines; grep -v reports any line that doesn't match it)
    disallowedSyntax="(^\s*#|^\s*$|^\s*[a-z_][^[:space:]]*=[^;&\(\`]*$|[a-z_][^[:space:]]*\+?=\([^;&\(\`]*\)$)"
    if grep -E -q -i -v "${disallowedSyntax}" "${configFile}"; then
        printf '%s\n' 'The configuration file is not safe!' >&2 # print to STDERR
        exit 1
    else
        # config file might be okay
        if result=$( bash -n "${configFile}" 2>&1 ); then
            # set up the 'disks' associative array first and then import
            declare -A disks
            # only pass on lines that start with an assignment (var= or var+=)
            source <(awk '/^[[:space:]]*[[:alnum:]_]+\+?=/' "${configFile}")
            # ...
        else
            # config has syntax error
            printf '%s\n' 'The configuration file has a syntax error:' "${result}" >&2
            exit 1
        fi
    fi
else
    # config file doesn't exist?
    printf '%s\n' "The configuration file doesn't exist." >&2
    exit 1
fi

As a starting point, I imagine this is roughly what should be allowed and disallowed:

Allowed

# whole numbers only
var=1
var=123

# quoted stuff
var='foo bar'
var="foo bar"

# arrays
var=('foo' 'bar')
var=("foo" "bar")
var=([0-foo]=1 [0-bar]='blah' ...
var+=(...

# vars with underscores, same format as above
foo_bar=1
...
foo_bar+=(...

# and that's it?

Not allowed*

* This is not an exhaustive list (I'm certain I'm missing things). The idea is to at least disallow anything unquoted (unless it's a number), plus anything else that would allow unleash_virus to be run:

var=notquoted
...
var=notquoted unleash_virus
var=`unleash_virus`
...
var='foo bar' | unleash_virus
...
var="foo bar"; unleash_virus
var="foo bar" && unleash_virus
var="foo bar $(unleash_virus)"
...
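The Allowed and Not allowed lists above can be turned directly into a strict allowlist, where every line must match one of the allowed shapes in full. This is a minimal sketch using Bash's `[[ =~ ]]` (POSIX ERE), not a complete parser. The helper names `is_allowed_line` and `validate_config` are hypothetical. The sketch is deliberately stricter than the lists: for example, double-quoted values may not contain `$`, backticks or backslashes, so nothing inside them can expand.

```bash
#!/bin/bash
# Hypothetical strict allowlist for the "Allowed" list (illustration only).
sp='[[:blank:]]'                   # spaces or tabs
name='[A-Za-z_][A-Za-z0-9_]*'      # variable name, underscores allowed
num='[0-9]+'                       # whole numbers only
sq="'[^']*'"                       # single-quoted: nothing inside is expanded
dq='"[^"$`\]*"'                    # double-quoted, but no $, backtick or backslash inside
scalar="($num|$sq|$dq)"
key='\[[A-Za-z0-9_-]+]='           # optional [key]= prefix, e.g. [0-UUID]=
element="($key)?$scalar"
array="\\($sp*($element($sp+$element)*)?$sp*\\)"

comment_re="^$sp*(#.*)?$"                          # blank line or comment
assign_re="^$sp*$name\\+?=($scalar|$array)$sp*$"   # var=... or var+=...

# Succeeds only if the whole line is a comment/blank or one allowed assignment.
is_allowed_line() {
    [[ $1 =~ $comment_re || $1 =~ $assign_re ]]
}

# Report every line of the given file that is not allowed; fail if any is found.
validate_config() {
    local line n=0 status=0
    while IFS= read -r line || [[ -n $line ]]; do
        n=$((n + 1))
        if ! is_allowed_line "$line"; then
            printf 'line %d not allowed: %s\n' "$n" "$line" >&2
            status=1
        fi
    done < "$1"
    return "$status"
}
```

For example, `validate_config /blah/blah/config.file || exit 1` would stop the script on the first file containing anything outside these shapes. Every line of the example config above matches, and every line in the Not allowed list is rejected. Because the whole line must match, trailing `| ...`, `; ...` or `&& ...` can never slip through after a valid value.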
nooblag
  • I'm not an expert in `egrep` regexes, but you could use automated tools to generate strings that match the regex. Maybe something suspicious still gets through. – melvio Aug 15 '21 at 17:44

3 Answers


At least one issue you might encounter is the contents of ${configFile} changing between the syntax check and the subsequent sourcing (a time-of-check to time-of-use race):

# configFile might seem safe according to your syntax rules:
if egrep -q -iv "${disallowedSyntax}" "${configFile}"; then
    printf "%s\n" 'The backup configuration file is not safe!' >&2
    exit 1
else
    if result=$( bash -n "${configFile}" 2>&1 ); then
        declare -A disks

        # Warning: config file might have changed since it was checked above
        source "${configFile}"
    fi
fi

If you cannot guarantee that the contents of the config file stay the same, then your regex check won't help you much.
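One way to close the gap between the check and the `source` is to read the file once into a variable, then check and load only that copy. This is a minimal sketch that reuses the question's regex unchanged, so it inherits that regex's weaknesses. It assumes GNU grep (for `\s`), and `load_config` is a hypothetical helper name:

```bash
#!/bin/bash
# Sketch: read the config exactly once and run every check on that in-memory copy,
# so changes to the file on disk after this point cannot affect what gets loaded.

# The question's allowlist pattern, verbatim, in single quotes
allowedSyntax='(^\s*#|^\s*$|^\s*[a-z_][^[:space:]]*=[^;&\(\`]*$|[a-z_][^[:space:]]*\+?=\([^;&\(\`]*\)$)'

# As in the question: the associative array must exist before the config assigns to it
declare -A disks

load_config() {
    local content
    # the only read of the file
    content=$(<"$1") || return 1

    # allowlist check on the copy (grep -q -v succeeds if any line does NOT match)
    if grep -E -q -i -v -- "$allowedSyntax" <<<"$content"; then
        printf '%s\n' 'The configuration file is not safe!' >&2
        return 1
    fi

    # syntax check on the same copy (bash -n reads the script from stdin)
    if ! bash -n <<<"$content"; then
        printf '%s\n' 'The configuration file has a syntax error.' >&2
        return 1
    fi

    # load exactly the text that was checked; the config's variables are not
    # declared local here, so its assignments land in the global scope
    eval "$content"
}

# usage: load_config /blah/blah/config.file || exit 1
```

Because `eval` runs the same string that was checked, replacing the file afterwards has no effect on this run. It does nothing about what the regex itself lets through.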

melvio
  • That's helpful, thanks, and maybe scope for another question, but for now doing the checking is what we're after. Any ideas? – nooblag Aug 15 '21 at 17:34
  • You are welcome @nooblag. To get feedback on the regex only, the question might get more specific answers if you remove this part: `"What do you think? Am I doing this all wrong? "` ;-). – melvio Aug 15 '21 at 17:38

Since you wanted specific feedback on the regex, here is a variable assignment with a quoted value that the regex does not allow:

some_regex_config='\s'

Note that this was the regex at the time of answering:

(^\s*#|^\s*$|^\s*[a-z_][^[:space:]]*=[^;&\(\`]*$|[a-z_][^[:space:]]*\+?=\([^;&\(\`]*\)$)
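The value is rejected because a backslash is an ordinary character inside a POSIX bracket expression. So `[^;&\(\`]` excludes `\` and the backtick as well as `;`, `&` and `(`, and any value containing a backslash fails the check. A small check, assuming GNU grep (for `\s`), with `check` as a made-up helper name:

```bash
#!/bin/bash
# The question's regex, verbatim, in single quotes
regex='(^\s*#|^\s*$|^\s*[a-z_][^[:space:]]*=[^;&\(\`]*$|[a-z_][^[:space:]]*\+?=\([^;&\(\`]*\)$)'

# Report whether a sample line is accepted by the regex
check() {
    if grep -E -q -i -- "$regex" <<<"$1"; then
        printf 'allowed:  %s\n' "$1"
    else
        printf 'rejected: %s\n' "$1"
    fi
}

check "some_regex_config='\s'"   # prints: rejected: some_regex_config='\s'
check "some_regex_config='s'"    # prints: allowed:  some_regex_config='s'
```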
melvio

Here's a start, thanks to @SasaKanjuh.

Instead of checking for disallowed syntax, we could use awk to pass to eval only the parts of the config file that match the format we expect, and nothing else.

For example, we expect variables to have some kind of quoting (unless they contain only a number), arrays to start and end with () as usual, and everything else to be ignored.

Here's the awk line that does this:

awk '/^\s*\w+\+?=(\(|[0-9]+$|["'\''][^0-9]+)/ && !/(\$\(|&&|;|\||`)/ { print gensub("(.*[\"'\''\\)]).*", "\\1", 1) }' ./example.conf
  • the first part matches lines that start with a variable name followed by = (or +=)
  • after the = sign, it looks for (, a numeric value, or ' or " followed by at least one non-digit character
  • the second part excludes lines containing $(, &&, ;, | or a backtick
  • gensub keeps everything up to and including the last ' or " or ), dropping everything after it (gensub is a GNU awk extension, so this needs gawk)
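`gensub` only exists in GNU awk, so the one-liner won't run where `awk` is mawk or BSD awk. Below is a rough pure-Bash approximation of the same three steps. It uses Bash's `[[ =~ ]]` (POSIX ERE), with `\s` and `\w` spelled as the POSIX classes `[[:space:]]` and `[[:alnum:]_]`. `filter_config` is a hypothetical name, and the sketch has not been checked against every edge case of the awk version:

```bash
#!/bin/bash
# Step 1: line starts with name= or name+= followed by (, a number, or a quote + non-digit
match_re='^[[:space:]]*[[:alnum:]_]+\+?=(\(|[0-9]+$|["'\''][^0-9]+)'
# Step 2: drop lines containing $(, &&, ;, | or a backtick
exclude_re='(\$\(|&&|;|\||`)'
# Step 3: keep everything up to and including the last ', " or )
trim_re='^(.*["'\'')]).*$'

# Read config lines on stdin, print the filtered lines on stdout.
filter_config() {
    local line
    while IFS= read -r line || [[ -n $line ]]; do
        [[ $line =~ $match_re ]] || continue
        [[ $line =~ $exclude_re ]] && continue
        if [[ $line =~ $trim_re ]]; then
            printf '%s\n' "${BASH_REMATCH[1]}"
        else
            # like gensub, leave the line unchanged when there is nothing to trim
            printf '%s\n' "$line"
        fi
    done
}

# usage: config=$(filter_config < ./example.conf)
```

Feeding either version `var='a' >'/tmp/x'` shows one gap. None of the excluded tokens appear in it, and the last `'` is at the very end, so the whole line survives. `eval` would then perform the redirection, creating or truncating `/tmp/x`.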
#!/bin/bash
configFile='./example.conf'

if [[ -f "${configFile}" ]]; then
    # config file exists, check if it has OK bash syntax
    if result=$( bash -n "${configFile}" 2>&1 ); then
        # seems parsable, import the config file
        # filter the contents using `gawk` first (gensub is GNU-awk only) so we're only accepting vars formatted like this:
            # var=1
            # var='foo bar'
            # var="foo bar"
            # var=('array' 'etc')
            # var+=('and' "so on")
        # and everything else should be ignored:
            # var=unquoted
            # var='foo bar' | unleash_virus
            # var='foo bar'; unleash_virus
            # var='foo' && unleash_virus
            # var=$(unleash_virus)
            # var="$(unleash_virus)"
            # ...etc
        config=$(gawk '/^\s*\w+\+?=(\(|[0-9]+$|["'\''][^0-9]+)/ && !/(\$\(|&&|;|\||`)/ { print gensub("(.*[\"'\''\\)]).*", "\\1", 1) }' "${configFile}")
        # awk exits 0 even when nothing matches, so test the output rather than the exit status
        if [[ -n "${config}" ]]; then
            # something matched
            # declare the associative array first (as in the question) so keys like [0-UUID] stay strings
            declare -A disks
            # now actually insert the config data into this session by passing it to `eval`
            eval "${config}"
        else
            # no matches from awk (or gawk isn't installed)
            echo "No config content to work with." >&2
            exit 1
        fi
    else
        # config file didn't pass the `bash -n` test
        printf '%s\n' "Config contains invalid syntax:" "${result}" >&2
        exit 1
    fi
else
    # config file doesn't exist or isn't a file
    echo "There is no config file." >&2
    exit 1
fi
nooblag