10

With os.path.expandvars I can expand environment variables in a string, but with the caveat: "Malformed variable names and references to non-existing variables are left unchanged" (emphasis mine). And besides, os.path.expandvars expands escaped \$ too.

I would like to expand the variables in a bash-like fashion, at least in these two points. Compare:

import os
import os.path
os.environ['MyVar'] = 'my_var'
if 'unknown' in os.environ:
  del os.environ['unknown']
print(os.path.expandvars("$MyVar$unknown\$MyVar"))

which gives my_var$unknown\my_var with:

unset unknown
MyVar=my_var
echo $MyVar$unknown\$MyVar

which gives my_var$MyVar, and this is what I want.

Jellby

7 Answers

5

The following implementation maintains full compatibility with os.path.expandvars, yet allows greater flexibility through optional parameters:

import os
import re

def expandvars(path, default=None, skip_escaped=False):
    """Expand environment variables of form $var and ${var}.
       If parameter 'skip_escaped' is True, all escaped variable references
       (i.e. preceded by backslashes) are skipped.
       Unknown variables are set to 'default'. If 'default' is None,
       they are left unchanged.
    """
    def replace_var(m):
        return os.environ.get(m.group(2) or m.group(1), m.group(0) if default is None else default)
    reVar = (r'(?<!\\)' if skip_escaped else '') + r'\$(\w+|\{([^}]*)\})'
    return re.sub(reVar, replace_var, path)

Below are some invocation examples:

>>> expandvars("$SHELL$unknown\$SHELL")
'/bin/bash$unknown\\/bin/bash'

>>> expandvars("$SHELL$unknown\$SHELL", '')
'/bin/bash\\/bin/bash'

>>> expandvars("$SHELL$unknown\$SHELL", '', True)
'/bin/bash\\$SHELL'
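
As a rough sketch of how the same regex approach could be pushed closer to the bash output asked for in the question: the variant below drops the backslash of an escaped \$ and expands unknown variables to the empty string. The name expandvars_bashlike and its environ parameter are made up for this illustration, and it does not handle an escaped backslash (\\$VAR) or any other shell syntax.

```python
import os
import re

# Hypothetical variant: "\$" becomes a literal "$", and unknown
# variables expand to the empty string, as bash does by default.
_VAR = re.compile(r'\\(\$)|\$(\w+|\{([^}]*)\})')

def expandvars_bashlike(path, environ=None):
    environ = os.environ if environ is None else environ

    def replace(m):
        if m.group(1):                      # matched "\$": drop the backslash
            return '$'
        # group(3) is the name inside ${...}; group(2) is the bare $name
        name = m.group(3) if m.group(3) is not None else m.group(2)
        return environ.get(name, '')

    return _VAR.sub(replace, path)
```

With a mapping of {"MyVar": "my_var"}, calling expandvars_bashlike(r"$MyVar$unknown\$MyVar", {"MyVar": "my_var"}) returns 'my_var$MyVar', the same string bash prints in the question.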
davidedb
    I gave this a test, and it's not 100% equivalent to what you would get from a bash expansion. In bash, "\$IGNORE" would consume the backslash and return "$IGNORE". Whereas this python implementation would leave it as "\$IGNORE". Just pointing this out in case someone is looking for 1-to-1 bash expansion – jdi Mar 06 '19 at 03:40
4

Try this:

re.sub(r'\$[A-Za-z_][A-Za-z0-9_]*', '', os.path.expandvars(path))

The regular expression should match any valid variable name, as per this answer, and every match will be substituted with the empty string.

Edit: if you don't want to replace escaped vars (i.e. \$VAR), use a negative lookbehind assertion in the regex:

re.sub(r'(?<!\\)\$[A-Za-z_][A-Za-z0-9_]*', '', os.path.expandvars(path))

(which says the match should not be preceded by \).

Edit 2: let's make this a function:

def expandvars2(path):
    return re.sub(r'(?<!\\)\$[A-Za-z_][A-Za-z0-9_]*', '', os.path.expandvars(path))

Check the result:

>>> print(expandvars2('$TERM$FOO\$BAR'))
xterm-256color\$BAR

The variable $TERM is expanded to its value, the nonexistent variable $FOO is replaced with the empty string, and \$BAR is left untouched.
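
The regex above only recognizes the bare $VAR form. A minimal sketch of the same post-processing idea extended to ${VAR} follows. The name expandvars3 is hypothetical, and other bash forms such as ${VAR:-abc} are still not handled. It also inherits the behavior of os.path.expandvars, which expands a set variable even after a backslash.

```python
import os
import re

# Leftover (i.e. unset) references in $VAR or ${VAR} form,
# not immediately preceded by a backslash.
_LEFTOVER = re.compile(
    r'(?<!\\)\$(?:[A-Za-z_][A-Za-z0-9_]*|\{[A-Za-z_][A-Za-z0-9_]*\})'
)

def expandvars3(path):
    # Let os.path.expandvars expand what it knows, then drop what is left.
    return _LEFTOVER.sub('', os.path.expandvars(path))
```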

fferri
  • I was going to say that this would expand escaped `\$` too, which I don't want, but then i realized that `os.path.expandvars` does the same. Maybe I have to modify the question... – Jellby Jun 09 '15 at 14:55
  • I do not follow. `expandvars` already replaces $VAR with its value if VAR is an existing env var. the result can only contain more occurrences of $VAR if VAR is not an env variable, which you said you wanted to replace with the empty string, like bash does. – fferri Jun 09 '15 at 14:58
  • I've modified and expanded the question now, hopefully it's clearer. Your solution would remove `$unknown` from the result, but leave `\my_var` where I want `$MyVar` (unexpanded, because the dollar was escaped). – Jellby Jun 09 '15 at 15:08
  • But now my problem is that `os.path.expandvars` already expands the `\$MyVar` I don't want expanded! (and the regexp wouldn't work if it were the backslash what's escaped, as in `\\$unknown`). – Jellby Jun 09 '15 at 15:19
  • which python version are you using? on my machine `print(os.path.expandvars("$MyVar$unknown\$MyVar"))` prints `my_var$unknown\$MyVar` (check my updated answer) – fferri Jun 09 '15 at 15:27
  • this solution is very error prone... once it won't work in various conditions like: `${VAR}`, `${VAR[0]}`, `${VAR:-abc}` etc... – Jason Hu Jun 09 '15 at 15:28
  • @HuStmpHrrr yeah the grammar for matching all possible *bashisms* is more complex than that, but that's life =D – fferri Jun 09 '15 at 15:29
  • I'm using Python 2.7.6 (Ubuntu 14.04). Try `\$TERM` instead of `\$BAR`. – Jellby Jun 09 '15 at 15:34
  • i would suggest use a subprocess to pass it to bash to handle it. – Jason Hu Jun 09 '15 at 15:34
1

An alternative, as pointed out by @HuStmpHrrr, is to let bash evaluate your string, so that you don't have to replicate the bash functionality you want in Python.

Not as efficient as the other solution I gave, but it is very simple, which is also a nice feature :)

>>> from subprocess import check_output
>>> s = '$TERM$FOO\$TERM'
>>> check_output(["bash","-c","echo \"{}\"".format(s)])
b'xterm-256color$TERM\n'

P.S. beware of escaping of " and \: you may want to replace \ with \\ and " with \" in s before calling check_output

fferri
  • I think this is way better. Regular expressions are hard to maintain and debug. I doubt anybody would really feel the inefficiency of spawning a process calling the shell etc. How many times one would be calling (with different `s`!!) this? If you're calling with same `s` over and over, caching is the solution, not regular expressions. – Davide Jan 17 '17 at 16:29
1

Here's a solution that uses the original expandvars logic: Temporarily replace os.environ with a proxy object that makes unknown variables empty strings. Note that a defaultdict wouldn't work because os.environ

For your escape issue, you can replace r'\$' with some value that is guaranteed not to be in the string and will not be expanded, then replace it back.

import os


class EnvironProxy(object):
    __slots__ = ('_original_environ',)

    def __init__(self):
        self._original_environ = os.environ

    def __enter__(self):
        self._original_environ = os.environ
        os.environ = self
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        os.environ = self._original_environ

    def __getitem__(self, item):
        try:
            return self._original_environ[item]
        except KeyError:
            return ''


def expandvars(path):
    replacer = '\0'  # NUL shouldn't be in a file path anyways.
    while replacer in path:
        replacer *= 2

    path = path.replace('\\$', replacer)

    with EnvironProxy():
        return os.path.expandvars(path).replace(replacer, '$')
Artyer
1

There is a pip package called expandvars which does exactly that.

pip3 install expandvars

from expandvars import expandvars

print(expandvars("$PATH:${HOME:?}/bin:${SOME_UNDEFINED_PATH:-/default/path}"))
# /bin:/sbin:/usr/bin:/usr/sbin:/home/you/bin:/default/path

It has the benefit of implementing default value syntax (i.e., ${VARNAME:-default}).

slhck
0

I have run across the same issue, but I would propose a different and very simple approach.

If we look at the basic meaning of "escape character" (as they started in printer devices), the purpose is to tell the device "do something different with whatever comes next". It is a sort of clutch. In our particular case, the only problem we have is when we have the two characters '\' and '$' in a sequence.

Unfortunately, we do not have control of the standard os.path.expandvars, so that the string is passed lock, stock and barrel. What we can do, however, is to fool the function so that it fails to recognize the '$' in that case! The best way is to replace the $ with some arbitrary "entity" and then to transform it back.

import os

def expandvars(value):
    r"""
    Expand the env variables in a string, respecting the escape sequence \$
    """
    DOLLAR = r"\&#36;"
    escaped = value.replace(r"\$", r"\%s" % DOLLAR)
    return os.path.expandvars(escaped).replace(DOLLAR, "$")

I used the HTML entity, but any reasonably improbable sequence would do (a random sequence might be even better). We might imagine cases where this method would have an unwanted side effect, but they should be so unlikely as to be negligible.
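
The "random sequence" idea could look roughly like this. It is only a sketch: expandvars_keep_escaped is a hypothetical name, and the placeholder is wrapped in NUL characters so that it cannot be read as part of a variable name that precedes it. Unlike the function above, it returns the escaped dollar without its backslash.

```python
import os
import uuid

def expandvars_keep_escaped(value):
    # Random placeholder; the leading NUL stops os.path.expandvars from
    # treating it as the tail of a preceding variable name.
    placeholder = '\0' + uuid.uuid4().hex + '\0'
    while placeholder in value:          # regenerate on an (unlikely) collision
        placeholder = '\0' + uuid.uuid4().hex + '\0'

    protected = value.replace('\\$', placeholder)   # hide every "\$"
    return os.path.expandvars(protected).replace(placeholder, '$')
```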

fralau
  • To ensure no collision, you could do something like: `if DOLLAR in value: DOLLAR = DOLLAR + '\0' * len(value)`. – Artyer Oct 22 '17 at 22:19
  • Thanks a lot! Now that you point this out, this seems obvious. My only question is the rationale behind `len(value)`. Would it be preferable to another method, e.g. looping the test (if DOLLAR in value) and progressively adding fillers? – fralau Oct 23 '17 at 07:01
  • A problem with this approach (and probably with some of the others) is it will consider a backslash that is itself escaped; for example, `"\\$VARIABLE"` should still expand `$VARIABLE`. – jamesdlin Jul 03 '22 at 23:13
0

I was unhappy with the various answers, needing a little more sophistication to handle more edge cases such as arbitrary numbers of backslashes and ${} style variables, but not wanting to pay the cost of a bash eval. Here is my regex based solution:

#!/bin/python

import re
import os

def expandvars(data,environ=os.environ):
    out = ""
    regex = r'''
             ( (?:.*?(?<!\\))                   # Match non-variable ending in non-slash
               (?:\\\\)* )                      # Match 0 or even number of backslash
             (?:$|\$ (?: (\w+)|\{(\w+)\} ) )    # Match variable or END
        '''

    for m in re.finditer(regex, data, re.VERBOSE|re.DOTALL):
        this = re.sub(r'\\(.)',lambda x: x.group(1),m.group(1))
        v = m.group(2) if m.group(2) else m.group(3)
        if v and v in environ:
            this += environ[v]
        out += this
    return out


# Replace with os.environ as desired
envars = { "foo":"bar", "baz":"$Baz" }

tests = { r"foo": r"foo",
          r"$foo": r"bar",
          r"$$": r"$$",                 # This could be considered a bug
          r"$$foo": r"$bar",            # This could be considered a bug
          r"\n$foo\r": r"nbarr",        # This could be considered a bug
          r"$bar": r"",
          r"$baz": r"$Baz",
          r"bar$foo": r"barbar",
          r"$foo$foo": r"barbar",
          r"$foobar": r"",
          r"$foo bar": r"bar bar",
          r"$foo-Bar": r"bar-Bar",
          r"$foo_Bar": r"",
          r"${foo}bar": r"barbar",
          r"baz${foo}bar": r"bazbarbar",
          r"foo\$baz": r"foo$baz",
          r"foo\\$baz": r"foo\$Baz",
          r"\$baz": r"$baz",
          r"\\$foo": r"\bar",
          r"\\\$foo": r"\$foo",
          r"\\\\$foo": r"\\bar",
          r"\\\\\$foo": r"\\$foo" }

for t, v in tests.items():
    g = expandvars(t, envars)
    if v != g:
        print("%s -> '%s' != '%s'" % (t, g, v))
        print("\n\n")
Seth Robertson
  • Unfortunately, a proper treatment requires not only dealing with escaped dollar signs, but also quoting issues. For instance, the syntax `$var1" "$var2' '$var3'$var4'` should expand such that the quotes disappear. And `$var4` should be left verbatim. – Kaz Oct 04 '19 at 00:49
  • What we want is an interface to the `wordexp` POSIX C library function, more or less. – Kaz Oct 04 '19 at 00:50