I find it somewhat annoying that I cannot use aliases in GNU Parallel:

alias gi="grep -i"
parallel gi bar ::: foo
/bin/bash: gi: command not found

I had more or less accepted that this is just the way it is. But after reading Accessing Associative Arrays in GNU Parallel, I am starting to wonder: does it really have to be this way?

Is it possible to write a bash function that collects the whole environment into a function, exports that function, and calls GNU Parallel, which then imports the environment in the spawned shell by running that function?

So I am not talking about a specialized solution for the gi alias, but a bash function that takes all aliases/functions/variables (without me having to name them explicitly) and packages them into a function that GNU Parallel can activate.

Something similar to:

env_parallel() {
  # [... gather all environment/all aliases/all functions into parallel_environment() ...]
  foreach alias in all aliases {
     append alias definition to definition of parallel_environment()
  }
  foreach variable in all variables (including assoc arrays) {
     append variable definition to definition of parallel_environment()
     # Code somewhat similar to https://stackoverflow.com/questions/24977782/accessing-associative-arrays-in-gnu-parallel
  }
  foreach function in all functions {
     append function definition to definition of parallel_environment()
  }

  # make parallel_environment visible to GNU Parallel
  export -f parallel_environment

  # Running parallel_environment will now create an environment with
  # all variables/all aliases/all functions set in current state 
  # (with the exception of the function parallel_environment of course)

  # Inside GNU parallel:
  #    if set parallel_environment(): prepend it to the command to run
  `which parallel` "$@"
}

# Set an example alias
alias fb="echo fubar"
# Set an example variable
BAZ=quux
# Make an example function
myfunc() {
  echo $BAZ
}

# This will record the current environment including the 3 examples
# put it into parallel_environment
# run parallel_environment (to set the environment)
# use the 3 examples
env_parallel parallel_environment\; fb bar {}\; myfunc ::: foo

# It should give the same output as running:
fb bar foo; myfunc
# Outputs:
#   fubar bar foo
#   quux

Progress: This seems to be close to what I want:

env_parallel() {
  export parallel_environment='() {
    '"$(echo "shopt -s expand_aliases"; alias;typeset -p | grep -vFf <(readonly);typeset -f)"'
  }'
  `which parallel` "$@"
}

VAR=foo
myfunc() {
  echo $VAR $1
}
alias myf=myfunc
env_parallel parallel_environment';
' myfunc ::: bar # Works (but gives errors)
env_parallel parallel_environment';
' myf ::: bar # Works, but requires the \n after ;
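
The newline requirement is most likely a consequence of how bash expands aliases: an alias is substituted when a line is read, not when it is executed, so a definition only affects lines parsed after it. A minimal sketch with plain `bash -c` (no GNU Parallel involved; the alias `hi` and the helper names `same_line` and `next_line` are made up for illustration):

```bash
#!/usr/bin/env bash
# Alias defined and used on the same line: the whole line is parsed
# before the alias exists, so "hi" is looked up as a command and fails.
same_line() {
  bash -c 'shopt -s expand_aliases; alias hi="echo hello"; hi' 2>/dev/null
}

# Alias defined on one line and used on the next: the last line is
# parsed after the alias has been defined, so it is expanded.
next_line() {
  bash -c $'shopt -s expand_aliases\nalias hi="echo hello"\nhi'
}

same_line || echo "same line: alias not expanded"
next_line
```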

So now I am down to 1 issue:

  • weed out all the variables that cannot be assigned a value (e.g. BASH_ARGC)

How do I list those?

Ole Tange
  • Aliases cannot be exported. They are a shell-only construct and *cannot* be used by other processes. As for other variables and functions, it's not clear what you want. You are going to have to decide, when the item is defined, whether it should be exported, or decide before calling `parallel` which items need to be made available to its environment. – chepner Aug 06 '14 at 12:37
  • Associative arrays cannot be exported directly either, but you can wrap those in a function which _can_ be exported: http://stackoverflow.com/questions/24977782/accessing-associative-arrays-in-gnu-parallel – Ole Tange Aug 06 '14 at 12:40

3 Answers

2

GNU Parallel 20140822 implements this. To activate it you will need to run this once (e.g. in .bashrc):

env_parallel() {
    export parallel_bash_environment='() {
       '"$(echo "shopt -s expand_aliases 2>/dev/null"; alias;typeset -p | grep -vFf <(readonly; echo GROUPS; echo FUNCNAME; echo DIRSTACK; echo _; echo PIPESTATUS; echo USERNAME) | grep -v BASH_;typeset -f)"'
       }'
     # Run as: env_parallel ...
     `which parallel` "$@"
     unset parallel_bash_environment
}

And call GNU Parallel as:

env_parallel ...

That should put the myth to rest that it is impossible to export aliases: all you need is a little Behändigkeit (Thanks a lot to @rici for the inspiration).
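
The packaging idea can be tried without GNU Parallel. The following is only a rough sketch, not the code shipped with GNU Parallel: `snapshot_env` and `run_with_env` are hypothetical names, and the list of skipped variables is an assumption based on the names filtered above. It pipes the definitions to a child bash on stdin instead of exporting a `() {` variable, because bash releases patched for Shellshock (September 2014) no longer import functions from ordinary variables. It assumes `compgen` is available (bash built with programmable completion, the default).

```bash
#!/usr/bin/env bash
# snapshot_env (hypothetical name) prints bash code that recreates the
# current aliases, assignable variables and functions when executed.
snapshot_env() {
  local _snap_name _snap_decl _snap_attrs
  echo 'shopt -s expand_aliases'
  alias
  for _snap_name in $(compgen -v); do
    case $_snap_name in
      # Assumed list of shell-managed names that should not be replayed.
      BASH*|GROUPS|FUNCNAME|DIRSTACK|PIPESTATUS|_|_snap_*) continue ;;
    esac
    _snap_decl=$(declare -p "$_snap_name" 2>/dev/null) || continue
    # The attribute flags are the second word, e.g. "-", "x", "ir".
    _snap_attrs=${_snap_decl#declare -}
    _snap_attrs=${_snap_attrs%% *}
    # Readonly variables cannot be assigned in the child, so skip them.
    [[ $_snap_attrs == *r* ]] && continue
    printf '%s\n' "$_snap_decl"
  done
  declare -f
}

# run_with_env (hypothetical name) replays the snapshot in a child bash
# and then runs the given command line there. The child reads its script
# from stdin, so the command itself cannot read stdin.
run_with_env() {
  { snapshot_env; printf '%s\n' "$1"; } | bash
}

alias fb='echo fubar'
BAZ=quux
myfunc() { echo "$BAZ"; }

run_with_env 'fb bar foo; myfunc'
# prints:
#   fubar bar foo
#   quux
```

Because the child reads the definitions line by line, the alias is already defined when the final command line is parsed, and feeding them through a pipe avoids the size limits of the command line and the environment.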

Ole Tange
1

In principle, it should be possible. But, as usual, there are a lot of details.

First, it is quite possible in bash for a name to be simultaneously a function, a variable (scalar or array) and an alias. Also, the function and the variable can be exported independently.

So there would be a certain ambiguity in env_parallel foo ... in the case that foo has more than one definition. Possibly the best solution would be to detect the situation and report an error, using a syntax like:

env_parallel -a foo -f bar

in order to be more specific, if necessary.

A simpler possibility is to just export the ambiguity, which is what I do below.

So the basic logic of the importer used in env_parallel might be something like this, leaving out lots of error checking and other niceties:

# Helper functions for clarity. In practice, since they are all short,
# I'd probably in-line all of these by hand to reduce name pollution.
get_alias_() { alias "$1" 2>/dev/null; }
get_func_()  { declare -f "$1" 2>/dev/null; }
get_var_()   { [[ -v "$1" ]] && declare -p "$1" | sed '1s/--\?/-g/'; }

make_importer() {
  local name_
  export $1='() {
    '"$(for name_ in "${@:2}"; do
          local got_=()
          get_alias_ "$name_" && got_+=(alias)
          get_func_  "$name_" && got_+=(function)
          get_var_   "$name_" && got_+=(variable)
          if [[ -z $got_ ]]; then
            echo "Not found: $name_" >>/dev/stderr
          elif (( ${#got_[@]} > 1 )); then
            printf >>/dev/stderr \
                   "Ambiguous: %s is%s\n" \
                   $name_ "$(printf " %s" "${got_[@]}")"
          fi
        done)"'
  }'
}

In practice, there's no real point in defining the function in the local environment if its only purpose is to be transmitted to a remote shell. It would be sufficient to print the export command. And, while it is convenient to package the import into a function, as in Accessing Associative Arrays in GNU Parallel, it's not strictly necessary. It does make it a lot easier to pass the definitions through utilities like GNU Parallel, xargs or find, which is what I typically use this hack for. But depending on how you expect to use the definitions, you might be able to simplify the above by simply prepending the list of definitions to the given command. (If you do that, you won't need to fiddle the global flag with the sed in get_var_.)
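
The simplification of prepending the definitions to the command might look roughly like this. It is a minimal sketch: `defs_for` and `run_with` are hypothetical names, the child is a plain `bash -c` rather than GNU Parallel, and the "not found" and "ambiguous" reporting is left out.

```bash
#!/usr/bin/env bash
# defs_for (hypothetical name) prints every definition that exists for
# each given name: alias, function and/or variable.
defs_for() {
  local _n
  for _n in "$@"; do
    alias "$_n" 2>/dev/null
    declare -f "$_n" 2>/dev/null
    declare -p "$_n" 2>/dev/null
  done
  return 0
}

# run_with COMMAND NAME... (hypothetical name) runs COMMAND in a child
# bash with the definitions of the given names placed in front of it.
# Each part sits on its own line so that aliases are defined before the
# command line is parsed.
run_with() {
  local _cmd=$1
  shift
  bash -c "shopt -s expand_aliases
$(defs_for "$@")
$_cmd"
}

alias fb='echo fubar'
BAZ=quux
myfunc() { echo "$BAZ"; }

run_with 'fb bar; myfunc' fb BAZ myfunc
# prints:
#   fubar bar
#   quux
```

Since only the named definitions are passed, the command line stays small, and no `-g` flag is needed because the declarations run at the top level of the child shell.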

Finding out what is in the environment

In case it is useful, here is how to get a list of all aliases, functions and variables:

Functions

declare -F | cut -d' ' -f3

Aliases (Note 1)

alias | awk '/^alias /{print substr($2,1,index($2,"=")-1)}'

Variables (Note 1)

declare -p | awk '$1=="declare"{o=(index($3, "="));print o?substr($3,1,o-1):$3}'

In the awk program, you could check for variable type by looking at $2, which is usually -- but could be -A for an associative array, -a for an array with integer keys, -i for an integer, -x for exported and -r for readonly. (More than one option may apply; -aix is an "exported" (not implemented) integer array.)
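
The same flag check can be done for a single name in bash itself. A small sketch, where `var_attrs` is a hypothetical helper that prints the attribute letters from the second word of the `declare -p` output, or `-` for a plain variable:

```bash
#!/usr/bin/env bash
# var_attrs (hypothetical name) prints the attribute flags of a
# variable, e.g. "A", "i", "x", or "-" when it has none.
var_attrs() {
  local _decl
  _decl=$(declare -p "$1" 2>/dev/null) || return 1
  _decl=${_decl#declare -}        # drop the leading "declare -"
  printf '%s\n' "${_decl%% *}"    # keep only the flags word
}

declare -A colors=([sky]=blue)
declare -i count=3
plain=text

for v in colors count plain; do
  printf '%s: %s\n' "$v" "$(var_attrs "$v")"
done
# prints:
#   colors: A
#   count: i
#   plain: -
```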

Note 1

The alias and declare -p commands produce "reusable" output, which could be eval'ed or piped into another bash, so the values are quoted. Unfortunately, the quoting is just good enough for eval; it's not good enough to avoid confusion. It is possible to define, for example:

x='
declare -a FAKE
'

in which case the output of declare -p will include:

declare -- x="
declare -a FAKE
"

Consequently, the lists of aliases and variables need to be treated as "possible names": all names will be included, but not everything included is necessarily a name. Mostly that means being sure to ignore errors:

for a in "${_aliases[@]}"; do
  if
     defn=$(alias "$a" 2>/dev/null)
  then
     # do something with $defn, for example print it
     printf '%s\n' "$defn"
  fi
done
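
If parsing that output is a concern, bash's `compgen` builtin may be a simpler route: `compgen -a`, `compgen -A function` and `compgen -v` print bare names, one per line, so a value containing newlines cannot masquerade as a name. A sketch, assuming bash 4 or later (for `mapfile`) built with programmable completion; `has_name` is a hypothetical helper used only to inspect the result:

```bash
#!/usr/bin/env bash
# A value that looks like a declaration, as in the example above.
x='
declare -a FAKE
'
alias gi='grep -i'
myfunc() { :; }

# Collect names only; no quoting or parsing of values is involved.
mapfile -t _aliases   < <(compgen -a)
mapfile -t _functions < <(compgen -A function)
mapfile -t _variables < <(compgen -v)

# has_name NAME LIST... succeeds if NAME is one of the LIST entries.
has_name() {
  local _want=$1 _entry
  shift
  for _entry in "$@"; do
    [[ $_entry == "$_want" ]] && return 0
  done
  return 1
}

has_name x    "${_variables[@]}" && echo "x is listed as a variable"
has_name FAKE "${_variables[@]}" || echo "FAKE is not listed"
```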
rici
  • Closer to a good answer. But I really do not want to name the variables/functions/aliases to export. I just want it _all_. That also solves the ambiguity: simply export everything. I have edited the meta code to make this more explicit. – Ole Tange Aug 07 '14 at 07:13
  • @OleTange It's reasonably easy to get a complete list of aliases, functions and variables, so you could use that as a basis for the above answer. However, you'd have to trim the list *a lot* for most practical uses. On my system, for example (a fairly recent kubuntu), `declare -f | wc -c` shows almost a quarter of a megabyte, which I think exceeds the possibilities for a command line. – rici Aug 08 '14 at 22:11
0

As is often the case, the solution is to use a function, not an alias. You must first export the function (since parallel and bash are both developed by GNU, parallel knows how to deal with functions as exported by bash).

gi () {
    grep -i "$@"
}
export -f gi
parallel gi bar ::: foo
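
To check the export independently of GNU Parallel, the function can be called from a child bash. This is only a sketch of the mechanism: exported functions are visible to child processes that are themselves bash, so a "command not found" error may mean that the command is being run by a different shell.

```bash
#!/usr/bin/env bash
gi() {
    grep -i "$@"
}
export -f gi

# The child bash imports gi from the environment and can call it.
printf 'Foo\nbar\n' | bash -c 'gi foo'
# prints:
#   Foo
```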
chepner
  • I have tried to clarify that I am not looking for a solution to the gi-alias, but a general solution to export everything. – Ole Tange Aug 06 '14 at 12:27
  • Could you expand? I couldn't get this to work in the way your answer implies (fixing the typo in the export). When I try referring to a function in parallel I still get "command not found" whether I've exported it or not. – Joshua Goldberg Aug 19 '14 at 01:44