
Question, topic of discussion

I am very interested in generating command-line shell script source code from code written in a compiled language that promotes robustness, performs well and is platform-independent (OCaml, for instance). Basically, you would program in a compiled language to perform any interactions with the OS that you want (I would propose: the more complex interactions, or ones that are not easy to do in a platform-independent way), and finally you would compile it to a native binary executable (preferably), which would generate a shell script that effects in the shell what you programmed in the compiled language. [ADDED]: By 'effects', I mean setting environment variables and shell options, executing certain non-standard commands (the standard scripting 'glue' would be handled by the compiled executable and kept out of the generated shell script) and such.

I have not found any such solution so far. It seems to be relatively easy* to realize compared to other possibilities of today, like compiling OCaml to JavaScript.

  • Are there already (public) implementations of what I describe?
  • What are other possibilities that are (very) similar to what I describe, and in what ways do they differ from that? (Language-to-language compilation (from compiled to sh) comes to mind, although that seems unnecessarily hard to realize.)

What I do not mean

  1. An alternative shell (like Scsh). The systems you administer may not always allow shells to be chosen by the user or by one administrator, and I also hope this can be a system administration solution for others as well (customers, colleagues and so on), people who cannot be expected to accept a different shell.
  2. An alternative interpreter, for the purpose that non-interactive shell scripts normally serve (like ocamlscript). Personally, I do not have a problem in avoiding shell scripting for this purpose. I do so because shell scripts are generally harder to maintain (for example, sensitive to certain characters and manipulation of mutable things like 'commands') and harder to craft to the same level of functionality that popular general-purpose programming languages could offer (for example, compare Bash to Python in this regard). However, there are cases where a native shell script is needed, for instance a shell profile file that is sourced by a shell when it is launched.

Background

Practical applications

Some of you may doubt the practical usefulness of what I describe. One practical application is to define a shell profile based on various conditions (for example the system platform/OS on which the profile is being sourced, what follows from the security policy, the concrete shell, whether the shell is a login or non-login shell, and whether it is interactive or non-interactive). The advantages over a (well-crafted) generic shell profile written as a shell script would be improved performance (native machine code that may generate compressed/optimized source code, instead of interpretation of a human-written script), robustness (type checking, exception handling, compile-time verification of functionality, cryptographic signing of the resultant binary executable), capabilities (less or no reliance on userland CLI tools, no restriction to the minimum functionality covered by the CLI tools of all possible platforms) and cross-platform functionality (in practice standards like the Single UNIX Specification only mean so much, and many shell profile concepts carry over to non-Unix platforms like Windows, with its PowerShell, too).

Implementation details, side issues

  1. The programmer should be able to control the degree of genericity of the generated shell script. For example, it could be that the binary executable is run every time and puts out the shell profile code that is appropriate, or it could simply generate a fixed shell script file tailored to the circumstances of one run. In the latter case, the listed advantages, in particular those for robustness (e.g. exception handling and reduced reliance on userland tools), are far more limited. [ADDED]
  2. Whether the resultant shell script would be in some form of universal shell script (like GNU autoconf generates) or shell-native script adapted (dynamically or not) to a specific shell is not a primary question to me.
  3. easy*: It seems to me that this can be realized by basically having available functions in a library for the basic shell builtins. Such a function would simply convert itself plus the passed arguments to a semantically appropriate and syntactically correct shell script statement (as a string).
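One way to read item 3 is sketched below in Haskell, using only the base library. Everything in it is an assumption made for illustration: the `Stmt` type, the three builtins it covers, and the `shQuote` helper are hypothetical names, not part of any existing library. Each constructor stands for one shell builtin, and `render` turns it into a single statement as a string.

```haskell
module Main where

-- Hypothetical statement type: one constructor per shell builtin we cover.
data Stmt
  = Export String String   -- export NAME=value
  | SetOpt Bool String     -- set -o option (True) or set +o option (False)
  | Unset String           -- unset NAME

-- Single-quote a value for a POSIX shell; an embedded ' becomes '\''.
shQuote :: String -> String
shQuote s = "'" ++ concatMap esc s ++ "'"
  where
    esc '\'' = "'\\''"
    esc c    = [c]

-- Convert one statement to shell source text.
-- Assumption: variable and option names are already valid; they are not checked.
render :: Stmt -> String
render (Export name value) = "export " ++ name ++ "=" ++ shQuote value
render (SetOpt on opt)     = "set " ++ (if on then "-o " else "+o ") ++ opt
render (Unset name)        = "unset " ++ name

-- A tiny example profile, built as ordinary Haskell data.
profile :: [Stmt]
profile =
  [ Export "EDITOR" "vi"
  , Export "GREETING" "it's me"
  , SetOpt True "noclobber"
  , Unset "TMOUT"
  ]

main :: IO ()
main = mapM_ (putStrLn . render) profile
```

Because the profile is plain data, the decisions about which statements to emit (platform, shell type, policy) can be made in the compiled language, and only the final statements reach the shell.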

Thank you for any further thoughts, and especially for concrete suggestions!

Sander
  • Several of your goals seem fairly vacuous. I don't think anyone is troubled if generating a dotfile takes 0.5s instead of 0.05s, for example - it's a one-time job. On the other hand, most profiles contain just a few simple commands to set up the shell's environment, and get executed hundreds or thousands of times each day. Making this slower by rewriting it in a portable format seems misdirected. – tripleee Dec 29 '12 at 13:32
  • Having said that, you might want to look at The Dotfile Generator http://www.blackie.dk/dotfile/ ... It's in TCL so hardly strongly type-checked or robust, but again, for this particular application, it's probably a better fit than OCaml. – tripleee Dec 29 '12 at 13:33
  • Nobody close this question. I want to answer it. – Gabriella Gonzalez Dec 29 '12 at 14:12
  • I don't even understand what is the purpose of closing a question. Why would you want to deny people from answering a question? – Gabriella Gonzalez Dec 29 '12 at 19:23
  • None of the people who voted to close even use Haskell. Big surprise! – Gabriella Gonzalez Dec 29 '12 at 20:09
  • @triplee: you speak of several vacuous goals, yet you only discuss one. I believe The Dotfile Generator creates configuration files, not shell scripts. My global shell profile is now 860 lines long. Those who use a few lines could probably easily code it in a platform-independent and somewhat more robust way. Analogy: 'A sports car is good for racing.' You: 'Buying a sports car is too expensive for a trip to the mall...' TCL is not statically typed and everything is a string. It would not rewrite the code in a portable format; a fast binary executable would emit a proper shell script. Slower? – Sander Dec 30 '12 at 11:21
  • The Dotfile Generator can generate a basic `.bashrc` for you, although it's hardly very versatile or impressive. Anyway, a monolithic 860-line `.bashrc` sounds curious and somewhat alarming. – tripleee Dec 30 '12 at 12:39

1 Answer


There are no Haskell libraries for this, but you can implement this using abstract syntax trees. I'll build up a simple toy example that builds an abstract language-independent syntax tree and then applies a back-end that converts the tree into the equivalent Bash script.

I will use two tricks for modelling syntax trees in Haskell:

  • Model typed Bash expressions using a GADT
  • Implement a DSL using free monads

The GADT trick is rather simple, and I use several language extensions to sweeten the syntax:

{-# LANGUAGE GADTs
           , FlexibleInstances
           , RebindableSyntax
           , OverloadedStrings #-}

import Data.String
import Prelude hiding ((++))

type UniqueID = Integer

newtype VStr = VStr UniqueID
newtype VInt = VInt UniqueID

data Expr a where
    StrL   :: String  -> Expr String  -- String  literal
    IntL   :: Integer -> Expr Integer -- Integer literal
    StrV   :: VStr    -> Expr String  -- String  variable
    IntV   :: VInt    -> Expr Integer -- Integer variable
    Plus   :: Expr Integer -> Expr Integer -> Expr Integer
    Concat :: Expr String  -> Expr String  -> Expr String
    Shown  :: Expr Integer -> Expr String

instance Num (Expr Integer) where
    fromInteger = IntL
    (+)         = Plus
    (*)    = undefined
    abs    = undefined
    signum = undefined

instance IsString (Expr String) where
    fromString = StrL

(++) :: Expr String -> Expr String -> Expr String
(++) = Concat

This lets us build typed Bash expressions in our DSL. I only implemented a few primitive operations, but you could easily imagine how you could extend it with others.

If we didn't use any language extensions, we might write expressions like:

Concat (StrL "Test") (Shown (Plus (IntL 4) (IntL 5))) :: Expr String

This is okay, but not very sexy. Haskell desugars a numeric literal n to fromInteger n, and the Num instance above defines fromInteger as IntL, so you can replace (IntL n) with just n (RebindableSyntax is enabled above, but it is not strictly needed for this):

Concat (StrL "Test") (Shown (Plus 4 5)) :: Expr String

The same Num instance defines (+) as Plus, so that you can add numeric literals using +:

Concat (StrL "Test") (Shown (4 + 5)) :: Expr String

Similarly, I use OverloadedStrings so that you can replace all occurrences of (StrL str) with just str:

Concat "Test" (Shown (4 + 5)) :: Expr String

I also override the Prelude (++) operator so that we can concatenate expressions as if they were Haskell strings:

"Test" ++ Shown (4 + 5) :: Expr String

Other than the Shown cast from integers to strings, it looks just like native Haskell code. Neat!
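To make concrete what the GADT typing buys, here is a minimal sketch of a pure evaluator. It is not part of the answer's code: it assumes a variable-free subset of Expr (variables would need an environment, which is left out to keep the sketch short), and the name `eval` is hypothetical.

```haskell
{-# LANGUAGE GADTs #-}
module Main where

-- Variable-free subset of the answer's Expr type (an assumption made to
-- keep this sketch self-contained).
data Expr a where
  StrL   :: String  -> Expr String
  IntL   :: Integer -> Expr Integer
  Plus   :: Expr Integer -> Expr Integer -> Expr Integer
  Concat :: Expr String  -> Expr String  -> Expr String
  Shown  :: Expr Integer -> Expr String

-- Matching on a constructor refines the type variable `a`, so each branch
-- can return an ordinary Haskell value of the matching type.
eval :: Expr a -> a
eval (StrL s)     = s
eval (IntL n)     = n
eval (Plus x y)   = eval x + eval y
eval (Concat x y) = eval x ++ eval y
eval (Shown x)    = show (eval x)

main :: IO ()
main = putStrLn (eval (Concat (StrL "Test") (Shown (Plus (IntL 4) (IntL 5)))))
```

An ill-typed tree such as `Plus (StrL "x") (IntL 1)` is rejected by the compiler, which is the same guarantee the Bash-generating code relies on.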

Now we need a way to create a user-friendly DSL, preferably with Monad syntactic sugar. This is where free monads come in.

A free monad takes a functor representing a single step in a syntax tree and creates a syntax tree from it. As a bonus, it is always a monad for any functor, so you can assemble these syntax trees using do notation.

To demonstrate it, I'll add some more code to the previous code segment:

-- This is in addition to the previous code.
-- In a real file, DeriveFunctor goes with the other LANGUAGE pragmas at the top.
{-# LANGUAGE DeriveFunctor #-}

import Control.Monad.Free

data ScriptF next
    = NewInt (Expr Integer) (VInt -> next)
    | NewStr (Expr String ) (VStr -> next)
    | SetStr VStr (Expr String ) next
    | SetInt VInt (Expr Integer) next
    | Echo (Expr String) next
    | Exit (Expr Integer)
  deriving (Functor)

type Script = Free ScriptF

newInt :: Expr Integer -> Script VInt
newInt n = liftF $ NewInt n id

newStr :: Expr String -> Script VStr
newStr str = liftF $ NewStr str id

setStr :: VStr -> Expr String -> Script ()
setStr v expr = liftF $ SetStr v expr ()

setInt :: VInt -> Expr Integer -> Script ()
setInt v expr = liftF $ SetInt v expr ()

echo :: Expr String -> Script ()
echo expr = liftF $ Echo expr ()

exit :: Expr Integer -> Script r
exit expr = liftF $ Exit expr

The ScriptF functor represents a single step in our DSL. Free essentially creates a list of ScriptF steps and defines a monad where we can assemble lists of these steps. You can think of the liftF function as taking a single step and creating a list with one action.
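The "list of steps" picture can be checked with a hand-rolled version of Free. The sketch below is a standalone file that does not import Control.Monad.Free; it redefines `Free`, `Pure` and `liftF` with the usual textbook definitions, and the one-instruction functor `SayF` with its helpers `say` and `steps` are hypothetical names introduced only for this illustration.

```haskell
{-# LANGUAGE DeriveFunctor #-}
module Main where

-- Hand-rolled free monad: either a finished result, or one step of `f`
-- whose "next" slot holds the rest of the program.
data Free f a
  = Pure a
  | Free (f (Free f a))

instance Functor f => Functor (Free f) where
  fmap g (Pure a)  = Pure (g a)
  fmap g (Free fa) = Free (fmap (fmap g) fa)

instance Functor f => Applicative (Free f) where
  pure = Pure
  Pure g  <*> x = fmap g x
  Free fg <*> x = Free (fmap (<*> x) fg)

instance Functor f => Monad (Free f) where
  Pure a  >>= k = k a
  Free fa >>= k = Free (fmap (>>= k) fa)   -- push the continuation to the end

-- Wrap a single step as a one-step program.
liftF :: Functor f => f a -> Free f a
liftF = Free . fmap Pure

-- Hypothetical DSL with a single instruction.
data SayF next = Say String next
  deriving (Functor)

say :: String -> Free SayF ()
say msg = liftF (Say msg ())

-- Walk the tree and collect the steps in order, like traversing a list.
steps :: Free SayF a -> [String]
steps (Pure _)              = []
steps (Free (Say msg rest)) = msg : steps rest

demo :: Free SayF ()
demo = do
  say "one"
  say "two"
  say "three"

main :: IO ()
main = print (steps demo)
```

Here `>>=` plays the role of list append: it walks to the `Pure` at the end of the first program and grafts the second program on there.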

We can then use do notation to assemble these steps, where do notation concatenates these lists of actions:

script :: Script r
script = do
    hello <- newStr "Hello, "
    world <- newStr "World!"
    setStr hello (StrV hello ++ StrV world)
    echo ("hello: " ++ StrV hello)
    echo ("world: " ++ StrV world)
    x <- newInt 4
    y <- newInt 5
    exit (IntV x + IntV y)

This shows how we assemble the primitive steps we just defined. This has all the nice properties of monads, including support for monadic combinators, like forM_:

import Control.Monad

script2 :: Script ()
script2 = forM_ [1..5] $ \i -> do
    x <- newInt (IntL i)
    setInt x (IntV x + 5)
    echo (Shown (IntV x))

Notice how our Script monad enforces type safety even though our target language might be untyped. You can't accidentally use a String literal where it expects an Integer or vice versa. You must explicitly convert between them using type-safe conversions like Shown.

Also note that the Script monad swallows any commands after the exit statement. They are ignored before they even reach the interpreter. Of course, you can change this behavior by rewriting the Exit constructor to accept a subsequent next step.

These abstract syntax trees are pure, meaning that we can inspect and interpret them purely. We can define several backends, such as a Bash backend that converts our Script monad to the equivalent Bash script:

bashExpr :: Expr a -> String
bashExpr expr = case expr of
    StrL str           -> str
    IntL int           -> show int
    StrV (VStr nID)    -> "${S" <> show nID <> "}"
    IntV (VInt nID)    -> "${I" <> show nID <> "}"
    Plus   expr1 expr2 ->
        concat ["$((", bashExpr expr1, "+", bashExpr expr2, "))"]
    Concat expr1 expr2 -> bashExpr expr1 <> bashExpr expr2
    Shown  expr'       -> bashExpr expr'

bashBackend :: Script r -> String
bashBackend script = go 0 0 script where
    go nStrs nInts script =
        case script of
            Free f -> case f of
                NewInt e k ->
                    "I" <> show nInts <> "=" <> bashExpr e <> "\n" <>
                        go nStrs (nInts + 1) (k (VInt nInts))
                NewStr e k ->
                    "S" <> show nStrs <> "=" <> bashExpr e <> "\n" <>
                        go (nStrs + 1) nInts (k (VStr nStrs))
                SetStr (VStr nID) e script' ->
                    "S" <> show nID <> "=" <> bashExpr e <> "\n" <>
                        go nStrs nInts script'
                SetInt (VInt nID) e script' ->
                    "I" <> show nID <> "=" <> bashExpr e <> "\n" <>
                        go nStrs nInts script'
                Echo e script' ->
                    "echo " <> bashExpr e <> "\n" <>
                        go nStrs nInts script'
                Exit e ->
                    "exit " <> bashExpr e <> "\n"
            Pure _ -> ""

I defined two interpreters: one for the expression syntax tree and one for the monadic DSL syntax tree. These two interpreters compile any language-independent program into the equivalent Bash program, represented as a String. Of course, the choice of representation is entirely up to you.

This interpreter automatically creates fresh unique variables each time our Script monad requests a new variable.

Let's try out this interpreter and see if it works:

>>> putStr $ bashBackend script
S0=Hello, 
S1=World!
S0=${S0}${S1}
echo hello: ${S0}
echo world: ${S1}
I0=4
I1=5
exit $((${I0}+${I1}))

It generates a Bash script that executes the equivalent language-independent program. Similarly, it translates script2 just fine, too:

>>> putStr $ bashBackend script2
I0=1
I0=$((${I0}+5))
echo ${I0}
I1=2
I1=$((${I1}+5))
echo ${I1}
I2=3
I2=$((${I2}+5))
echo ${I2}
I3=4
I3=$((${I3}+5))
echo ${I3}
I4=5
I4=$((${I4}+5))
echo ${I4}
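One caveat about the generated scripts above: string literals and variable references are emitted unquoted. In `S0=Hello, ` the shell drops the trailing space, and a literal with a space in the middle would be parsed as an assignment followed by a command. A possible fix is sketched below under stated assumptions: `StrExpr` is a cut-down, string-only stand-in for the answer's Expr type, and `shQuote`, `render`, `assign` and `echoLine` are hypothetical helpers, not part of the answer's code.

```haskell
module Main where

-- Cut-down, string-only stand-in for the answer's Expr type (an assumption).
data StrExpr
  = Lit String            -- string literal
  | Var Integer           -- string variable S<n>
  | Cat StrExpr StrExpr   -- concatenation

-- Single-quote a literal for a POSIX shell; an embedded ' becomes '\''.
shQuote :: String -> String
shQuote s = "'" ++ concatMap esc s ++ "'"
  where
    esc '\'' = "'\\''"
    esc c    = [c]

-- Literals are single-quoted and variable references double-quoted, so
-- adjacent pieces concatenate without word splitting or globbing.
render :: StrExpr -> String
render (Lit s)   = shQuote s
render (Var n)   = "\"${S" ++ show n ++ "}\""
render (Cat a b) = render a ++ render b

assign :: Integer -> StrExpr -> String
assign n e = "S" ++ show n ++ "=" ++ render e

echoLine :: StrExpr -> String
echoLine e = "echo " ++ render e

main :: IO ()
main = mapM_ putStrLn
  [ assign 0 (Lit "Hello, ")
  , assign 1 (Lit "World!")
  , assign 0 (Cat (Var 0) (Var 1))
  , echoLine (Cat (Lit "hello: ") (Var 0))
  ]
```

The same idea would carry over to the answer's bashExpr by changing only the StrL and StrV cases; integer expressions inside `$((...))` do not need this treatment.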

So this is obviously not comprehensive, but hopefully that gives you some ideas for how you would implement this idiomatically in Haskell. If you want to learn more about the use of free monads, I recommend you read:

I've also attached the complete code here:

{-# LANGUAGE GADTs
           , FlexibleInstances
           , RebindableSyntax
           , DeriveFunctor
           , OverloadedStrings #-}

import Control.Monad.Free
import Control.Monad
import Data.Monoid
import Data.String
import Prelude hiding ((++))

type UniqueID = Integer

newtype VStr = VStr UniqueID
newtype VInt = VInt UniqueID

data Expr a where
    StrL   :: String  -> Expr String  -- String  literal
    IntL   :: Integer -> Expr Integer -- Integer literal
    StrV   :: VStr    -> Expr String  -- String  variable
    IntV   :: VInt    -> Expr Integer -- Integer variable
    Plus   :: Expr Integer -> Expr Integer -> Expr Integer
    Concat :: Expr String  -> Expr String  -> Expr String
    Shown  :: Expr Integer -> Expr String

instance Num (Expr Integer) where
    fromInteger = IntL
    (+)         = Plus
    (*)    = undefined
    abs    = undefined
    signum = undefined

instance IsString (Expr String) where
    fromString = StrL

(++) :: Expr String -> Expr String -> Expr String
(++) = Concat

data ScriptF next
    = NewInt (Expr Integer) (VInt -> next)
    | NewStr (Expr String ) (VStr -> next)
    | SetStr VStr (Expr String ) next
    | SetInt VInt (Expr Integer) next
    | Echo (Expr String) next
    | Exit (Expr Integer)
  deriving (Functor)

type Script = Free ScriptF

newInt :: Expr Integer -> Script VInt
newInt n = liftF $ NewInt n id

newStr :: Expr String -> Script VStr
newStr str = liftF $ NewStr str id

setStr :: VStr -> Expr String -> Script ()
setStr v expr = liftF $ SetStr v expr ()

setInt :: VInt -> Expr Integer -> Script ()
setInt v expr = liftF $ SetInt v expr ()

echo :: Expr String -> Script ()
echo expr = liftF $ Echo expr ()

exit :: Expr Integer -> Script r
exit expr = liftF $ Exit expr

script :: Script r
script = do
    hello <- newStr "Hello, "
    world <- newStr "World!"
    setStr hello (StrV hello ++ StrV world)
    echo ("hello: " ++ StrV hello)
    echo ("world: " ++ StrV world)
    x <- newInt 4
    y <- newInt 5
    exit (IntV x + IntV y)

script2 :: Script ()
script2 = forM_ [1..5] $ \i -> do
    x <- newInt (IntL i)
    setInt x (IntV x + 5)
    echo (Shown (IntV x))

bashExpr :: Expr a -> String
bashExpr expr = case expr of
    StrL str           -> str
    IntL int           -> show int
    StrV (VStr nID)    -> "${S" <> show nID <> "}"
    IntV (VInt nID)    -> "${I" <> show nID <> "}"
    Plus   expr1 expr2 ->
        concat ["$((", bashExpr expr1, "+", bashExpr expr2, "))"]
    Concat expr1 expr2 -> bashExpr expr1 <> bashExpr expr2
    Shown  expr'       -> bashExpr expr'

bashBackend :: Script r -> String
bashBackend script = go 0 0 script where
    go nStrs nInts script =
        case script of
            Free f -> case f of
                NewInt e k ->
                    "I" <> show nInts <> "=" <> bashExpr e <> "\n" <> 
                        go nStrs (nInts + 1) (k (VInt nInts))
                NewStr e k ->
                    "S" <> show nStrs <> "=" <> bashExpr e <> "\n" <>
                        go (nStrs + 1) nInts (k (VStr nStrs))
                SetStr (VStr nID) e script' ->
                    "S" <> show nID <> "=" <> bashExpr e <> "\n" <>
                        go nStrs nInts script'
                SetInt (VInt nID) e script' ->
                    "I" <> show nID <> "=" <> bashExpr e <> "\n" <>
                        go nStrs nInts script'
                Echo e script' ->
                    "echo " <> bashExpr e <> "\n" <>
                        go nStrs nInts script'
                Exit e ->
                    "exit " <> bashExpr e <> "\n"
            Pure _ -> ""
Vitus
Gabriella Gonzalez
  • Thank you, very interesting and concrete! My knowledge about Haskell is very limited currently, but I see how that language is ideal for this. – Sander Dec 31 '12 at 13:37
  • As I understand your post, you would use a DSL and you use free monads to make this DSL easier and better to define. So suppose we would have a more complete DSL. To code shell scripts in the custom DSL would yield the robustness benefits, at the cost of having to master both the robust language (e.g. Haskell) and the DSL. What I was considering is actually not creating a DSL to cover all shell programming but only those commands with side effects (echo, etc.), particularly the practically important ones (exporting environment variables and setting shell options). These all have a simple syntax. – Sander Dec 31 '12 at 13:49
  • In fact what I had in mind was to cover *all* serious code in the robust language, and to limit 'output' in the form of shell code to some end-result lines in which environment variables and shell options are set, and such. Even shell functions and aliases could be reworked as compiled executable files after you have programmed them in such a framework in Haskell. So only a minimal amount of shell code would still be required and the DSL or similar solution could stay very straightforward. – Sander Dec 31 '12 at 13:59
  • Ultimately, you would have a shell profile file like .profile (and some symlinks to it with the common names of all kinds of profile files like .bashrc, .cshrc, bash_profile, etc.) containing no more than `. $(./robust_profile $$);` where robust_profile would be an executable in the same directory (home directory) that would inspect the current state and the properties of the shell process ($$) and use that to come up with the proper shell options, environment, functions, etc. Seems very feasible, and I would love the power and reliability of such a solution. – Sander Dec 31 '12 at 14:24
  • @SNM_NL I understand now. You actually want to do all the programming in Haskell, and then just emit a uniform interface to non-portable shell commands that the backend executes by calling the equivalent shell command. Then you just compile the whole program into an executable and run that executable in your profile shell script. That's MUCH easier and you can still do it using free monads. Since this question is closed, just e-mail me at Gabriel439 gmail.com with a few example non-portable shell commands and I will e-mail you back with the equivalent free monad code and backends. – Gabriella Gonzalez Jan 01 '13 at 18:27
  • This answer is unbelievable! I would upvote it 100 times if I could! – Paul Feb 16 '13 at 09:19
  • @Paul Thanks! Ironically, it wasn't actually what he was looking for, but your question reminded me of this. – Gabriella Gonzalez Feb 16 '13 at 11:59
  • I just found a very useful implementation (in OCaml) of the kind of software I was looking for with my question: https://github.com/BYVoid/Batsh – Sander Mar 24 '16 at 11:34