I noticed that the 64-Bit Command Line Anaconda Installer for macOS is a large 400+ MB Bash/Bourne shell script.

When I tried to read it, I noticed that its first 555 lines are readable text, but the rest of the script is in binary format, probably encrypted.

See https://www.anaconda.com/products/individual and https://repo.anaconda.com/archive/Anaconda3-2021.05-MacOSX-x86_64.sh.

I noticed similar scripts, such as Tcl scripts associated with electronic design automation software.

How do we transform source code files, such as scripts (shell scripts, or Tcl/Perl/Python/Ruby scripts, or C++/Java/Scala/Haskell/Lisp source code), into partially readable text and binary otherwise?

Can we just merge two parts, one in ASCII/text format, and the other in binary format?

That said, how do we obtain the binary executable for scripts, such as shell scripts or Tcl/Perl/Python/Ruby scripts?

I know how to obtain binary executables for C, C++, and Fortran.

Other than using a platform-specific (in terms of operating system and hardware configuration, such as processor type or instruction set architecture) compiler to compile scripts into binary executables and concatenating the text files with the binary files, how else can I do it?

Are there software applications that do this? What techniques, in terms of algorithms, do these software applications use?

Thank you so much, and have a good day!

Giovanni

2 Answers

To answer one of your questions, here is a helpful guide to embedding a binary file into a shell/bash script:

https://www.xmodulo.com/embed-binary-file-bash-script.html

(code example below is taken from this link)

The body of the shell script must contain commands that isolate and execute the binary data stored after it.

The trick is to place an "exit" command at the end of the written script, followed by a unique delimiter line (which is "__PAYLOAD_BEGINS__" in the example below):

#!/bin/bash

# line number where payload starts
PAYLOAD_LINE=$(awk '/^__PAYLOAD_BEGINS__/ { print NR + 1; exit 0; }' "$0")

# directory where a binary executable is to be saved
WORK_DIR=/tmp
# name of an embedded binary executable
EXE_NAME=dummy_executable

# extract the embedded binary executable
tail -n +"${PAYLOAD_LINE}" "$0" | base64 -d > "${WORK_DIR}/${EXE_NAME}"
chmod +x "${WORK_DIR}/${EXE_NAME}"

# run the executable as needed
"${WORK_DIR}/${EXE_NAME}"

exit 0
__PAYLOAD_BEGINS__

Then you can append the script with base64-encoded binary data:

$ base64 dummy_executable >> script.sh

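You could also append the raw binary data without base64-encoding it (and drop the base64 -d step from the extraction), but this is not recommended: a text editor may corrupt the binary part, so you would no longer be able to edit the script safely afterwards.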
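As a quick sanity check of the approach above, the following sketch builds such a script in a temporary directory, runs it, and compares the extracted file with the original. The file names (payload.bin, extracted.bin) are made up for illustration, and base64 -d assumes a base64 implementation that accepts that flag, as the snippet above does.

```shell
#!/bin/bash
# Round-trip check for the base64 approach. All file names are illustrative.
set -eu
work=$(mktemp -d)
cd "$work"

# Stand-in for a real binary: 1 KiB of random bytes.
head -c 1024 /dev/urandom > payload.bin

# The readable part of the script; it ends with "exit 0" and the delimiter line.
cat > script.sh <<'EOF'
#!/bin/bash
PAYLOAD_LINE=$(awk '/^__PAYLOAD_BEGINS__/ { print NR + 1; exit 0; }' "$0")
tail -n +"${PAYLOAD_LINE}" "$0" | base64 -d > extracted.bin
exit 0
__PAYLOAD_BEGINS__
EOF

# Append the encoded payload, run the script, and compare the result.
base64 payload.bin >> script.sh
bash script.sh
cmp payload.bin extracted.bin && echo "round trip OK"
```

Because the appended part is plain base64 text, script.sh can still be opened and edited like any other shell script.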

some coder guy

Shell Scripts with Payload

In Anaconda3....sh there is nothing encrypted. There are multiple binary files appended to the end of the script. Creating such a file yourself is trivial. Open a terminal and run

cat script.sh file1 file2 ... > script-with-payload.sh

The only tricky part is to write a script.sh that can handle the payload.

  • For starters, write exit at the end of your script.sh, so that the shell does not try to interpret the binary part as shell commands when executing script-with-payload.sh.
  • Then, somewhere inside script.sh use something like tail, sed, or dd to extract the binary data at the end of the script.
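The two points above can be sketched as follows. This is only a minimal illustration under some assumptions: the marker line __ARCHIVE_BELOW__, the directory name unpacked, and the payload file names are invented here, and the payload is a single raw (not base64-encoded) tar.gz archive that the script pipes from itself into tar.

```shell
#!/bin/bash
# Minimal sketch: a script that carries a raw tar.gz archive after its last line.
set -eu
work=$(mktemp -d)
cd "$work"

# Two stand-in payload files, bundled into one archive.
printf 'hello from file1\n' > file1
head -c 2048 /dev/urandom > file2
tar -czf payload.tar.gz file1 file2

# The readable part: locate the marker line, extract everything after it, then exit
# so the shell never tries to interpret the binary data as commands.
cat > script.sh <<'EOF'
#!/bin/sh
start=$(awk '/^__ARCHIVE_BELOW__$/ { print NR + 1; exit }' "$0")
mkdir -p unpacked
tail -n +"$start" "$0" | tar -xzf - -C unpacked
exit 0
__ARCHIVE_BELOW__
EOF

# Concatenate script and payload, then run the result.
cat script.sh payload.tar.gz > script-with-payload.sh
sh script-with-payload.sh
cmp file1 unpacked/file1 && cmp file2 unpacked/file2 && echo "payload extracted"
```

Using a marker line means the script does not need to know its own length, at the cost of scanning for the marker at run time.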

For a concrete example see Combine a shell script and a zip file into a single executable for deployment or Self-extracting script in sh shell or How do Linux binary installers (.bin, .sh) work?.

In Anaconda3....sh they use dd commands to extract a Mach-O 64-bit x86_64 executable (14'807'207 bytes) and a tar.bz2 file (438'910'836 bytes). Comments in the script point out that the shell script was generated by shar.py.
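To illustrate the dd-based variant with two payloads, here is a hedged sketch. It is not the code from the Anaconda installer; the file names and sizes are invented. The sizes of both payloads are written into the script when it is built, and the script derives the byte offsets from its own total size.

```shell
#!/bin/bash
# Sketch of byte-offset extraction with dd. Names and sizes are illustrative only.
set -eu
work=$(mktemp -d)
cd "$work"

# Two stand-in payloads of known size.
head -c 3000 /dev/urandom > first.bin
head -c 5000 /dev/urandom > second.bin
size1=$(( $(wc -c < first.bin) ))
size2=$(( $(wc -c < second.bin) ))

# Build the readable header: payload sizes are baked in, offsets are
# computed at run time by counting back from the end of the file.
{
  printf '#!/bin/sh\nsize1=%d\nsize2=%d\n' "$size1" "$size2"
  cat <<'EOF'
total=$(( $(wc -c < "$0") ))
off1=$(( total - size1 - size2 ))
off2=$(( total - size2 ))
# bs=1 keeps the arithmetic simple; it is slow for large payloads.
dd if="$0" bs=1 skip="$off1" count="$size1" of=first.out 2>/dev/null
dd if="$0" bs=1 skip="$off2" count="$size2" of=second.out 2>/dev/null
exit 0
EOF
} > header.sh

# Concatenate header and payloads, run, and verify both extractions.
cat header.sh first.bin second.bin > installer.sh
sh installer.sh
cmp first.bin first.out && cmp second.bin second.out && echo "both payloads match"
```

Counting back from the end of the file avoids having to store the header's own length inside the header, which would otherwise change whenever the header is edited.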

Remaining Questions

How do we transform source code files, such as [...] C++/Java/Scala/Haskell/Lisp [...] into partially readable text and binary otherwise?

C++, Java, and so on have to be compiled to be run, so distributing the uncompiled text file with an embedded payload doesn't really make sense.

how do we obtain the binary executable for scripts, such as shell scripts or Tcl/Perl/Python/Ruby scripts?

This is an entirely different question and has to be answered for each scripting language independently. The general answer is, you don't. Scripting languages are not meant to be compiled.

Are there software applications that do this?

Yes, by searching for bash payload or bash selfextracting you can find quite a few tools. However, most of them seem rather hacked together. The most established ones are GNU sharutils and makeself.

What techniques, in terms of algorithms, do these software applications use?

The principle is always the same: concatenate a script and some payload, then let the script extract the payload from itself. There is no "algorithm" involved.

Socowi