
I want to redirect the console output to a text file for further inspection. The task is to extract TIFF tags from a raster file (TIFF) and filter the results. To achieve this, I have several tools at hand; some of them are not Python libraries but command-line tools, such as identify from ImageMagick.

My example command-string passed to subprocess.check_call() was:

bash_str = 'identify -verbose /home/andylu/Desktop/Models_Master/AERSURFACE/Input/Images/Denia_CORINE_CODE_18_reclass_NLCD92_reproj_ADAPTED_Europe_AEA.tif | grep -i "274"'

Here, all lines of the TIFF-tag output produced by identify which contain information about tag number 274 shall either be displayed in the console or written to a file.

Error-type 1: Displaying in the console

subprocess.check_call(bash_str, shell=True)

subprocess.CalledProcessError: Command 'identify -verbose /home/andylu/Desktop/Models_Master/AERSURFACE/Input/Images/Denia_CORINE_CODE_18_reclass_NLCD92_reproj_ADAPTED_Europe_AEA.tif | grep -i "274"' returned non-zero exit status 1.

Error-type 2: Redirecting the output to textfile

subprocess.call(bash_str, stdout=filehandle_dummy, stderr=filehandle_dummy)

FileNotFoundError: [Errno 2] No such file or directory: 'identify -verbose /home/andylu/Desktop/Models_Master/AERSURFACE/Input/Images/Denia_CORINE_CODE_18_reclass_NLCD92_reproj_ADAPTED_Europe_AEA.tif | grep -i "274"': 'identify -verbose /home/andylu/Desktop/Models_Master/AERSURFACE/Input/Images/Denia_CORINE_CODE_18_reclass_NLCD92_reproj_ADAPTED_Europe_AEA.tif | grep -i "274"'

CODE

These subprocess.check_call() calls were made by the following convenience function:

def subprocess_stdout_to_console_or_file(bash_str, filehandle=None):
    """Function documentation:\n
    Convenience tool which either prints out directly in the provided shell, i.e. console,
    or redirects the output to a given file.

    NOTE on file redirection: it must not be the filepath, but the FILEHANDLE,
    which can be achieved via the open(filepath, "w")-function, e.g. like so:
    filehandle = open('out.txt', 'w')
    print(filehandle): <_io.TextIOWrapper name='bla_dummy.txt' mode='w' encoding='UTF-8'>
    """

    # Check whether a filehandle has been passed or not
    if filehandle is None:
        # i) If not, just direct the output to the BASH (shell), i.e. the console
        subprocess.check_call(bash_str, shell=True)
    else:
        # ii) Otherwise, write to the provided file via its filehandle
        subprocess.check_call(bash_str, stdout=filehandle)

The code piece where everything takes place already redirects the output of print() to a textfile. The aforementioned function is called within the function print_out_all_TIFF_Tags_n_filter_for_desired_TAGs().

As the subprocess outputs are not redirected automatically along with the print() outputs, the filehandle has to be passed to subprocess.check_call(bash_str, stdout=filehandle) via its keyword argument stdout. Nevertheless, the above-mentioned error also occurs outside the stdout redirection zone created by contextlib.redirect_stdout().

dummy_filename = "/home/andylu/bla_dummy.txt"  # will be saved temporarily in the user's home folder

# NOTE on scope: redirect sys.stdout for Python 3.4+ according to the following website:
# https://stackoverflow.com/questions/14197009/how-can-i-redirect-print-output-of-a-function-in-python
with open(dummy_filename, 'w') as f:
    with contextlib.redirect_stdout(f):
        print_out_all_TIFF_Tags_n_filter_for_desired_TAGs(
            TIFF_filepath)
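A minimal sketch of how the two redirection mechanisms can be combined: the same filehandle is passed both to contextlib.redirect_stdout() and to the subprocess call. The echo command and the file name out_dummy.txt are stand-ins for the real identify | grep pipeline:

```python
import contextlib
import subprocess

with open("out_dummy.txt", "w") as f:
    with contextlib.redirect_stdout(f):
        print("some python-side output")  # captured by redirect_stdout()
        f.flush()  # flush Python's buffer before the subprocess writes to the same file
        # the subprocess does NOT inherit the redirect_stdout() zone,
        # so its stdout must be redirected explicitly:
        subprocess.check_call('echo "some subprocess output"', shell=True, stdout=f)
```

Both lines end up in out_dummy.txt.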

EDIT:

For better security, the piping process should be split up as follows, but this didn't really work out for me. If you have an explanation for why a split-up pipeline like

p1 = subprocess.Popen(['gdalinfo', 'TIFF_filepath'], stdout=PIPE)
p2 = subprocess.Popen(['grep', "'Pixel Size =' > 'path_to_textfile'"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close()  # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]

doesn't produce the output-textfile while still exiting successfully, I'd be delighted to learn about the reasons.

OS and Python versions

  • OS:

    NAME="Ubuntu" VERSION="18.04.3 LTS (Bionic Beaver)" ID=ubuntu ID_LIKE=debian PRETTY_NAME="Ubuntu 18.04.3 LTS" VERSION_ID="18.04"

  • Python:

    Python 3.7.6 (default, Jan 8 2020, 19:59:22) [GCC 7.3.0] :: Anaconda, Inc. on linux

Andreas L.
  • It's treating the entire string, including the spaces and pipe symbol, as a single command name. I think you need to use `shell=True` in both cases. – John Gordon Jan 15 '20 at 21:44
  • If you want to avoid using `shell=True` for piped commands, you can instead [create separate subprocesses and pipe one into the other](https://stackoverflow.com/a/13332300/4739755) – b_c Jan 15 '20 at 21:52
  • BTW, your code's variable names are misleading; `shell=True` uses `sh` as the shell, not `bash`, so it should be `sh_string` not `bash_string` -- two different shells with different syntax (even when `/bin/sh` is provided by bash, it turns off some extensions to behave closer to the standard when called by that name). – Charles Duffy Jan 15 '20 at 22:03
  • and yes, if you want the shell to split your string into multiple arguments, honor pipes, honor redirections, or have any other kind of syntax, you need to use `shell=True` -- or, much better, follow the advice in the [Replacing shell pipeline](https://docs.python.org/3/library/subprocess.html#replacing-shell-pipeline) section of the `subprocess` module documentation, and string together multiple `shell=False` invocations. – Charles Duffy Jan 15 '20 at 22:05

2 Answers


As for the initial error mentioned in the question: the comments answered it. I needed to pass the kwarg shell=True in all calls of subprocess.check_call() that receive a prepared shell-command string like

gdalinfo TIFF_filepath | grep 'Pixel Size =' > path_to_textfile

As a side note, I noticed that it makes no difference whether I quote the paths or not. I'm not sure whether using single (') or double (") quotes makes a difference.
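If quoting should ever be needed (e.g. for paths containing spaces), shlex.quote() produces a correctly single-quoted token for the shell. A small sketch with a hypothetical path:

```python
import shlex

tif_path = "/tmp/file with spaces.tif"  # hypothetical path containing spaces
cmd = "gdalinfo {} | grep 'Pixel Size ='".format(shlex.quote(tif_path))
print(cmd)
# gdalinfo '/tmp/file with spaces.tif' | grep 'Pixel Size ='
```

For the shell, the difference between the quote styles is that double quotes still allow $-expansion while single quotes do not; shlex.quote() always emits single quotes.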

Furthermore, for the sake of security outlined in the comments to my question, I followed the docs on piping safely without a shell and consequently changed from my previous standard approach

subprocess.check_call(shell_str, shell=True)

to the (somewhat cumbersome) piping steps delineated hereafter:

p1 = subprocess.Popen(['gdalinfo', 'TIFF_filepath'], stdout=PIPE)
p2 = subprocess.Popen(['grep', "'Pixel Size =' > 'path_to_textfile'"], stdin=p1.stdout, stdout=PIPE)
p1.stdout.close()  # Allow p1 to receive a SIGPIPE if p2 exits.
output = p2.communicate()[0]

In order to get these sequences of command strings from the initial complete shell string, I had to write custom string-manipulation functions and play around with them to get strings like filepaths quoted while avoiding quoting other functional parameters, flags, etc. (like -i, >, ...). This rather complex approach was necessary since shlex.split() simply split my shell-command strings at every whitespace character, which led to problems when recombining them in the pipes.
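As an aside, shlex.split() does honour shell-style quoting and does not split inside quoted substrings; what it cannot do is interpret pipes or redirections, which come back as plain tokens:

```python
import shlex

# Quoted substrings survive as single tokens:
print(shlex.split("grep -i 'Pixel Size ='"))
# ['grep', '-i', 'Pixel Size =']

# Pipes and redirections are NOT interpreted, only tokenized:
print(shlex.split("gdalinfo in.tif | grep 'Pixel Size =' > out.txt"))
# ['gdalinfo', 'in.tif', '|', 'grep', 'Pixel Size =', '>', 'out.txt']
```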

Yet in spite of all these apparent improvements, no output textfile is generated, although the process seemingly doesn't produce any errors and finishes "correctly" after the last line of the piping process:

output = p2.communicate()[0]

As a consequence, I'm still forced to use the old and insecure, but at least well-working, approach via the shell:

subprocess.check_call(shell_str, shell=True)

At least it works now using this former approach, even though I didn't manage to implement the more secure piping procedure in which several commands are glued/piped together.
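For reference, the likely reason no file appears: without a shell, grep receives the literal string 'Pixel Size =' > 'path_to_textfile' as its search pattern, quotes and > included, so nothing matches and nothing is redirected (> is shell syntax). A shell-free pipeline that does produce the file passes an open file object as p2's stdout; here printf serves as a runnable stand-in for the real gdalinfo call:

```python
import subprocess
from subprocess import PIPE

# Stand-in producer; in the real case this would be ["gdalinfo", tif_path]
producer = ["printf", "Pixel Size = (0.001,-0.001)\nDriver: GTiff\n"]

with open("pixel_size.txt", "w") as outfile:  # replaces the shell's "> path_to_textfile"
    p1 = subprocess.Popen(producer, stdout=PIPE)
    p2 = subprocess.Popen(["grep", "Pixel Size ="], stdin=p1.stdout, stdout=outfile)
    p1.stdout.close()  # allow p1 to receive a SIGPIPE if p2 exits early
    p2.communicate()
```

Note that grep's argument list contains only the bare pattern; the output destination is handled entirely by the file object.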

Andreas L.

I once ran into a similar issue, and this fixed it.

cmd_str.split(' ')

My code :

import subprocess
import sys
import re as Regex  # assumed alias for the standard re module

# >>>>>>>>>>>>>>>>>>>>>>> UNZIP THE FILE AND RETURN THE FILE ARGUMENTS <<<<<<<<<<<<<<<<<<<<<<<<<<<<

def unzipFile(zipFile_):
    # INITIALIZE THE UNZIP COMMAND HERE (Tlog AND outputDir ARE DEFINED ELSEWHERE)
    cmd = "unzip -o " + zipFile_ + " -d " + outputDir

    Tlog("UNZIPPING FILE " + zipFile_)

    # GET THE PROCESS OUTPUT AND PIPE IT TO VARIABLE
    log = subprocess.Popen(cmd.split(' '), stdout=subprocess.PIPE, stderr=subprocess.PIPE)

    # GET BOTH THE ERROR LOG AND OUTPUT LOG FOR IT
    stdout, stderr = log.communicate()

    # DECODE THE OUTPUT TO 'UTF-8'
    stdout = stdout.decode('utf-8')
    stderr = stderr.decode('utf-8')

    if stderr != "":
        Tlog("ERROR WHILE UNZIPPING FILE \n\n\t" + stderr + '\n')
        sys.exit(0)

    # INITIALIZE THE TOTAL UNZIPPED ITEMS
    unzipped_items = []

    # PARSE THE STDOUT LINE BY LINE
    for line in stdout.split('\n'):
        # CHECK IF THE LINE CONTAINS KEYWORD 'inflating'
        if Regex.search(r"inflating", line) is not None:

            # FIND THE MATCHED STRING WITH REGEX
            Matched = Regex.findall(r"inflating: " + outputDir + "(.*)", line)[0]

            # REMOVE LEADING/TRAILING WHITESPACE
            Matched = Regex.sub(r'^\s+|\s+$', '', Matched)

            # APPEND THE OUTPUT TO THE LIST
            unzipped_items.append(outputDir + Matched)

    # RETURN THE OUTPUT
    return unzipped_items
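As the comments below point out, splitting on spaces breaks as soon as a path contains a space; building the argument list explicitly avoids both the splitting and the shell. A sketch with a hypothetical path:

```python
zip_path = "/tmp/my archive.zip"  # hypothetical path containing a space
output_dir = "/tmp/out"

# Naive whitespace splitting breaks the path into two arguments:
naive = ("unzip -o " + zip_path + " -d " + output_dir).split(' ')
print(naive)
# ['unzip', '-o', '/tmp/my', 'archive.zip', '-d', '/tmp/out']

# An explicit argv list keeps each argument intact, no shell involved:
cmd = ["unzip", "-o", zip_path, "-d", output_dir]
# subprocess.run(cmd, capture_output=True, text=True)  # would invoke unzip safely
```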
Charles Duffy
High-Octane
  • @CharlesDuffy if you don't mind, can you teach me more? If it's not too much to ask.. – High-Octane Jan 15 '20 at 22:15
  • So, I'll admit that I wasn't paying full attention when I was writing the above, and need to walk back some of what I was saying -- since you aren't using `shell=True` here, `$(rm -rf ~)` and `$(/tmp/evilscript)` aren't threats in the case. By contrast, there are still argument injection attacks, because a filename with spaces can be split into multiple arguments to `unzip`; one needs to look into unzip's arguments to see exactly how exploitable that is. – Charles Duffy Jan 15 '20 at 22:21
  • For the cases where `shell=True` is in use and the vulnerabilities I described above are real, a good place to start is googling for "shell injection", which is the larger class of vulnerabilities that they fall under. – Charles Duffy Jan 15 '20 at 22:21
  • Nice, I've done some SQL injection and sanitization for inputs but never thought shell could be part of it. I'll have a look around shell injection, also do you have any GitHub account I can follow since I could bet your work will be no short of magnificent. Let me know. – High-Octane Jan 15 '20 at 22:25
  • Dunno -- there's a lot of old code I'm a little embarrassed of. BTW, in both shell and SQL cases, I'm staunchly in favor of taking the defensive approach of keeping data and code out-of-band from each other, instead of trying to sanitize data to be safely injectable into code. There's an answer on a deleted question where I give an overview at https://stackoverflow.com/a/56688189/14122 -- it's covering the shell side instead of the SQL side, but for SQL, substitute bind variables instead of environment variables or arguments as the approach to pass data out-of-band from code. – Charles Duffy Jan 15 '20 at 22:30
  • @CharlesDuffy Thanks. I'll check it out. And who doesn't have embarrassing code, We all had to start somewhere right? – High-Octane Jan 15 '20 at 22:32
  • ("sanitizing" data is fine when what you're trying to do is make sure it conforms to the real-world range of possible values, especially when that range is well-defined, but when instead of trying to make sure it's legitimately representative of the possible range of real-world values you go trying to detect and eliminate individual attacks, you both end up with data that's "sanitized" in a way specific to the display layer you had in mind when building that sanitization, and end up with mistakes like those described in https://blog.jgc.org/2010/06/your-last-name-contains-invalid.html). – Charles Duffy Jan 15 '20 at 22:35
  • @CharlesDuffy Yep. Just like it's described in that blog post, but I can't put the blames on the devs alone. Organizations don't allow the time that needs to be put it and tested. It's like a race between time and quality. – High-Octane Jan 15 '20 at 22:41
  • Just for the record: instead of `cmd_str.split()`, you should use `shlex.split(cmd_str)`. – bruno desthuilliers Jan 16 '20 at 07:24
  • @brunodesthuilliers, ...which works right if-and-only-if your `zipFile_` and `outputDir` contain literal quoting sequences. It's not a panacea by any means, so it's much better to use `shell=False` and parse an explicit argv list instead of trying to rely on *any* kind of automated splitting. – Charles Duffy Jan 16 '20 at 17:03