python: read csv, execute command and write results to new vertical column

Question

I am totally new to Python. I read that Python's csv module is great for what I'd like to do. I've spent some time trying several different methods but have not yet been able to even create an array from the fourth (vertical) column.

I have a four-column CSV file with hundreds of rows. Before I go on, I should probably verify that Python can even accomplish everything I'd like to do.

read a csv FILE,
executes COMMAND on fourth (vertical) column of FILE
the COMMAND prints
read each line for HEALTHY (from COMMAND)
write HEALTHY on new fifth column to NEW_FILE with all five columns
loop until first empty row of FILE

example FILE (comma delimited in cell view)

  HOST                    PLATFORM        ARCH               COMMAND
  server1                 win             x86_64             python '/root/server1.py'
  server2                 linux           x86_64             python '/root/server2.py'
  server3                 linux           x86_64             python '/root/server3.py'

example COMMAND

  # python '/root/server1.py'
  --------------------
  Error: Could not open /root/server1.py


  # python '/root/server2.py'
  --------------------
  server2 p1 (NTFS)       output1:100  output:200    HEALTHY:Yes
  --------------------


  # python '/root/server3.py'
  --------------------
  server3 p1 (linux)       output1:100  output:200    HEALTHY:No
  server3 p2 (linux)       output1:100  output:200    HEALTHY:Yes
  server3 p3 (swap)       output1:100  output:200    HEALTHY:No
  --------------------

If there are multiple HEALTHY lines and not all of them equal Yes, HEALTHY equals "No".

If HEALTHY is not found on any line, HEALTHY equals "Error Scanning".

This is what I have so far

  #!/usr/bin/python
  #

  import csv
  import subprocess

  # read csv file
  csv_file = open("my_list.csv", "rb")
  my_csv_reader = csv.reader(csv_file, delimiter=",")
  my_data_list = []
  for row in my_csv_reader:
          print row
          my_data_list.append(row)
  csv_file.close()

  # write csv file
  csv_file = open("new_data.csv", "wb")
  my_csv_writer = csv.writer(csv_file, delimiter=",")
  for row in my_data_list:
          my_csv_writer.writerow(row)
  csv_file.close()

  # running commands, getting output
  # run COMMAND column from csv_file, use "python 'my_script.py'" for now
  # my_script.py only for now: print "HEALTHY:Yes"
  p = subprocess.Popen("python '/root/my_script.py'",stdout=subprocess.PIPE,stderr=subprocess.PIPE)
  output, errors = p.communicate()
  print output
  print errors

Executing the above:

  # python '/root/this_script.py'
  ['HOST', 'PLATFORM', 'ARCH', 'COMMAND']
  ['server1', 'win', 'x86_64', "python '/root/server1.py'"]
  ['server2', 'linux', 'x86_64', "python '/root/server2.py'"]
  ['server3', 'linux', 'x86_64', "python '/root/server3.py'"]
  Traceback (most recent call last): 
     File "thisscript.py", line 24, in ? 
       p = subprocess.Popen('python myscript1.py',stdout=subprocess.PIPE,stderr=subprocess.PIPE) 
     File "/usr/lib64/python2.4/subprocess.py", line 550, in __init__ 
       errread, errwrite) 
     File "/usr/lib64/python2.4/subprocess.py", line 993, in _execute_child 
       raise child_exception 
     OSError: [Errno 2] No such file or directory

Bonus:
If I also wanted to search the command output for something else (such as linux, swap, NTFS, etc.; see the third example command above) and append it to row[5], the next column after the one already filled in for Healthy... I've tried starting a new if statement, but it appears to only append to row[4], the same column it uses for Healthy.

I also can't figure out how to use an OR statement. Where

 if 'Linux' OR 'swap' OR 'LVM' in stdout:  
     writer.writerow(row + ['Linux']) # for multiple lines/partitions.

 elif 'BSD' in stdout:  
     writer.writerow(row + ['BSD'])

 elif 'NTFS' in stdout:
     writer.writerow(row + ['Windows'])

 else:
     writer.writerow(row + ['Error Scanning'])

Lastly, I have changed the COMMAND column to PATH and modified the command to execute PATH, which is working. I'd like to execute a second command to fetch the file size of PATH. I've tried a couple of methods.

Thank you for your time. I hope this can all be done.

What happens when you try that code? Also, if that's your real input file, it's not comma delimited, because it doesn't have commas. — Thomas K, Apr 12 '12 at 12:17
Please edit the question and insert the output. Don't put that in the comment. — Kien Truong, Apr 12 '12 at 13:29
I feel like "not any healthy == error scanning" is an incorrect assertion. "Error could not open == error scanning", but "not any healthy" means "no error scanning, none found healthy" which is what we'd expect rather than (for example) server 3 had 3 unhealthy scans but instead mis-reported 'error scanning' — hexparrot, Apr 12 '12 at 15:12

Marty · Accepted Answer · 2012-04-12T20:19:36.480

You're not using subprocess.Popen correctly, which is leading to the immediate problem (OSError: [Errno 2] No such file or directory).

In general, the first argument to Popen should be a sequence, not a string, unless you also pass the shell=True keyword parameter. If the first argument is a string and shell=False (the default), Popen will attempt to execute a file whose name is the entire string. There is no file named "python '/root/my_script.py'" (the whole string), therefore you get an OSError.

So,

p = subprocess.Popen(
    "python '/root/my_script.py'", 
    stdout=subprocess.PIPE, stderr=subprocess.PIPE
)

should probably become something like ...

p = subprocess.Popen(
    ["python", "/root/my_script.py"],  # no inner quotes: each item is passed to the program as-is
    stdout=subprocess.PIPE, stderr=subprocess.PIPE
)

or (essentially equivalent)

p = subprocess.Popen(
    "python /root/my_script.py".split(),  # no ticks: split() would leave them in the path
    stdout=subprocess.PIPE, stderr=subprocess.PIPE
)

or (warning: shell=True hands the whole string to the shell, so only use it with commands you trust)

p = subprocess.Popen(
    "python '/root/my_script.py'", shell=True,
    stdout=subprocess.PIPE, stderr=subprocess.PIPE
)
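
To see why the ticks matter, here is a small sketch comparing plain whitespace splitting with shlex.split from the standard library. The helper name to_argv is made up for illustration, and the command string is just the one from the question; nothing is executed here.

```python
import shlex

def to_argv(command):
    """Split a command string into an argument list, honouring shell-style quotes."""
    return shlex.split(command)

command = "python '/root/my_script.py'"

# Plain whitespace splitting keeps the quote characters inside the argument,
# so Python would be asked to open a file whose name includes the ticks.
naive = command.split()

# shlex.split removes the quotes, leaving the bare path as the second argument.
parsed = to_argv(command)

print(naive)
print(parsed)
```

If the CSV column may contain quoted paths, passing the result of to_argv to Popen is one way to keep shell=False and still get the path the author intended.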

Update: The answer to your question is Yes. Python can help you accomplish all that you'd like to do. Here's a breakdown of your list.

SPOILER ALERT! Do not read beyond this line if you want to figure things out for yourself.

read a csv FILE

What you've done is fine. Another way ...

with open('my_list.csv', 'rb') as fp:
    my_data_list = [row for row in csv.reader(fp)]

... which introduces two potentially new concepts: the with statement and list comprehensions. But you don't really need an intermediate list to act upon; you can read and write in the same loop (see below).

executes COMMAND on fourth (vertical) column of FILE
the COMMAND prints
loop until first empty row of FILE

I've assumed you want to print the output, or result, of running the command.

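for row in my_data_list[1:]: #<- [1:] skips the header row
    command = row[3] #<- 4th column is index 3, 1st is 0
    argv = shlex.split(command) #<- needs "import shlex"; drops the ticks around the path
    p = Popen(argv, stdout=PIPE, stderr=STDOUT) #<- stderr to stdout
    stdout, empty = p.communicate()
    print stdout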
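
As a self-contained illustration of the capture step, the sketch below runs a child process and returns its combined stdout and stderr as text. It assumes a stand-in command (a one-line Python child started via sys.executable) rather than the real server scripts, and the function name run_command is hypothetical. It is written to run on current Python 3 as well, which is why it uses print() and universal_newlines.

```python
import subprocess
import sys

def run_command(argv):
    """Run argv (a list of strings) and return stdout and stderr as one text string."""
    p = subprocess.Popen(
        argv,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,   # merge stderr into stdout
        universal_newlines=True,    # return text rather than bytes on Python 3
    )
    output, _ = p.communicate()     # second item is None because stderr was redirected
    return output

# Stand-in for one of the server scripts: a child Python that prints one line.
demo_argv = [sys.executable, "-c", "print('server2 p1 (NTFS) HEALTHY:Yes')"]
demo_output = run_command(demo_argv)
print(demo_output)
```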

read each line for HEALTHY (from COMMAND)
- if multiple lines of HEALTHY and all do not equal Yes, HEALTHY equals "No"
- if HEALTHY is not found on any lines, HEALTHY equals "Error Scanning"

write HEALTHY on new fifth column to NEW_FILE with all five columns

if 'HEALTHY:No' in stdout:
    writer.writerow(row + ['No'])
elif 'HEALTHY:Yes' in stdout:
    writer.writerow(row + ['Yes'])
else: 
    writer.writerow(row + ['Error Scanning'])
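
The three rules can also be written as a small function, which makes them easy to check against the sample outputs from the question. This is only a sketch: the name summarize_health and the regular expression are my own choices, and it assumes the flag always appears exactly as HEALTHY:Yes or HEALTHY:No.

```python
import re

# Matches the "HEALTHY:Yes" / "HEALTHY:No" token seen in the sample output.
HEALTHY_RE = re.compile(r"HEALTHY:(Yes|No)")

def summarize_health(output):
    """Collapse every HEALTHY flag in a command's output into one value."""
    flags = HEALTHY_RE.findall(output)
    if not flags:
        return "Error Scanning"   # no HEALTHY token on any line
    if "No" in flags:
        return "No"               # at least one line reported No
    return "Yes"                  # every flag found was Yes

sample = (
    "server3 p1 (linux) output1:100 output:200 HEALTHY:No\n"
    "server3 p2 (linux) output1:100 output:200 HEALTHY:Yes\n"
)
print(summarize_health(sample))
```

With a helper like this, the loop body would reduce to a single writer.writerow(row + [summarize_health(stdout)]) call.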

And putting it all together (untested) ...

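import csv
import shlex
from subprocess import Popen, PIPE, STDOUT

with open('my_list.csv', 'rb') as incsv:
    with open('new_data.csv', 'wb') as outcsv:
        reader = csv.reader(incsv)
        writer = csv.writer(outcsv)

        # copy the header row, adding a name for the new fifth column
        writer.writerow(reader.next() + ['HEALTHY'])

        for row in reader:
            if not row:
                break  # stop at the first empty row
            # shlex.split drops the ticks around the script path
            p = Popen(shlex.split(row[3]), stdout=PIPE, stderr=STDOUT)
            stdout, empty = p.communicate()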

            print 'Command: %s\nOutput: %s\n' % (row[3], stdout)

            if 'HEALTHY:No' in stdout:
                writer.writerow(row + ['No'])
            elif 'HEALTHY:Yes' in stdout:
                writer.writerow(row + ['Yes'])
            else: 
                writer.writerow(row + ['Error Scanning'])

Update: fixed poor naming choice of csv reader and writer file objects

Update: Python 2.5 introduced the from __future__ import with_statement directive. For versions of Python older than 2.5, the with statement is unavailable. In this case, the common approach is to wrap file operations in try/finally. As in,

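import csv
import shlex
from subprocess import Popen, PIPE, STDOUT

incsv = open('my_list.csv', 'rb')
try:
    reader = csv.reader(incsv)
    outcsv = open('new_data.csv', 'wb')
    try:
        writer = csv.writer(outcsv)

        # copy the header row, adding a name for the new fifth column
        writer.writerow(reader.next() + ['HEALTHY'])

        for row in reader:
            if not row:
                break  # stop at the first empty row
            # shlex.split drops the ticks around the script path
            p = Popen(shlex.split(row[3]), stdout=PIPE, stderr=STDOUT)
            stdout, empty = p.communicate()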

            print 'Command: %s\nOutput: %s\n' % (row[3], stdout)

            if 'HEALTHY:No' in stdout:
                writer.writerow(row + ['No'])
            elif 'HEALTHY:Yes' in stdout:
                writer.writerow(row + ['Yes'])
            else: 
                writer.writerow(row + ['Error Scanning'])
    finally:
        outcsv.close()
finally:
    incsv.close()

HTH!

Nice, that did resolve the last error. Now prints "HEALTHY:Yes" — Tommy, Apr 12 '12 at 16:31
I love the SPOILER ALERT :) We need more of those on SO, people should enjoy solving the puzzles... — max, Apr 12 '12 at 17:03
Nice job. When trying to use your spoiler: File "my_list.py", line 4 with open('my_list.csv', 'rb') as incsv: ^ SyntaxError: invalid syntax — Tommy, Apr 12 '12 at 19:08
Are you using python 2.4 or 2.5? I'll update the "spoiler" :) — Marty, Apr 12 '12 at 19:15
It boils down to this, if you're using 2.5 add `from __future__ import with_statement` as the first import statement in your script. If you're using an older version of python, see [this previous stackoverflow question](http://stackoverflow.com/questions/3770348/how-to-safely-open-close-files-in-python-2-4) — Marty, Apr 12 '12 at 19:20
My system is running 2.4.3, its a production server and will be unable to update. Thanks for all your help Marty — Tommy, Apr 12 '12 at 20:06
@user1328963 you're very welcome. I've updated my answer with an alternative to the with statement. — Marty, Apr 12 '12 at 20:30
hmm. well I have a similar error as I did at first(last code section of question): ... line 16, in ? stderr=STDOUT) ... — Tommy, Apr 12 '12 at 20:56
@Marty I changed that line to `p = Popen(row[3].split(), shell=True, stdin=PIPE, stdout=PIPE, stderr=PIPE)` and it now functions with exception that (row[3], stdout) isnt reading text beyond the first single quote foreach row. removing them works, except it doesn't know how to execute a 2+ phrase command. changing command/column entry in csv does work with: uname. `Command: uname Output: Linux` If it helps, changing command/column in csv from example to `/root/server1.py` and defining first phrase `python` (or custom) wont be an issue — Tommy, Apr 12 '12 at 23:27
Hmm, I'm not entirely sure I follow the issue. Perhaps the csv module isn't reading in the command column properly? It might be helpful if you provide a more realistic sample of the input csv. Otherwise, try taking out the `'` (ticks) in the input csv -- they shouldn't be necessary, and might be the culprit. Can you also provide an example of a `2+ phrase command`, I'm not sure I understand. Thanks! — Marty, Apr 13 '12 at 05:30
@Marty Removing the single quotes does help. Though I am convinced it also wont execute any command beyond the first word or space. I changed the csv to reflect `command => uname -r` and it correctly displays the command, but the output as if it were only `command => uname`. I also tried to adjust `print 'Command: %s\nOutput: %s\n' % (row[3], stdout)` to `print 'Command: %s\nOutput: %s\n' % ("python " + row[3], stdout)` or similar, which seems to print but ignored in execution. A few lines of my CSV appended to question. Thanks for coming back Marty, you rock! :) — Tommy, Apr 13 '12 at 11:29
I removed `.split()` from Popen and it resolved the problem where it only executed the first word from row[3]. Seems to be working nicely. Thanks, over and over again :) — Tommy, Apr 13 '12 at 13:27
@Tommy, you're very welcome. It seems that the ticks are problematic. Either of these should work (I suspect you now have something similar to the second): `Popen("python /root/server1.py".split(), stdout...)` (no ticks, with split) or `Popen("python '/root/server1.py'", shell=True, stdout...)` (shell=True, no split). Either way, I'm glad it worked out! — Marty, Apr 13 '12 at 15:57
@Marty You don't have to answer, you've been so helpful and done alot for me already. Ive added some sort of unrelated issues into the question as a **Bonus**. Im starting to get the hang of python, a long way to go but Ive managed to do some stuff, maybe not exactly what I want, but stuff :) — Tommy, Apr 13 '12 at 17:25
@Tommy, I believe the best course of action would be to submit your **Bonus** section as another question, or questions, here on stackoverflow. Try and distill each element down to the fundamental issue you're stuck on, and include a link referencing our work here, as background. — Marty, Apr 13 '12 at 18:12

score 1 · Answer 2 · answered Apr 13 '12 at 17:53

In your "bonus" section:

If you want to search for multiple things, the simplest and most straightforward way is to search for each separately and then connect with or:

if 'Linux' in stdout or 'swap' in stdout or 'LVM' in stdout:
    writer.writerow(row + ['Linux'])

If you find this inelegant or need to search for more things, you can use the any function and a generator expression:

if any(x in stdout for x in ('Linux', 'swap', 'LVM')):
    writer.writerow(row + ['Linux'])

Finally, if this is still too inelegant, or if stdout becomes much larger and you don't want to search it multiple times, you can use regular expressions via the re module.
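
A minimal sketch of the regular-expression approach follows. The keyword-to-label table and the name classify_platform are hypothetical, and matching case-insensitively is an assumption on my part (the sample output in the question prints "linux" in lower case).

```python
import re

# Hypothetical keyword-to-label table; extend it to match your real output.
# The first pattern that matches wins, mirroring the if/elif order in the question.
PLATFORM_PATTERNS = [
    (re.compile(r"linux|swap|LVM", re.IGNORECASE), "Linux"),
    (re.compile(r"BSD", re.IGNORECASE), "BSD"),
    (re.compile(r"NTFS", re.IGNORECASE), "Windows"),
]

def classify_platform(output):
    """Return a platform label based on keywords found in the command output."""
    for pattern, label in PLATFORM_PATTERNS:
        if pattern.search(output):
            return label
    return "Error Scanning"

print(classify_platform("server3 p3 (swap) output1:100 output:200 HEALTHY:No"))
```

To get this value into a sixth column, build the whole row first and write it once, for example writer.writerow(row + [health, classify_platform(stdout)]) where health is whatever you computed for the fifth column. Each call to writerow emits a separate line, so a second if block with its own writerow adds another five-column row instead of a new column.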
