Python script to verify disk space output from Linux

Question

I am a beginner in Python. I am writing a script to check whether the utilization of each mount point is above a threshold. I can invoke the shell command and save its output to a variable, but I cannot work out how to split that variable into fields, check whether the utilization is above the threshold, and report the fault.

/dev/mapper/system-root     20G   18G  1.4G  93% /
udev                       3.9G  248K  3.9G   1% /dev
tmpfs                      3.9G   68K  3.9G   1% /dev/shm
/dev/sda1                  251M   71M  167M  30% /boot
/dev/mapper/system-oracle  128G   43G   79G  36% /opt/app/oracle
/dev/mapper/system-tmp     5.5G  677M  4.5G  13% /tmp
/dev/mapper/system-log     3.0G  140M  2.7G   5% /var/log
/dev/mapper/system-varsog   20G  654M   19G   4% /var/sog
/dev/mapper/system-backup   50G   24G   24G  50% /var/sog/backups

I want to store field 5 and field 6 in an associative array, compare field 5 against the threshold, and report any mount point whose value is above it.

I used the script below to store the shell command output. Now I need to split it into fields, but I have not been able to store it in an array because the data is multidimensional. Do I need to use a for loop to store the fields in separate arrays?

It is very easy to do in shell, awk and Perl, but it seems to be very difficult in Python.

>>> import sys, os, time, threading, subprocess,datetime
>>> diskinfo_raw = subprocess.Popen("df -h", shell=True,stdout=subprocess.PIPE)
>>> output = diskinfo_raw.communicate()[0]
>>> print output

Please help me with an idea or a reference. I have explored the loadtxt option, but I don't want to store the values in a file and then read them back.

It might be a good start not to add the `-h` (human-readable) argument to the command to get an easier to parse output. — Klaus D., Jan 17 '16 at 19:39
`os.statvfs` is used in one of the answers here, http://stackoverflow.com/a/31856769/5781248 — J.J. Hakala, Jan 17 '16 at 19:44
There is also a cross-platform library, http://pythonhosted.org/psutil/ and `shutil.disk_usage` in python 3.3 and later versions. — J.J. Hakala, Jan 17 '16 at 19:52
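The `shutil.disk_usage` route mentioned in the last comment avoids parsing `df` output altogether. The following is only a rough sketch in Python 3 syntax; the function names `percent_used` and `mounts_over` are made up for illustration, and the percentage can differ slightly from the `Use%` column of `df`, which treats reserved blocks differently.

```python
import shutil


def percent_used(path):
    """Return the used share of the filesystem holding `path`, as a percentage."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    if usage.total == 0:  # some pseudo-filesystems report a size of zero
        return 0.0
    return 100.0 * usage.used / usage.total


def mounts_over(paths, threshold):
    """Map each path whose usage is at or above `threshold` to its percentage."""
    result = {}
    for path in paths:
        pct = percent_used(path)
        if pct >= threshold:
            result[path] = pct
    return result
```

Something like `mounts_over(["/", "/boot", "/tmp"], 90)` would then return only the mount points that need attention; the list of paths is an assumption and has to be supplied by the caller.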

score 2 · Accepted Answer · answered Jan 18 '16 at 06:30

You can try this:

>>> import subprocess
>>> threshold = 10
>>> child = subprocess.Popen(['df', '-h'], stdout=subprocess.PIPE)
>>> output = child.communicate()[0].strip().split("\n")
>>> for x in output[1:]:
...     if int(x.split()[-2][:-1]) >= threshold:
...         print x

This lists every filesystem whose disk usage is 10% or more.
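If the result needs to be reused rather than printed, the same loop can be wrapped in a function that returns the mount points with their percentages. This is a sketch in Python 3 syntax, not a definitive implementation: the name `parse_df` is hypothetical, and the sample text is three rows copied from the question with a header row added.

```python
def parse_df(text, threshold):
    """Return (mount_point, percent) pairs at or above `threshold`."""
    hits = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have at least six columns and a numeric "NN%" in column
        # five; this check also skips the header row and blank lines.
        if len(fields) < 6 or not fields[4].endswith("%"):
            continue
        number = fields[4][:-1]
        if not number.isdigit():
            continue
        # Rejoin mount points that contain spaces (runs of spaces collapse to one).
        mount = " ".join(fields[5:])
        if int(number) >= threshold:
            hits.append((mount, int(number)))
    return hits


sample = """\
Filesystem                 Size  Used Avail Use% Mounted on
/dev/mapper/system-root     20G   18G  1.4G  93% /
/dev/sda1                  251M   71M  167M  30% /boot
/dev/mapper/system-backup   50G   24G   24G  50% /var/sog/backups
"""

for mount, pct in parse_df(sample, 50):
    print("%s is at %d%%" % (mount, pct))
```

Checking that the fifth column looks like a percentage, instead of blindly skipping the first line, means the function works whether or not the header row is present.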

score 1 · Answer 2 · answered Jan 17 '16 at 19:51

Using df -h for data source:

import re

d = {}
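lines = output.strip().split('\n')[1:]  # drop the trailing newline and the header row
for line in lines:
    usage, mount = re.split(r'\s+', line)[4:]  # fields 5 and 6
    d[mount] = usage  # key on the mount point; usage values such as '1%' repeat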

Martin Konecny


score 1 · Answer 3 · answered Jan 17 '16 at 19:52

You could do something like

mount_usage = {line.split()[5]: line.split()[4] for line in output.strip().split('\n')[1:]}

which gives a dictionary whose keys are the mount points and whose values are the usage percentages (the header row and the trailing empty line are skipped).

{'/': '93%', '/dev/shm': '1%', '/dev': '1%', '/boot': '30%', '/tmp': '13%', '/var/sog/backups': '50%', '/opt/app/oracle': '36%', '/var/log': '5%', '/var/sog': '4%'}
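The values in that dictionary are strings such as '93%', so they need converting before they can be compared with a numeric threshold. A small sketch in Python 3 syntax, assuming every value is a whole number followed by a percent sign; the function name `over_threshold` is made up for illustration.

```python
def over_threshold(mount_usage, threshold):
    """Keep only the mount points whose usage is at or above `threshold`."""
    result = {}
    for mount, text in mount_usage.items():
        pct = int(text.rstrip("%"))  # '93%' becomes 93
        if pct >= threshold:
            result[mount] = pct
    return result


# A few entries taken from the dictionary shown above.
mount_usage = {'/': '93%', '/boot': '30%', '/tmp': '13%', '/var/log': '5%'}
print(over_threshold(mount_usage, 30))
```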

jsbueno · Answer 4 · 2016-01-17T19:53:49.277

The newer "subprocess" module offers a lot of control over how the output of external commands is captured, but at a price: it became bureaucratic.

For fast scripts, the old way still works:

>>> import os
>>> du = os.popen("df -h").readlines()

>>> from pprint import pprint
>>> pprint(du)
['Filesystem      Size  Used Avail Use% Mounted on\n',
 'devtmpfs        7,7G     0  7,7G   0% /dev\n',
 'tmpfs           7,8G  164M  7,6G   3% /dev/shm\n',
 'tmpfs           7,8G  1,2M  7,8G   1% /run\n',
 'tmpfs           7,8G     0  7,8G   0% /sys/fs/cgroup\n',
 '/dev/sda6        24G   12G   12G  51% /\n',
 'tmpfs           7,8G   16K  7,8G   1% /tmp\n',
 '/dev/sda5        24G   19G  4,1G  83% /var\n',
 '/dev/sda3       147G   28G  119G  20% /opt\n',
 '/dev/sda2       391G  313G   79G  81% /home\n',
 'tmpfs           1,6G   20K  1,6G   1% /run/user/1000\n']

The subprocess module also includes a couple of shortcuts for getting the output of a program without going through all the parameters subprocess.Popen needs:

>>> pprint(subprocess.check_output("df -h".split()).split("\n"))
['Filesystem      Size  Used Avail Use% Mounted on',
 'devtmpfs        7,7G     0  7,7G   0% /dev',
 ...

So, as you can see, subprocess has the check_output function, besides Popen, which by default reads all the output from the external process and returns it as a single string.

Keep in mind that, unless shell=True is passed as in your call, subprocess requires the arguments to an external process to be separate elements of a list (and the program name counts as an argument). So it requires subprocess.check_output(["df", "-h"]), which I produced above by calling "split" on the "df -h" command line, as I usually do in my code.
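One thing to watch on Python 3 is that check_output returns bytes, so the result has to be decoded before it can be split on "\n". A minimal sketch, assuming UTF-8 output; the helper name `command_lines` is made up for illustration.

```python
import subprocess


def command_lines(argv):
    """Run `argv` (a list: program name, then its arguments) and return its output lines."""
    raw = subprocess.check_output(argv)  # bytes on Python 3
    # Decode assuming UTF-8; undecodable bytes are replaced rather than raising.
    return raw.decode("utf-8", errors="replace").splitlines()
```

Calling `command_lines(["df", "-h"])` on a Linux machine should then give the same kind of list as the pprint output above, without the trailing empty string that split("\n") leaves behind.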

score 0 · Answer 5 · answered Jan 17 '16 at 19:51

As a one-liner:

dict((fields[5], fields[4]) for fields in [line.split() for line in output.strip().split("\n")][1:])

Expanded and explained:

usage = dict()  # Dictionaries are Python's associative arrays
for line in output.strip().split("\n")[1:]: # Get the lines with actual data
    fields = line.split()  # Break the line into fields
    usage[fields[5]] = fields[4]  # Map mount point to usage

skyler


Skyler, your help is much appreciated :) It is good to see more people helping newbies like us – Mullai Arasu Jan 21 '16 at 18:09
