regex help on unix df

Question

I need some help tweaking my code to look for another attribute in this unix df output:

Ex.

Filesystem     Size    Used   Avail Capacity  Mounted on
/dev/ad4s1e     61G     46G    9.7G    83%    /home

So far I can extract capacity, but now I want to add Avail.

Here is my Perl code that grabs capacity. How do I get "Avail"? Thanks!

my @df = qx (df -k /tmp);
my $cap;
foreach my $df (@df)
        {
         ($cap) =($df =~ m!(\d+)\%!);
        };

print "$cap\n";

Why are you using a regex? You know which column you want, so just grab the column by its index. — William Pursell, Jun 14 '11 at 21:42
@William: Haven't you heard? Regular expressions are the answer to _everything_... — Lightness Races in Orbit, Jun 14 '11 at 21:47
You may want to use `df -P` (if you are using GNU df) for POSIX portability. — Seth Robertson, Jun 14 '11 at 21:48
[There are those who, when faced with a problem, think "I'll use a regular expression". Now they have two problems.](http://fishbowl.pastiche.org/2003/08/18/beware_regular_expressions/). — Jonathan Leffler, Jun 14 '11 at 21:50
[Only vaguely related, but I can't help it...](http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454) — Lightness Races in Orbit, Jun 14 '11 at 21:51
A regex is more robust than hard coding column offsets. Any slight change to the formatting will change the column offsets, different flavors and versions of df and different formatting flags, whereas a decent regex will survive most formatting changes. — Schwern, Jun 14 '11 at 21:53
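
For readers who do want the regex route discussed in these comments, here is a minimal sketch that extends the question's pattern to capture the field just before the percentage. It assumes Avail always sits immediately before Capacity, as in the sample output; the hard-coded `$line` stands in for a real line of `df` output.

```perl
use strict;
use warnings;

# Sample line from the question; real code would loop over `df -k /tmp` output.
my $line = "/dev/ad4s1e     61G     46G    9.7G    83%    /home\n";

# Capture the field immediately before the percentage (Avail) and the
# percentage itself (Capacity). \S+ accepts "9.7G" as well as "20862392".
my ($avail, $cap) = $line =~ m!(\S+)\s+(\d+)%!;

print "avail=$avail cap=$cap\n";
```

The header line contains no `%` preceded by digits, so it does not match and both variables stay undefined for it.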

score 11 · Answer 1 · answered Jun 14 '11 at 23:10

The easy perl way:

perl -MFilesys::Df -e 'print df("/tmp")->{bavail}, "\n"'

clt60

+1 This is the only decent answer. This is a system call (`statvfs` etc.), so use a library to directly access the structured data, not a regex! Parsing `df` text output is every bit as moronic as parsing `ls` text output because it breaks. GNU coreutils `df` likes to put the filesystem name on its own line when it becomes too long; none of the regex parsing solutions took that into consideration. System calls don't have that problem in the first place. – daxim Jun 15 '11 at 08:16
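
If the text output has to be parsed anyway, the wrapped-line case described in this comment can be handled by joining short lines with their continuation. This is only a sketch under assumptions: the sample lines and device name are made up, and the six-field rule assumes no spaces in device names or mount points. Running `df -P`, as suggested in a comment on the question, should avoid the wrapping in the first place.

```perl
use strict;
use warnings;

# Hypothetical sample: the long device name is wrapped onto its own line.
my @raw = (
    "Filesystem           1K-blocks      Used Available Use% Mounted on\n",
    "/dev/mapper/vg00-a_very_long_logical_volume_name\n",
    "                      61000000  46000000   9700000  83% /home\n",
);

shift @raw;    # drop the header line

my @rows;
my $pending = '';
for my $line (@raw) {
    chomp $line;
    my $joined = $pending . $line;
    my @fields = split ' ', $joined;
    if (@fields < 6) {    # incomplete record: wait for the continuation line
        $pending = "$joined ";
        next;
    }
    $pending = '';
    push @rows, \@fields;
}

# Print device name, Avail and Capacity for the first record.
printf "%s avail=%s cap=%s\n", @{ $rows[0] }[0, 3, 4];
```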

Schwern · Accepted Answer · 2011-06-15T06:36:49.557

This has the merit of producing a nice data structure for you to query all the info about each filesystem.

# column headers to be used as hash keys
my @headers = qw(name size used free capacity mount);

my @df = `df -k`;
shift @df;  # get rid of the header

my %devices;
for my $line (@df) {
    my %info;
    @info{@headers} = split /\s+/, $line;  # note the hash slice
    $info{capacity} = _percentage_to_decimal($info{capacity});
    $devices{ $info{name} } = \%info;
}

# Change 12.3% to .123
sub _percentage_to_decimal {
    my $percentage = shift;
    $percentage =~ s{%}{};
    return $percentage / 100;
}

Now the information for each device is in a hash of hashes.

# Show how much space is free in device /dev/ad4s1e
print $devices{"/dev/ad4s1e"}{free};

This isn't the simplest way to do it, but it is the most generally useful way to work with the df information: it puts everything in one data structure that you can pass around as needed. This is better than slicing it all up into individual variables, and it's a technique you should get used to.

UPDATE: To get all the devices which have >60% capacity, you'd iterate through all the values in the hash and select those with a capacity greater than 60%. Except capacity is stored as a string like "88%", and that's not useful for comparison. We could strip out the % here, but then we'd be doing that everywhere we want to use it. It's better to normalize your data up front; that makes it easier to work with. Storing formatted data is a red flag. So I've modified the code above, which reads from df, to change the capacity from 88% to .88.

Now it's easier to work with.

for my $info (values %devices) {
    # Skip to the next device if its capacity is not over 60%.
    next unless $info->{capacity} > .60;

    # Print some info about each device. %.0f rounds the percentage;
    # %d would truncate a value such as 56.99999999999999 down to 56.
    printf "%s is at %.0f%% with %dK remaining.\n",
        $info->{name}, $info->{capacity}*100, $info->{free};
}

I chose to use printf here rather than interpolation because it makes it a bit easier to see what the string will look like when output.

@Schwern - I like this. Being a newbie, I'm open to learning some other techniques. Can you elaborate on the use of the quoted device name? Is it my index to the value free? Is that how I interpret that? Thanks! — jdamae, Jun 14 '11 at 22:19
I want to set a threshold, using an if statement. How would I do that for example, if I want capacity > 60% for example? — jdamae, Jun 14 '11 at 22:42
@jdamae Yes, `%devices` is keyed by the device name (what df calls the filesystem). — Schwern, Jun 15 '11 at 06:23
@jdamae I've added an example of how you'd search through the hash and make use of it. — Schwern, Jun 15 '11 at 06:37
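
To make the two questions in these comments concrete, here is a small self-contained sketch of looking up one device by its key and of applying a threshold. The `%devices` contents are made-up sample values in the same shape as the hash of hashes built in the answer, and `devices_over` is a hypothetical helper name, not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical data in the shape built by the answer (capacity is a fraction).
my %devices = (
    '/dev/ad4s1e' => { name => '/dev/ad4s1e', free => 10171187, capacity => 0.83 },
    '/dev/ad4s1a' => { name => '/dev/ad4s1a', free => 401234,   capacity => 0.41 },
    '/dev/ad4s1d' => { name => '/dev/ad4s1d', free => 52011,    capacity => 0.97 },
);

# Return the names of devices above $threshold, fullest first.
sub devices_over {
    my ($devices, $threshold) = @_;
    return sort { $devices->{$b}{capacity} <=> $devices->{$a}{capacity} }
           grep { $devices->{$_}{capacity} > $threshold }
           keys %$devices;
}

my @full = devices_over(\%devices, 0.60);
print "$_ is over the threshold\n" for @full;

# The quoted device name is the key into the outer hash; a plain if
# statement on a single device works the same way.
if ($devices{'/dev/ad4s1e'}{capacity} > 0.60) {
    print "/dev/ad4s1e has $devices{'/dev/ad4s1e'}{free}K free\n";
}
```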

score 2 · Answer 3 · answered Jun 14 '11 at 21:42

Have you tried simply splitting on whitespace and taking the 4th and 5th columns?

my @cols = (split(/\s+/, $_));
my $avail = $cols[3];
my $cap   = $cols[4];

(Fails if you have spaces in your device names of course...)

Mat

TLP · Answer 4 · 2011-06-14T22:00:11.497

2

Use split instead, and get the values from the resulting array. E.g.

my @values = split /\s+/, $df;
my $avail = $values[3];

Or:

my ($filesystem, $size, $used, $avail, $cap, $mount) = split ' ', $df;

edited Jun 14 '11 at 22:00

answered Jun 14 '11 at 21:44


You'll need a `+` in that split; `df` doesn't always issue a single whitespace char between its columns – Mat Jun 14 '11 at 21:50
@Mat `split` does that automatically. – TLP Jun 14 '11 at 21:58
@TLP, Mat is right, you are wrong. `perl -E'say((split /\s/, "a\x20\x20\x20b c d")[3])'` prints `b` instead of `d`. – ikegami Jun 14 '11 at 22:01
`split ' '` removes leading whitespace and splits on whitespace. `split /\s+/` splits on whitespace. `split /\s/` splits on individual whitespace characters. – ikegami Jun 14 '11 at 22:02
@Mat It's `' '` that does it automatically. – TLP Jun 14 '11 at 22:02
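
The three `split` forms compared in these comments can be checked side by side. The sketch below uses a made-up string with leading spaces and a run of three spaces; the field lists in the comments follow from Perl's documented `split` behaviour (trailing empty fields are dropped, leading ones are kept).

```perl
use strict;
use warnings;

my $line = "  a   b c\n";   # leading spaces, a run of three spaces, trailing newline

my @awk_style = split ' ',   $line;   # ("a", "b", "c")
my @run       = split /\s+/, $line;   # ("", "a", "b", "c"): leading empty field
my @single    = split /\s/,  $line;   # ("", "", "a", "", "", "b", "c")

# Report how many fields each form produced.
printf "%-8s %d fields\n", $_->[0], scalar @{ $_->[1] }
    for [ "' '" => \@awk_style ], [ '/\s+/' => \@run ], [ '/\s/' => \@single ];
```

For `df` output, where lines start with the device name, `' '` and `/\s+/` give the same fields; `/\s/` shifts every column index as soon as two spaces appear in a row.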

score 1 · Answer 5 · answered Jun 14 '11 at 21:44

I think it is probably best to split the lines, skipping the first line. Since you don't mind using @df and $df, neither do I:

my @df = qx(df -k /tmp);
shift @df;                # Lose df heading line
foreach my $df (@df)
{
    my($system, $size, $used, $avail, $capacity, $mount) = split / +/, $df;
    ....
}

This gives you all the fields at once. Now you just need to interpret the 'G' and lose the '%', etc.
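
One way to do that interpretation step is sketched below. The helper names `size_to_bytes` and `strip_percent` are hypothetical, and the multipliers assume 1024-based units as printed by `df -h`; with `df -k` the sizes are plain numbers of kilobytes and pass through unchanged.

```perl
use strict;
use warnings;

# Binary multipliers; this assumes df -h style suffixes (K, M, G, T).
my %multiplier = (K => 1024, M => 1024**2, G => 1024**3, T => 1024**4);

# "9.7G" -> approximate byte count; plain numbers pass through unchanged.
sub size_to_bytes {
    my ($size) = @_;
    my ($number, $unit) = $size =~ /^([\d.]+)([KMGT]?)/i
        or die "Unrecognised size: $size";
    return $unit ? $number * $multiplier{ uc $unit } : $number;
}

# "83%" -> 83
sub strip_percent {
    my ($capacity) = @_;
    (my $number = $capacity) =~ s/%\z//;
    return $number;
}

print size_to_bytes('61G'), "\n";
print strip_percent('83%'), "\n";
```

Because `df -h` rounds its figures, the byte counts recovered this way are approximations; use `df -k` when exact numbers matter.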

score 1 · Answer 6 · answered Jun 14 '11 at 21:46

foreach my $device ( @df ) {
    next unless $device =~ m{^/};
    my( $filesystem, $size, $used, $avail, $cap, $mounted ) = split /\s+/, $device;
    # you take it from there.... ;)
}

DavidO

William Pursell · Answer 7 · 2011-06-14T22:17:55.040

Lots of variations on a theme here. I would keep the first line, since it gives a nice header:

$ perl -E '$,=" "; open my $fh, "-|", "df -k /tmp"; 
  while(<$fh>) { @a=split; say @a[3,4]}'

On second thought, this is a lot cleaner:

$ df -k /tmp | perl -naE '$,="\t"; say @F[3,4]'
Available       Capacity
20862392        92%

Final thought: don't use perl at all:

$ df -h /tmp | tr -s ' ' '\t'  | cut  -f 4,5

or

$ df -h /tmp | awk '{print $4 "\t" $5}'
