
Still newish to the site, but here goes... Basically I'm storing events in multiple files, with each event on its own line and each line containing the date ($1), the start ($2) and stop ($3) times, and several other pieces of data. I use a double underscore ("__") as the field separator. I've been using a variety of shell scripts to manage the data and awk to calculate stats, but I'm having trouble invoking the date command so I can get a total by day of the week. After much tinkering and scanning of discussion boards I got to this:

ls /home/specified/folder/MBRS.db/* |
xargs -n 1 -I % awk -F"__" '$6 == "CLOSED" && $1 >= "'$backDATE'" { print $0 }' % |
awk 'BEGIN{FS="__"}{specDATE=system("date --date="$1" +%a")} specDATE == "Tue" {print $2" "$3}'

or

ls /home/lingotech/Einstein/data/MBRS.db/* |
xargs -n 1 -I % awk -F"__" '$6 == "CLOSED" && $1 >= "'$backDATE'" { print $0 }' % |
awk 'BEGIN{FS="__"}system("date --date="$1" +%a") == "Mon" {print $2" "$3}'

Instead of outputting the start and stop times, I'm getting a list of all the different days of the week for each entry.

I've tried more variations of the date usage than I care to admit, including:

for y in Sun Mon Tue Wed Thu Fri Sat; do
  for directory in $( ls /home/specified/directory/MBRS.db/* | xargs -n 1 ); do
    printf "."
    [[ $( cat $directory | awk -F"__" '$6 == "CLOSED" && $1 >= "'$backDATE'" { print $1 }' | xargs -n 1 -I z date +%a -d z ) == "$y" ]] && echo BLAH
  done
done

Some helpful explanation of what I'm screwing up would be much appreciated. Thanks in advance. Oh, and I'm storing the date in YYMMDD format, but that doesn't seem to be an issue for Ubuntu Server's version of `date`.

John Kugelman
Mercutio
  • I think you need to escape the nested double quotes in `{specDATE=system("date --date="$1" +%a")}` like this: `{specDATE=system("date --date=\"$1\" +%a")}`. At the very least, I'd be leery of that piece of syntax because the `$1` is outside quotes when the shell sees it. I think you should code in a language other than `awk` that can manage dates more directly. `awk` is great — don't get me wrong. But running `date` once per line of data is not good for performance. I'd use Perl (but I've been using Perl for twenty years); it might be more sensible for you to use Python. – Jonathan Leffler Mar 23 '15 at 06:01
  • I think the `-n 1` and `-I %` (and subsequent `%`) in the `xargs` commands are just ways of slowing the processing down; `awk` happily works with multiple files. The asymmetric ways of setting the field separator to `__` in the two `awk` scripts is odd. The second script should be combined with the first so you're only running one `awk` script. It is permissible to put an `awk` script on multiple lines. It is usually sensible to put each `pattern { action }` unit on its own line unless everything fits comfortably on a single line. And if the `{action}` itself needs multiple lines, use them. – Jonathan Leffler Mar 23 '15 at 06:09
  • I originally started out with a single awk program, but as I had issues I used the pipes and xargs to try to separate out what I was doing so I could see the error. When I use the following, for example, I can print out the day of the week, but I'm having trouble using that for any kind of conditional check (i.e. is this a Tuesday?): `ls /home/absolute/path/* | xargs -n 1 -I % awk -F"__" '$6 == "CLOSED" && $1 >= "'$backDATE'" { print $0 }' % | awk 'BEGIN{FS="__"} $4 == "SJ" {system("date -d"$1" +%a")}'` – Mercutio Mar 23 '15 at 06:34
  • `awk -F "__" -v newDAY="date -d "$1" +%a" '$6 == "CLOSED" && $1 >= "'150000'" {print newDAY}' /home/lingotech/Einstein/data/MBRS.db/*` ... Something like this, for example, will pass newDAY as the string `"date -d "$1" +%a"`, but I'd love it to run the date command and store its value instead. – Mercutio Mar 23 '15 at 06:48
  • If I put it all together like this: `awk -F "__" 'system("date +%a -d "$1"") == "Tue" && $1 >= 150000 && NF == 10 && $4 == "SJ" {print $2" "$3}' /home/abs/path/*`, I can't get the system() call to trigger a condition (I also tried `~` and `=~`). – Mercutio Mar 23 '15 at 07:48
  • 1
    Actually, if you are using Ubuntu, `gawk` has `strftime` which would simplify things. Current `mawk` (in contrast to the version in Ubuntu) also supports `strftime`. – Thomas Dickey Mar 23 '15 at 08:30
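Along the lines of the comments above about not running `date` once per line, here is a minimal sketch that works out the weekday inside awk itself, so no external process is started per record. It uses Tomohiko Sakamoto's day-of-week arithmetic (a swapped-in technique, not something proposed in this thread) and assumes every YY in a YYMMDD date means 20YY; the sample records and the `want` variable are hypothetical. With gawk, `strftime("%a", mktime(...))` on a "YYYY MM DD HH MM SS" string, as suggested above, would be an alternative.

```shell
#!/bin/sh
# Sketch: pick out events on a given weekday without calling date(1).
# Assumptions: date is YYMMDD in field 1, YY means 20YY, fields split on "__".
# The two records below are hypothetical sample data.
result=$(printf '%s\n' \
    '150323__0900__1000__SJ__x__CLOSED' \
    '150324__1300__1400__SJ__x__CLOSED' |
awk -F'__' -v want="Tue" '
    # Sakamoto day-of-week method: returns 0 = Sun ... 6 = Sat
    function weekday(ymd,    y, m, d, t) {
        y = 2000 + substr(ymd, 1, 2)
        m = substr(ymd, 3, 2) + 0
        d = substr(ymd, 5, 2) + 0
        split("0 3 2 5 0 3 5 1 4 6 2 4", t, " ")   # month offsets, t[1..12]
        if (m < 3) y--
        return (y + int(y/4) - int(y/100) + int(y/400) + t[m] + d) % 7
    }
    BEGIN { split("Sun Mon Tue Wed Thu Fri Sat", name, " ") }
    # The weekday is an ordinary string here, so it can be used in a pattern
    name[weekday($1) + 1] == want { print $2, $3 }
')
printf '%s\n' "$result"
```

Because the weekday is computed as a plain awk string, it can sit directly in a `pattern { action }` condition, which is what system() could not provide.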

2 Answers


I don't know about all the rest of it (too much text for my reading tastes!) but with respect to the answer you posted, this part of it:

awk 'BEGIN{FS="__"} NF == 10 && $1 >= "'$backDATE'" && $4 == "'$x'" && $6 == "CLOSED" {while ( "date +%a -d "$1"" | getline newDAY){if (newDAY == "'$y'") print $2" "$3}}' /home/absolute/path/*

assuming it does what you want, would be written as:

awk -v backDATE="$backDATE" -v x="$x" -v y="$y" '
    BEGIN { FS="__" }
    (NF == 10) && ($1 >= backDATE) && ($4 == x) && ($6 == "CLOSED") {
        cmd = "date +%a -d \"" $1 "\""
        while ( (cmd | getline newDAY) > 0 ) {
            if (newDAY == y) {
                print $2, $3
            }
        }
        close(cmd)
    }
' /home/absolute/path/*

As for why to use awk variables instead of letting shell variables expand to become part of the body of an awk script, the answer is robustness and simplicity.

This is letting a shell variable expand to become part of the body of an awk script:

$ x="hello world"
$ awk 'BEGIN{ print '$x' }'
awk: cmd. line:1: BEGIN{ print hello
awk: cmd. line:1:                   ^ unexpected newline or end of string
$ awk 'BEGIN{ print "'$x'" }'
awk: cmd. line:1: BEGIN{ print "hello
awk: cmd. line:1:              ^ unterminated string
awk: cmd. line:1: BEGIN{ print "hello
awk: cmd. line:1:              ^ syntax error
$ awk 'BEGIN{ print "'"$x"'" }'
hello world
$ x="hello
world"
$ awk 'BEGIN{ print "'"$x"'" }'
awk: cmd. line:1: BEGIN{ print "hello
awk: cmd. line:1:              ^ unterminated string
awk: cmd. line:1: BEGIN{ print "hello
awk: cmd. line:1:              ^ syntax error

and this is using an awk variable initialized with the value of a shell variable:

$ x="hello world"
$ awk -v x="$x" 'BEGIN{ print x }'
hello world

$ x="hello
world"
$ awk -v x="$x" 'BEGIN{ print x }'
hello
world

See the difference?

As for why to store the command in a variable: you have to close it after you use it, and it must be spelled exactly the same way in the close() call as it was when you opened the pipe. Compare:

cmd = "date +%a -d \"" $1 "\""
cmd | getline
close(cmd)

vs:

"date +%a -d \"" $1 "\"" | getline
close("date +%a -d \"" $l "\"")

and take an extremely close second look to spot the bug in the 2nd version.
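Why close() matters at all can be seen directly. A minimal sketch, with `echo Mon` standing in (as an assumption, so it runs without GNU date) for the `date +%a -d ...` command: once a pipe has been read to the end, reading from the same command string again returns 0 (end of file) instead of rerunning the command, until close() is called on exactly that string.

```shell
#!/bin/sh
# Sketch: effect of close(cmd) when the same command string is read twice.
# "echo Mon" is a stand-in for a date command built from $1.
result=$(awk 'BEGIN {
    cmd = "echo Mon"

    # Without close(): the second getline hits EOF on the still-open pipe
    r1 = (cmd | getline a)      # 1: read "Mon"
    r2 = (cmd | getline b)      # 0: end of file, command is not rerun
    print "no close:", r1, r2
    close(cmd)

    # With close(): each getline starts a fresh copy of the command
    r3 = (cmd | getline c); close(cmd)
    r4 = (cmd | getline d); close(cmd)
    print "with close:", r3, r4, d
}')
printf '%s\n' "$result"
```

In a loop over records, two lines with the same date build the same command string, so without close() the second one would silently get no weekday at all.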

Ed Morton
  • I set backDATE, x and y outside of the awk program (and use them elsewhere), so what is the advantage of using -v instead of breaking out with "'$x'" to expand the variable? Also wondering the same thing about using the cmd variable instead of just writing the command out inside condition statement? Not trying to be particular but I'm just trying to figure out if one way is more efficient for awk to read or if it's just easier for humans, etc... – Mercutio Mar 23 '15 at 19:42
  • I just edited my answer to add explanations for why to use variables and why to use a cmd variable. – Ed Morton Mar 23 '15 at 20:51
  • 1
    Thanks that totally cleared things up. Using -v to make awk specific versions of the variable definitely would have saved me countless hours hunting down rogue quote marks, thanks for following up with that explanation. Plus I totally missed close() earlier, thanks again! – Mercutio Mar 23 '15 at 21:11

Ok, so I ended up using this:

backDATE=150000
for x in $listOFsites; do
    for y in Sun Mon Tue Wed Thu Fri Sat; do
        totalHOURS=$( awk 'BEGIN{FS="__"} NF == 10 && $1 >= "'$backDATE'" && $4 == "'$x'" && $6 == "CLOSED" {while ( ( "date +%a -d \""$1"\"" | getline newDAY) > 0 ){if (newDAY == "'$y'") print $2" "$3}}' /home/absolute/path/* | xargs -I % /home/custom/duration/calc % | paste -sd+ | bc ); printf "."
    done
done

I had to run date inside the single quotes (so that I could pass $1 to it) rather than outside (using -F"__" -v newDAY=...), but inside the single quotes, getting the output of system() is problematic: system() returns the command's exit status, not its output. After seeing "How can I pass variables from awk to a shell command?" I finally saw the while (cmd | getline x) form, which was the crux of my issue. Props to Ed Morton.
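The difference between system() and `cmd | getline var` is easy to see side by side. A minimal sketch, with `echo Tue` standing in (as an assumption, so it runs anywhere) for a command like `date +%a -d "$1"`: system() lets the command write straight to standard output and hands back only its exit status, while getline captures the output into a variable that can be compared.

```shell
#!/bin/sh
# Sketch: system() vs. "cmd | getline var" in awk.
# "echo Tue" is a stand-in for something like: date +%a -d "$1"
result=$(awk 'BEGIN {
    # system(): "Tue" goes straight to stdout; rc is the exit status (0)
    rc = system("echo Tue")
    print "system() returned:", rc, (rc == "Tue" ? "match" : "no match")

    # getline: the command output lands in a variable we can test
    cmd = "echo Tue"
    if ((cmd | getline day) > 0 && day == "Tue")
        print "getline read:", day, "match"
    close(cmd)
}')
printf '%s\n' "$result"
```

This is why the attempts in the question printed a weekday for every line but never matched: the comparison was always against the exit status 0.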

Community
Mercutio
  • Don't blame me :-) - that is NOT the way to write that command! – Ed Morton Mar 23 '15 at 15:38
  • Added the \" in the date segment and adjusted the while statement, but it still performs the same way. If you could explain 'why' a little, it would really help me understand what I'm doing wrong. Just using posted code is kinda what got me here :) – Mercutio Mar 23 '15 at 19:47