AWK matching values in a column and performing calculation

Question

I'm new at AWK and I'm trying to figure out an answer for my problem. I have a flat file with the following values:

403 | SanMateo   | f | 2015-04-09 18:50:24.38
403 | SanMateo   | t | 2015-04-09 18:45:24.36
403 | SanMateo   | t | 2015-04-09 18:40:24.383
403 | SanMateo   | f | 2015-04-09 18:35:24.357
403 | SanMateo   | t | 2015-04-09 18:30:24.355
404 | RedwoodCity| f | 2015-04-09 18:35:50.308
404 | RedwoodCity| t | 2015-04-09 18:30:50.242
404 | RedwoodCity| f | 2015-04-09 18:25:50.245
404 | RedwoodCity| t | 2015-04-09 18:20:50.242
404 | RedwoodCity| f | 2015-04-09 18:15:50.242

I want to use awk to compare $1 of the current line to $1 of the next line, and $3 ~/f/. if the statement is true then subtract $4 of the next line from $4 of the current line and write the difference in a new column of the current line and if false then do nothing. what I have so far is this:

awk 'BEGIN {FS="|";} {if (NR $1 ~ NR++ $1 && $3 ~ /f/) subtract = NR $4 - NR++ $4; {print subtract}}' allHealthRecords_Sorted

and obviously that's not working. Can someone please help?

Don't think about it in terms of wanting to do something with the next line, think about it in terms of doing something with the previous line. You can't look at the next line as you haven't read it yet but you HAVE read the previous line. Think about it that way and the solution will come to you. — Ed Morton, Apr 09 '15 at 21:34

glenn jackman · Accepted Answer · 2015-04-09T22:01:46.973

save this as time_diff.awk

BEGIN {FS = "[[:blank:]]*\\|[[:blank:]]*"}

# convert "YYYY-mm-dd HH:MM:SS.fff" to a number
function to_time(timestamp,       fraction) {
    fraction = timestamp
    sub(/\..*$/, "", timestamp)
    gsub(/[-:]/, " ", timestamp)
    sub(/.*\./, "0.", fraction)
    return mktime(timestamp) + fraction
}

# gawk has no builtin abs() function
function abs(val) { 
    return( val < 0 ? -1*val : val) 
}

# add the time diff if the condition is met
NR > 1 {
    diff = 0
    if ($1+0 == key && flag == "f") 
        diff = abs( to_time($4) - to_time(time) )
    print line (diff > 0 ? " | " diff : "")
} 

{
    # remember the previous line's values
    key = $1+0; flag = $3; time = $4; line = $0
}

END {print}

Then

$ gawk -f time_diff.awk file
        403 | SanMateo| f                | 2015-04-09 18:50:24.38 | 300.02
        403 | SanMateo| t                | 2015-04-09 18:45:24.36
        403 | SanMateo| t                | 2015-04-09 18:40:24.383
        403 | SanMateo| f                | 2015-04-09 18:35:24.357 | 300.002
        403 | SanMateo| t                | 2015-04-09 18:30:24.355
        404 | RedwoodCity| f                | 2015-04-09 18:35:50.308 | 300.066
        404 | RedwoodCity| t                | 2015-04-09 18:30:50.242
        404 | RedwoodCity| f                | 2015-04-09 18:25:50.245 | 300.003
        404 | RedwoodCity| t                | 2015-04-09 18:20:50.242
        404 | RedwoodCity| f                | 2015-04-09 18:15:50.242

I could have used the unary `-` operator in the `abs` function: `return (val<0 ? -val : val)` — glenn jackman, Apr 10 '15 at 15:28

score 1 · Answer 2 · answered Apr 09 '15 at 21:55

You don't show your expected output so we can't test it, and $4 is a date so idk what you mean by "subtract" but this is basically the right approach:

$ cat tst.awk         
BEGIN{ FS="[[:space:]]*[|][[:space:]]*"; OFS=" | " }
split(prev,p) { print prev ( ($1==p[1])&&(p[3]=="f") ? OFS p[4] - $4 : "") }
{ prev = $0 }
END { print prev ( ($1==p[1])&&(p[3]=="f") ? OFS p[4] - $4 : "") }

$ awk -f tst.awk file
403 | SanMateo   | f | 2015-04-09 18:50:24.38 | 0
403 | SanMateo   | t | 2015-04-09 18:45:24.36
403 | SanMateo   | t | 2015-04-09 18:40:24.383
403 | SanMateo   | f | 2015-04-09 18:35:24.357 | 0
403 | SanMateo   | t | 2015-04-09 18:30:24.355
404 | RedwoodCity| f | 2015-04-09 18:35:50.308 | 0
404 | RedwoodCity| t | 2015-04-09 18:30:50.242
404 | RedwoodCity| f | 2015-04-09 18:25:50.245 | 0
404 | RedwoodCity| t | 2015-04-09 18:20:50.242
404 | RedwoodCity| f | 2015-04-09 18:15:50.242

i.e. you have a buffer of 1 line so you're always operating on and outputing the previous line that you read.

Forgive me for not showing the expected results. What I want is to figure out the difference in time every time a site changes it's status from t to f and vice versa. From there, I can perform a calculation based on those time and produce a report. I figure to chop the information into pieces so that it is easy to digest for the reader of the report. The problem is, the flat files includes every site and I want to automate the calculation and have the automation smart enough to know when not to include the information it's reading. I'm running out of space I just want to say thnx 4 D hlp. — NOOBIE, Apr 10 '15 at 15:14

score 0 · Answer 3 · answered Apr 09 '15 at 21:19

0

In the BEGIN action, read the first line with getline and save the values of $1 and $4.

On each line thereafter, compare $1 to the saved value from the previous line. If they are the same, and $3 ~ /f/, do the desired process. Then save the values of $1 and $4 for the next line.

That should be enough to get you started. If you have trouble writing the code, come back and ask more questions.

answered Apr 09 '15 at 21:19

user448810

17,381
4
34
59

No. Do not follow this advice as using getline in BEGIN is completely unnecessary for this and fraught with caveats, see http://awk.info/?tip/getline. – Ed Morton Apr 09 '15 at 21:36
Well, I don't think this is particularly wrong, but if you prefer, you could read and store the values from the first line in an `NR == 1` pattern. The rest of the algorithm, on remaining lines after the first, is correct. – user448810 Apr 10 '15 at 12:31
Telling someone to use getline for this is about as wrong as telling them to use `head` to get the first line in a variable and then `tail` the rest in a pipe to awk, but getline has more insidious caveats/gotchas. There are a few very specific problems for which getline is the right solution, for everything else don't use it and just let awk do what it does naturally instead. – Ed Morton Apr 10 '15 at 15:04

AWK matching values in a column and performing calculation

3 Answers3