Calculate time differences from different columns - bash

Question

I have seen many answers, but none has resolved my issue. Here is my fine

cat run_time

Done  City        Start_time              End_time  
Yes   Chicago     10:16:51,14-Dec-2018   10:19:38,14-Dec-2018        
Yes   Atlanta     10:12:58,14-Dec-2018   10:20:58,14-Dec-2018               
No    Minnetonka  10:16:38,14-Dec-2018   10:21:50,14-Dec-2018        
Yes   Hopkins     10:22:20,14-Dec-2018   10:18:11,14-Dec-2018

When I can calculate the manually everything works.

TO=$(date -d "10:16:58 14-Dec-2018" +%s)
TAL=$(date -d "10:16:50 14-Dec-2018" +%s)
TOTAL=$(( "$TO" - "$TAL" ))
echo $TOTAL

However, I'm getting an error whenever I tried to integrate into awk function.

First, I removed the comma between the time and date.

sed -i -e 's/,/ /g' run_time

Done  City        Start_time              End_time  
Yes   Chicago     10:16:51 14-Dec-2018   10:19:38 14-Dec-2018                
Yes   Atlanta     10:12:58 14-Dec-2018   10:20:58 14-Dec-2018               
No    Minnetonka  10:16:38 14-Dec-2018   10:21:50 14-Dec-2018        
Yes   Hopkins     10:22:20 14-Dec-2018   10:18:11 14-Dec-2018

Running the following awk command displays the information:

awk 'BEGIN { OFS = "\t" } NR == 1 { $7 = "Time_diff" } NR >= 2 { $7 = "$3,$4" - "$5,$6" } 1' < run_time|column -t

Done  City        Start_time              End_time               Time_diff
Yes   Chicago     10:16:51 14-Dec-2018  10:19:38 14-Dec-2018  
Yes   Atlanta     10:12:58 14-Dec-2018  10:20:58 14-Dec-2018  
No    Minnetonka  10:16:38 14-Dec-2018  10:21:50 14-Dec-2018  
Yes   Hopkins     10:22:20 14-Dec-2018  10:18:11 14-Dec-2018

My goal is to calculate the time difference and added under Time_diff

see this. maybe works: https://stackoverflow.com/questions/53810634/how-to-select-the-last-value-when-using-group-by — mahradbt, Dec 17 '18 at 08:28
This time I did the formatting for you. You can click on `edit` to see how the current format was achieved. — Socowi, Dec 17 '18 at 08:28
Please add your desired output for that sample input to your question. — Cyrus, Dec 17 '18 at 08:42
I can't see the 'integration' - in 'manually' you use `date` to convert to epoch time, in the `awk` script there is no such conversion... — liborm, Dec 17 '18 at 08:52

score 2 · Accepted Answer · edited Dec 17 '18 at 16:14

Considering that your Input_file is:

cat Input_file
Done  City        Start_time              End_time
Yes   Chicago     10:16:51,14-Dec-2018   10:19:38,14-Dec-2018
Yes   Atlanta     10:12:58,14-Dec-2018   10:20:58,14-Dec-2018
No    Minnetonka  10:16:38,14-Dec-2018   10:21:50,14-Dec-2018
Yes   Hopkins     10:22:20,14-Dec-2018   10:18:11,14-Dec-2018

Which rules does the script follow:

This code should take care of which DATE COLUMN is having greater value than other it will take the difference in that manner. Eg. last column's time is greater than 2nd last column then it will do last_col_time-second_last_col_time else it will do vice versa.
I have changed 14-Dec-2018 month to all smaller letter so even they are in any form (small, capital, or mix) we should be good at it.
I have NOT hardcoded 3rd and 4th column values in code, since 2nd column (which is city) may have spaces in between city names, so rather I have taken column values from last fields like $(NF-1) (second last column) and $NF (last column value).

This is all done as:

awk '
BEGIN{
  num=split("jan,feb,mar,apr,may,jun,jul,aug,sep,oct,nov,dec",month,",")
  for(i=1;i<=12;i++){
    a[month[i]]=i
  }
}
FNR==1{
  print $0,"Time_diff"
  next
}
{
  split($(NF-1),array,"[:,-]")
  split($(NF),array1,"[:,-]")
  val=mktime(array[6]" "a[tolower(array[5])]" "array[4]" "array[1]" "array[2]" "array[3])
  val1=mktime(array1[6]" "a[tolower(array1[5])]" "array1[4]" "array1[1]" "array1[2]" "array1[3])
  delta=val>=val1?val-val1:val1-val
  hrs = int(delta/3600)
  min = int((delta - hrs*3600)/60)
  sec = delta - (hrs*3600 + min*60)
  printf "%s\t%02d:%02d:%02d\n", $0, hrs, min, sec
  hrs=min=sec=delta=""
}
'  Input_file  | column -t

Output will be as follows.

Done  City        Start_time            End_time              Time_diff
Yes   Chicago     10:16:51,14-Dec-2018  10:19:38,14-Dec-2018  00:02:47
Yes   Atlanta     10:12:58,14-Dec-2018  10:20:58,14-Dec-2018  00:08:00
No    Minnetonka  10:16:38,14-Dec-2018  10:21:50,14-Dec-2018  00:05:12
Yes   Hopkins     10:22:20,14-Dec-2018  10:18:11,14-Dec-2018  00:04:09

Explanation of above code: Sorry, we need to scroll on the right side here.

awk '                                                                                               ##Starting awk code here.
BEGIN{                                                                                              ##Mentioning BEGIN section of awk here.
  num=split("jan,feb,mar,apr,may,jun,jul,aug,sep,oct,nov,dec",month,",")                            ##Creating month array which have months value in it.
  for(i=1;i<=12;i++){                                                                               ##Starting a for loop for covering 12 months.
    a[month[i]]=i                                                                                   ##Creating an array a whose index is month value and value is i.
  }                                                                                                 ##Closing for loop block here.
}                                                                                                   ##Closing BEGIN section block here.
FNR==1{                                                                                             ##Checking if this is first line.
  print $0,"Time_diff"                                                                              ##Printing current line with string Time_diff here.
  next                                                                                              ##next will skip all further statements from here.
}                                                                                                   ##Closing FNR conditoin block here.
{
  split($(NF-1),array,"[:,-]")                                                                      ##Splitting 2nd last column to array named array.
  split($(NF),array1,"[:,-]")                                                                       ##Splitting last column to array with delimietr as : or , or -
  val=mktime(array[6]" "a[tolower(array[5])]" "array[4]" "array[1]" "array[2]" "array[3])           ##Creating val which have mktime value,passing array elements.
  val1=mktime(array1[6]" "a[tolower(array1[5])]" "array1[4]" "array1[1]" "array1[2]" "array1[3])    ##Creating val1 variable by mktime passing array1 elements.
  delta=val>=val1?val-val1:val1-val                                                                 ##getting diff of val and val1 depending upon highest-lowest value
  hrs = int(delta/3600)                                                                             ##getting diff in hours if any.
  min = int((delta - hrs*3600)/60)                                                                  ##getting diff in min if any.
  sec = delta - (hrs*3600 + min*60)                                                                 ##getting diff in seconds value.
  printf "%s\t%02d:%02d:%02d\n", $0, hrs, min, sec                                                  ##Printing line and value of hrs,min and sec values here.
  hrs=min=sec=delta=""                                                                              ##Nullifying variables values here.
}
'  Input_file | column -t                                                              ##Mentioning Input_file and passing it to column command for TAB format in output.

@Kwa Arboncana, please try this solution and let me know how it goes. — RavinderSingh13, Dec 17 '18 at 13:26

score 2 · Answer 2 · answered Dec 17 '18 at 15:18

$ cat input.txt 
Done  City        Start_time              End_time  
Yes   Chicago     10:16:51,14-Dec-2018   10:19:38,14-Dec-2018        
Yes   Atlanta     10:12:58,14-Dec-2018   10:20:58,14-Dec-2018               
No    Minnetonka  10:16:38,14-Dec-2018   10:21:50,14-Dec-2018        
Yes   Hopkins     10:22:20,14-Dec-2018   10:18:11,14-Dec-2018



$ cat diff_time.awk
BEGIN{
    print "Done City Start_time End_time Time_diff"
}
{
    if(!/^Do/){
        diff_time=0
        start_full=substr($3,1,8)" "substr($3,10,11)
        end_full=substr($4,1,8)" "substr($4,10,11)
        "date -d "q start_full q" +%s"|getline start_epoc
        "date -d "q end_full q" +%s"|getline end_epoc
        diff_time= end_epoc - start_epoc
        if(diff_time<0){
            diff_time=diff_time*-1
        }
        "date -d@"diff_time" -u +%H:%M:%S"|getline diff_h
        print $0,diff_h
        }
}



$ awk -v q='"' -f diff_time.awk input.txt |column -t
Done  City        Start_time            End_time              Time_diff
Yes   Chicago     10:16:51,14-Dec-2018  10:19:38,14-Dec-2018  00:02:47
Yes   Atlanta     10:12:58,14-Dec-2018  10:20:58,14-Dec-2018  00:08:00
No    Minnetonka  10:16:38,14-Dec-2018  10:21:50,14-Dec-2018  00:05:12
Yes   Hopkins     10:22:20,14-Dec-2018  10:18:11,14-Dec-2018  00:04:09

@downtheroad Not sure if you could answer this. What if I had a value called "Failed" or "Null" in the end_time column and would like to skip the time operation? Want to see a blank on Time_Diff or a 0. Did not want to start a whole new question. — Kwa Arboncana, Jan 14 '19 at 19:18
@KwaArboncana for that specific case, after end_full=substr... line add: if(length(end_full) != 20){ print $0,"0" next } — downtheroad, Jan 14 '19 at 20:37

francesco · Answer 3 · 2018-12-17T09:40:06.140

1

A small script can do that

#!/bin/bash
(
TOTAL=0
while read -r line
do
  if [ "`echo $line|grep ^Done`" != "" ]
  then
    echo "$line"
  else
    TO=$(date -d "`echo $line|tr -s " "|cut -d " " -f 3|tr "," " "`" +%s)
    TAL=$(date -d "`echo $line|tr -s " "|cut -d " " -f 4|tr "," " "`" +%s)
    SUBTOTAL=$(( $TO - $TAL ))
    echo "$line $SUBTOTAL"
    TOTAL=$(( $TOTAL + $SUBTOTAL ))
  fi
done
echo $TOTAL
) <run_time

Explanation: the script reads each line of run_time into the variable line. A line starting with Done gets simply printed (the first line of the table). For all other lines, you eliminate double spaces (tr -s " "), you extract the third (cut -d " " -f 3) or the fourth (cat -d " " -f 4) field, then you substitute , with , and use the same formula you gave to compute the start/end dates and difference. Finally you print the difference next to the line. At the same time you store in TOTAL the sum of all differences, and you print it in the end.

edited Dec 17 '18 at 09:40

answered Dec 17 '18 at 08:59

francesco

7,189
7
22
49

1

The needlessly complex field splitting, the [useless `cat`](/questions/11710552/useless-use-of-cat), the lack of quoting, the uppercase variable names, and the haphazard "Done" processing are problematic here. – tripleee Dec 17 '18 at 09:27
@tripleee thanks for comments, I improved the script. I kept uppercase variable names to match closely the question. – francesco Dec 17 '18 at 09:40
1

You could still avoid the fugly splitting by letting `read` pick apart the line. You will lose the `Done` handling but just reprinting the column headers and discarding the first line will make the entire script significantly simpler. Still you need to remember that `while read -r` is very inefficient; I would definitely go with Awk for this. – tripleee Dec 17 '18 at 09:44

NeronLeVelu · Answer 4 · 2018-12-17T09:26:58.567

using gawk (not a posix that don't have time function)

Self explain code

awk '
   function convert2time ( ArgStrHr ) {
      # mktime format used "YYYY MM DD HH MM SS [DST]"
      # time format provided "10:16:51,14-Dec-2018"
      # extract element in a array
      T=split( ArgStrHr, aElt, /[-: ,]/ )

      # return the conversion
      return mktime( sprintf( "%4d %2d %2d %2d %2d %2d", aElt[6], aMonth[ aElt[5] ], aElt[4], aElt[1], aElt[2], aElt[3] ) )
      }

   BEGIN {
      # For string month convertion used in convert function
      split( "Jan Fev Mar Apr May Jun Jul Aug Sep Oct Nov Dec", aTemp )
       # revert a[i]="month" in a["month"]=i
      for ( Idx in aTemp ) aMonth[ aTemp[ Idx] ] = Idx
      }

   FNR==1 { $(NF + 1) = "Difference" }

   FNR!=1 {
      # take time in coutable form
      T1 = convert2time( $3 )
      T2 = convert2time( $4 )
      # add a field with difference
      $(NF + 1) =  T2 - T1
      }

   # print lines
   1    
   ' YourFile

Ed Morton · Answer 5 · 2018-12-17T19:22:24.870

idk what it means for the end time to be earlier than the start time but this using GNU awk for time functions indicates it in the output with a leading "-" on the time diff:

$ cat tst.awk
BEGIN { OFS="\t" }
NR==1 {
    print $0, "Time_diff"
    next
}
{
    for (i=NF-1; i<=NF; i++) {
        split($i,t,/[:,-]/)
        t[5] = (index("JanFebMarAprMayJunJulAugSepOctNovDec",t[5])+2)/3
        secs[i] = mktime(t[6]" "t[5]" "t[4]" "t[1]" "t[2]" "t[3])
    }

    sign = " "
    totSecsDiff = secs[NF] - secs[NF-1]
    if (totSecsDiff < 0) {
        sign = "-"
        totSecsDiff = 0 - totSecsDiff
    }

    hrsDiff  = int(totSecsDiff / (60*60))
    minsDiff = int((totSecsDiff - (hrsDiff*60*60)) / 60)
    secsDiff = totSecsDiff - (hrsDiff*60*60 + minsDiff*60)
    hmsDiff  = sprintf("%s%02d:%02d:%02d",sign,hrsDiff,minsDiff,secsDiff)

    print $0, hmsDiff
}

$ awk -f tst.awk file
Done  City        Start_time              End_time      Time_diff
Yes   Chicago     10:16:51,14-Dec-2018   10:19:38,14-Dec-2018    00:02:47
Yes   Atlanta     10:12:58,14-Dec-2018   10:20:58,14-Dec-2018    00:08:00
No    Minnetonka  10:16:38,14-Dec-2018   10:21:50,14-Dec-2018    00:05:12
Yes   Hopkins     10:22:20,14-Dec-2018   10:18:11,14-Dec-2018   -00:04:09

Ed, I like your solution. Would you please add some explanations? Only this section: for (i=NF-1; i<=NF; i++) { split($i,t,/[:,-]/) t[5] = (index("JanFebMarAprMayJunJulAugSepOctNovDec",t[5])+2)/3 secs[i] = mktime(t[6]" "t[5]" "t[4]" "t[1]" "t[2]" "t[3]) } — Kwa Arboncana, Dec 20 '18 at 17:33
It's just looping through the last 2 fields looking up the month abbreviation from the field (e.g. "Dec") in the string of all months to get it's index in that string and then doing some trivial math `(result+2)/3` to convert that index to a month number before calling mktime() to convert the date and time stored in each field to a number of seconds since the epoch and saving the result in an array indexed by the field number. — Ed Morton, Dec 20 '18 at 19:21

score 1 · Answer 6 · answered Dec 18 '18 at 21:04

Using Perl core modules

> cat kwa_time.in
Done  City        Start_time              End_time
Yes   Chicago     10:16:51,14-Dec-2018   10:19:38,14-Dec-2018
Yes   Atlanta     10:12:58,14-Dec-2018   10:20:58,14-Dec-2018
No    Minnetonka  10:16:38,14-Dec-2018   10:21:50,14-Dec-2018
Yes   Hopkins     10:22:20,14-Dec-2018   10:18:11,14-Dec-2018
> cat ./time_diff.sh
perl  -lane '
BEGIN {
use POSIX;
use Time::Local;
@months = qw( Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec );
}
print "$_\tTime_diff" if $.==1;
if($.>1)
{
$dte="$F[-1]";
$dts="$F[-2]";
$dte=~s/(\d+):(\d+):(\d+),(\d+)-(\S+)-(\d+)/timelocal($3,$2,$1,$4,(grep { $5 eq $months[$_] } 0..$#months)[0],$6-1900)/ge;
$dts=~s/(\d+):(\d+):(\d+),(\d+)-(\S+)-(\d+)/timelocal($3,$2,$1,$4,(grep { $5 eq $months[$_] } 0..$#months)[0],$6-1900)/ge;
$diff = abs($dte-$dts);
$hd=int $diff/3600;
$md=int (($diff-($hd*3600))/60);
$sd=int ($diff - ($hd*3600+$md*60));
printf("%s %02d:%02d:%02d\n",join(" ",@F),$hd,$md,$sd);
}
' $1
> ./time_diff.sh kwa_time.in | column -t
Done  City        Start_time            End_time              Time_diff
Yes   Chicago     10:16:51,14-Dec-2018  10:19:38,14-Dec-2018  00:02:47
Yes   Atlanta     10:12:58,14-Dec-2018  10:20:58,14-Dec-2018  00:08:00
No    Minnetonka  10:16:38,14-Dec-2018  10:21:50,14-Dec-2018  00:05:12
Yes   Hopkins     10:22:20,14-Dec-2018  10:18:11,14-Dec-2018  00:04:09
>

Calculate time differences from different columns - bash

6 Answers6