0

I have the following file:

cat x.txt 

2020-01-04
2020-01-01
2008-03-12
2021-08-09

I am trying to write an awk script that outputs the following:

2020-01-04  2022-03-09  795
2020-01-01  2022-03-09  798
2008-03-12  2022-03-09  5110
2021-08-09  2022-03-09  212

Where column 2 is the current date and column 3 is the difference between column 1 and 2 in days. I have started a script like this but not really getting it:

cat y

#!/usr/bin/env bash
  
awk '

BEGIN {
    FS = OFS = "\t"
    str = "date +%Y-%m-%d"
    str | getline date
    d2 = mktime((gensub(/-/, " ", "g", date)) " 0 0 0")
    d1 = mktime((gensub(/-/, " ", "g", $1)) " 0 0 0")
}

{
    print $1, date, d2 - d1
}

' "${@:--}"

When I run this I get the following:

./y x.txt 

2020-01-04  2022-03-09  1646737201
2020-01-01  2022-03-09  1646737201
2008-03-12  2022-03-09  1646737201
2021-08-09  2022-03-09  1646737201

I am not sure how to work with dates so help is much appreciated.

Christian Hick
  • 401
  • 3
  • 10
  • This might help: [How to find the difference in days between two dates?](https://stackoverflow.com/q/4946785/3776858) – Cyrus Mar 08 '22 at 21:31
  • Thanks Cyrus. But is there an awk solution rather than using bash as I have more than 10,000 records to work with. – Christian Hick Mar 08 '22 at 21:37

1 Answers1

2

Using GNU awk for time functions:

$ cat tst.awk
BEGIN {
    OFS = "\t"
    today = strftime("%Y-%m-%d")
    today_secs = mktime(gensub(/-/," ","g",today) " 12 0 0")
    secs_in_day = 24 * 60 * 60
}
{
    secs = mktime(gensub(/-/," ","g",$1) " 12 0 0")
    print $1, today, int( (today_secs - secs) / secs_in_day )
}

$ awk -f tst.awk x.txt
2020-01-04      2022-03-08      794
2020-01-01      2022-03-08      797
2008-03-12      2022-03-08      5109
2021-08-09      2022-03-08      211

It's currently 3/8 rather than 3/9 in my timezone hence the numbers of days being 1 less than the expected output in the question.

Ed Morton
  • 188,023
  • 17
  • 78
  • 185
  • Thanks Ed, spot on as usual. I do have a question though. Why mktime(...) " 12 0 0" ? What does " 12 0 0" do? – Christian Hick Mar 08 '22 at 22:23
  • 2
    That's the time: 12:00:00 or noon. That typically helps avoid any nasty daylight saving time problems for timezones that transition at midnight (where once per year, the time 00:00:00 does not exist). That's not strictly needed in awk, where times do not have timezones. – glenn jackman Mar 08 '22 at 22:28
  • 1
    @glennjackman correct but I'm not sure what you mean by `where times do not have timezones` - by default `strfime()` and `mktime()` use the timezone defined by the shell `TZ` environment variable, for example `awk 'BEGIN {print strftime("%F %T")}'` outputs `2022-03-08 16:43:20` for my TZ but `TZ=UTC awk 'BEGIN {print strftime("%F %T")}'` outputs `2022-03-08 22:43:17`. – Ed Morton Mar 08 '22 at 22:44
  • Ah, that's good. It doesn't work quite the same as GNU date: this throws an error `TZ='Canada/Eastern' date -d '2022-03-13 02:00:01' '+%F %T'`; this returns a valid timestamp `TZ='Canada/Eastern' awk 'BEGIN {print strftime("%F %T", mktime("2022 03 13 2 0 1"))}'` – glenn jackman Mar 09 '22 at 02:37
  • 1
    @glennjackman I just realized you're using a time that's during the period when a DST change adds an hour at 2am so 2:00:01 is a time that can't exist in your timezone. I scratched my head over that `date` failure for an hour last night and this morning! Yes, `mktime()` always tries to calculate a valid timestamp from whatever you pass in, e.g. if I try to use the 35th day of March with `awk 'BEGIN {print strftime("%F %T", mktime("2022 03 35 0 0 0"))}'` it assumes I meant 35 days from the start of March and so outputs `2022-04-04 00:00:00`. That has it's uses at times. – Ed Morton Mar 09 '22 at 12:50
  • 1
    Yes, timezones are fun. – glenn jackman Mar 09 '22 at 14:46