
Let's say I have a file like the following:

1,aaa,2016-12-01 01:02:03 EST,bbb
2,ccc,2016-12-02 04:05:06 CST,ddd
3,eee,2016-12-03 07:08:09 EST,fff

I want to add a 5th field, which is the timestamp in field 3, but converted to UTC.

The solution can call date or any other tool available on Linux. I'm not too worried about performance, as it runs only once every so often on a small number of files.

I just can't figure out the best way to do this. awk's strftime doesn't accept a timezone argument, so I don't see how to use it here.

Benjamin W.
jasonmclose

2 Answers


With a Bash loop:

while IFS=, read -r -a linearr; do
    printf '%s,' "${linearr[@]}"
    date +'%F %T %Z' -u -d "${linearr[2]}"
done < infile

This results in

1,aaa,2016-12-01 01:02:03 EST,bbb,2016-12-01 06:02:03 UTC
2,ccc,2016-12-02 04:05:06 CST,ddd,2016-12-02 10:05:06 UTC
3,eee,2016-12-03 07:08:09 EST,fff,2016-12-03 12:08:09 UTC

This reads each line into an array linearr, prints the line with an added comma at the end and then appends the new date string.
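If some lines might carry a missing or unparsable timestamp, a more defensive variant of the loop could look like the sketch below. The function name append_utc and the choice to leave the new field empty on failure are assumptions made for illustration; like the loop above, it relies on GNU date.

```shell
# Hypothetical helper: read CSV lines on stdin and append field 3, converted to UTC.
# The body runs in a subshell, so its variables do not leak into the caller.
append_utc() (
    while IFS= read -r line || [ -n "$line" ]; do
        # Extract the third comma-separated field.
        ts=$(printf '%s\n' "$line" | cut -d, -f3)
        # GNU date treats an empty -d argument as "today at midnight",
        # so an empty field is rejected explicitly.
        if [ -n "$ts" ] && utc=$(date -u -d "$ts" +'%F %T %Z' 2>/dev/null); then
            printf '%s,%s\n' "$line" "$utc"
        else
            # Assumption: keep the line, leave the new field empty, and warn.
            printf '%s,\n' "$line"
            printf 'warning: no parsable timestamp in: %s\n' "$line" >&2
        fi
    done
)
```

It would be called as `append_utc < infile`.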

Alternatively, with paste and cut:

paste -d, infile <(while IFS= read -r line; do date +'%F %T %Z' -u -d "$line"; done < <(cut -d , -f 3 infile))

or, more readable:

paste -d , \
    infile \
    <(while IFS= read -r line; do
          date +'%F %T %Z' -u -d "$line"
      done < <(cut -d, -f3 infile)
    )

Note that the -d option is specific to GNU date. POSIX date has no option for using a date other than the current system date, and FreeBSD date uses a different option, -r, which expects "seconds since the Epoch" as its argument.

Benjamin W.
  • A note that this works only with `GNU date` would be helpful, `++` – Inian Jan 03 '17 at 05:24
  • This works well, but it just prints out the line. I guess I should've been more specific. How can I then get the lines in the file to reflect the new change? So after this runs, the lines in the file should each have the newly appended timestamp. – jasonmclose Jan 04 '17 at 16:09
  • @jasonmclose You redirect to a temporary file and rename it afterwards: `while ... done < infile > outfile && mv outfile infile` or `paste -d, infile <(...) > outfile && mv outfile infile`. – Benjamin W. Jan 04 '17 at 16:28
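The temporary-file approach from the last comment can be sketched as follows. The function name update_in_place is hypothetical, and aborting on the first unparsable timestamp is an assumption, chosen so that a failed run leaves the original file untouched. GNU date is assumed, as in the answer.

```shell
# Hypothetical helper: append the UTC timestamp to every line of the file given as $1.
# The body runs in a subshell so that "exit" and the trap stay local to the function.
update_in_place() (
    infile=$1
    # Create the temporary file next to the original so that mv is a plain rename.
    tmp=$(mktemp "$infile.XXXXXX") || exit 1
    # Remove the temporary file if the function exits before the rename.
    trap 'rm -f "$tmp"' EXIT
    while IFS= read -r line; do
        ts=$(printf '%s\n' "$line" | cut -d, -f3)
        # Assumption: stop at the first missing or unparsable timestamp.
        [ -n "$ts" ] || exit 1
        utc=$(date -u -d "$ts" +'%F %T %Z') || exit 1
        printf '%s,%s\n' "$line" "$utc"
    done < "$infile" > "$tmp" || exit 1
    mv -- "$tmp" "$infile"
)
```

It would be called as `update_in_place infile`. Because mktemp creates the temporary file readable only by its owner, the replaced file ends up with those permissions; restore them afterwards if that matters.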

Using GNU awk, you can do this by manipulating the TZ environment variable. The format is a bit nasty when you are dealing with time-offsets only, but if you have the std-string that specifies the time-zone, it is straightforward.

In awk, the environment is stored in the array ENVIRON. According to POSIX, whether modifying ENVIRON has any effect is implementation-defined:

ENVIRON: An array representing the value of the environment, as described in the exec functions defined in the System Interfaces volume of POSIX.1-2017. The indices of the array shall be strings consisting of the names of the environment variables, and the value of each array element shall be a string consisting of the value of that variable. If appropriate, the environment variable shall be considered a numeric string (see Expressions in awk); the array element shall also have its numeric value.

In all cases where the behaviour of awk is affected by environment variables (including the environment of any commands that awk executes via the system function or via pipeline redirections with the print statement, the printf statement, or the getline function), the environment used shall be the environment at the time awk began executing; it is implementation-defined whether any modification of ENVIRON affects this environment.

source: POSIX.1-2017

GNU awk, on the other hand, states the following:

However, beginning with version 4.2, if not in POSIX compatibility mode, gawk does update its own environment when ENVIRON is changed, thus changing the environment seen by programs that it creates.

So this can now be exploited by doing something like:

ENVIRON["TZ"] = std offset

Here are a couple of examples:

ENVIRON["TZ"] = "UTC"
ENVIRON["TZ"] = "UTC+03:00"
ENVIRON["TZ"] = "CET"
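To get a feel for how TZ values of the form std offset are interpreted, it can help to try them from the shell first. The sketch below uses GNU date rather than awk, purely as an illustration; the function name show_epoch is hypothetical. Note the POSIX sign convention: a positive offset means west of Greenwich, so "UTC+03:00" is three hours behind UTC, not ahead.

```shell
# Hypothetical helper: print the Epoch (1970-01-01 00:00:00 UTC)
# as local time under the TZ value given as $1, with its numeric UTC offset.
show_epoch() {
    TZ=$1 date -d @0 +'%F %T %z'
}

show_epoch 'UTC'        # 1970-01-01 00:00:00 +0000
show_epoch 'EST5'       # 1969-12-31 19:00:00 -0500
show_epoch 'UTC+03:00'  # 1969-12-31 21:00:00 -0300
```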

With respect to the OP, we can do this:

awk 'BEGIN{FS=OFS=","}
     {time=$3; gsub(/[^0-9]/," ",time); tz=$3; gsub(/^.* /,"",tz)}
     {ENVIRON["TZ"]=tz; print $0,strftime("%F %T",mktime(time),1)}
    ' file

Note: for general CSV files, see the question "What's the most robust way to efficiently parse CSV using awk?"

Note: sadly, not all known forms of TZ definitions are recognized.

kvantour