-3

How to get latest LoginDateTime based on second and third field (userID & userRole)? Sample input and expected output as per below. Total records per file < 5000 records.

Input.csv

AppName,UserID,UserRole,LoginDateTime
App1,googleservice,Power User,"Jul 10, 2022 16:02:41 UTC"
App1,googleservice,Power User,"Jul 9, 2022 16:02:41 UTC"
App1,googleservice,User,"August 9, 2022 16:02:41 UTC"
App1,googleservice,User,"August 10, 2022 16:02:41 UTC"
App1,pretty.zinta@google.com,Service Administrator,"August 1, 2022 05:05:47 UTC"
App1,pretty.zinta@google.com,Service Administrator,"August 2, 2022 07:07:47 UTC"
App1,pretty.zinta@google.com,Viewer,"July 11, 2022 06:01:20 UTC"
App1,pretty.zinta@google.com,Viewer,"July 14, 2022 10:01:20 UTC"

Desired output:

AppName,UserID,UserRole,LoginDateTime
App1,googleservice,Power User,"Jul 10, 2022 16:02:41 UTC"
App1,googleservice,User,"August 10, 2022 16:02:41 UTC"
App1,pretty.zinta@google.com,Service Administrator,"August 2, 2022 07:07:47 UTC"
App1,pretty.zinta@google.com,Viewer,"July 14, 2022 10:01:20 UTC"

I have attempted partial commands and in-complete due to Date and Time format is in UTC format. How to achieve the desired output?

James Z
  • 12,209
  • 10
  • 24
  • 44
iamhaziq
  • 1
  • 1
  • 4
    `Have attempted partial commands`: Please edit your question show it in your question – anubhava Aug 29 '22 at 16:46
  • 1
    Also tell us the output of `awk --version` and whether or not you can install GNU awk if you don't already have it. – Ed Morton Aug 29 '22 at 17:29
  • This would be much easier if the date string has a "standardized" format -- either full month name or abbreviated month name but not mixed. – glenn jackman Aug 29 '22 at 17:40
  • It should be possible to solve if you [convert the month name to a number](https://stackoverflow.com/questions/10109974/convert-month-from-aaa-to-xx-in-little-script-with-awk), and rearrange the positions in the dates to form a standard format. But with only 5000 rows, you would almost be done before using a spreadsheet, or even manually. But solve it as a challenge then. – MyICQ Aug 29 '22 at 18:16
  • Take a look at [Schwartzian transform](https://en.wikipedia.org/wiki/Schwartzian_transform) or use `awk`. – Cyrus Aug 29 '22 at 18:31

1 Answers1

2

Using any awk:

$ cat tst.awk
BEGIN { FS="," }
NR == 1 { print; next }
match($0,/"[^"]+"$/) {
    split(substr($0,RSTART+1,RLENGTH-2),t,/[ ,]+/)
    abbr = substr(t[1],1,3)
    mthNr = (index("JanFebMarAprMayJunJulAugSepOctNovDec",abbr)+2)/3
    ts = sprintf("%04d/%02d/%02d %8s", t[3], mthNr, t[2], t[4])
    key = $2 FS $3
    if ( ts > maxTs[key] ) {
        maxTs[key] = ts
        vals[key] = $0
    }
}
END {
    for ( key in vals ) {
        print vals[key]
    }
}

$ awk -f tst.awk Input.csv
AppName,UserID,UserRole,LoginDateTime
App1,googleservice,User,"August 10, 2022 16:02:41 UTC"
App1,pretty.zinta@google.com,Viewer,"July 14, 2022 10:01:20 UTC"
App1,pretty.zinta@google.com,Service Administrator,"August 2, 2022 07:07:47 UTC"
App1,googleservice,Power User,"Jul 10, 2022 16:02:41 UTC"
Ed Morton
  • 188,023
  • 17
  • 78
  • 185