
Chaps!

I have the following CSV file (exported from an app):

"Task Id",Reference,Project,Status,Category,Swimlane,Column,Position,Color,"Due date",Creator,"Creator Name","Assignee Username","Assignee Name",Complexity,Title,"Creation date","Modification date","Completion date","Start date","Time estimated","Time spent",Priority,Tags

703,,"TEAM1.Project"1,Closed,"TEAM1 Tasks1","Swimlane T1",Done,9,Yellow,12/21/2019 9:17,user97,"Joe Doe",user97,"Joe Doe",0,"Provisioning nodes ""solaris001, solaris002""            ","12/19/2019 9:17","12/22/2019 6:10","12/22/2019 6:10","12/19/2019 9:19",0,0,0,

49,47114,"TEAM1.Project2",Closed,"TEAM1 Tasks2","Swimlane T1",Done,1,Yellow,10/4/2019 13:47,user49,"Peter Deer",user49,"Peter Deer",0,"Reset user1, user2 passwords","4/9/2019 14:21","10/6/2019 7:14","10/6/2019 7:14","9/4/2019 0:00",0,0,0,

700,,"TEAM1.Project3",Closed,"TEAM1 Tasks3","Swimlane T1",Done,2,Blue,12/20/2019 9:52,user98,"Donald Plumb",user98,"Donald Plumb",0,"Reviewing logs","12/18/2019 9:52","12/18/2019 9:52","12/18/2019 9:52","12/17/2019 0:00",0,0,0,

I need to move the column "Creation date" (the 17th) to the beginning of the file, but in some cases the result is a mess because the "Title" column can contain field delimiters or quotation characters. In the example below only the last line comes out right (apart from the header):

awk -F',' -v OFS=',' '{k=$17; $17=""; print k,$0}' FILE
"Creation date","Task Id",Reference,Project,Status,Category,Swimlane,Column,Position,Color,"Due date",Creator,"Creator Name","Assignee Username","Assignee Name",Complexity,Title,,"Modification date","Completion date","Start date","Time estimated","Time spent",Priority,Tags

 solaris002""            ",703,,"TEAM1.Project"1,Closed,"TEAM1 Tasks1","Swimlane T1",Done,9,Yellow,12/21/2019 9:17,user97,"Joe Doe",user97,"Joe Doe",0,"Provisioning nodes ""solaris001,,"12/19/2019 9:17","12/22/2019 6:10","12/22/2019 6:10","12/19/2019 9:19",0,0,0,

 user2 passwords",49,47114,"TEAM1.Project2",Closed,"TEAM1 Tasks2","Swimlane T1",Done,1,Yellow,10/4/2019 13:47,user49,"Peter Deer",user49,"Peter Deer",0,"Reset user1,,"4/9/2019 14:21","10/6/2019 7:14","10/6/2019 7:14","9/4/2019 0:00",0,0,0,

"12/18/2019 9:52",700,,"TEAM1.Project3",Closed,"TEAM1 Tasks3","Swimlane T1",Done,2,Blue,12/20/2019 9:52,user98,"Donald Plumb",user98,"Donald Plumb",0,"Reviewing logs",,"12/18/2019 9:52","12/18/2019 9:52","12/17/2019 0:00",0,0,0,

I tried to play with the FPAT option but failed miserably. Help!
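For reference, this is roughly the FPAT variant I was experimenting with (GNU awk only; n is the column to move and FILE is the export above). It keeps commas inside simply-quoted fields together, but it still breaks on Title values that contain doubled quotes (""):

```shell
# GNU awk only: FPAT describes what a *field* looks like instead of what a
# separator looks like -- either a comma-free run or a double-quoted string.
# Caveat: "[^"]+" does not understand doubled quotes ("") inside a field,
# which is exactly what the Title column contains.
awk -v FPAT='([^,]*)|("[^"]+")' -v OFS=',' -v n=17 '
{
  out = $n                           # start the record with column n
  for (i = 1; i <= NF; i++)          # append every other column in order
    if (i != n) out = out OFS $i
  print out
}' FILE
```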

EDIT:

Almost there! But now everything after a comma in "Title" goes into "Modification date":

Creation date   Task Id Reference   Project Status  Category    Swimlane    Column  Position    Color   Due date    Creator Creator Name    Assignee Username   Assignee Name   Complexity  Title   Modification date   Completion date Start date  Time estimated  Time spent  Priority    Tags

12/19/2019 9:17 703     TEAM1.Project"1 Closed  TEAM1 Tasks1    Swimlane T1 Done    9   Yellow  12/21/2019 9:17 user97  Joe Doe user97  Joe Doe 0   Provisioning nodes "solaris001   solaris002"                12/22/2019 6:10 12/22/2019 6:10 12/19/2019 9:19 0   0   0

4/9/2019 14:21  49  47114   TEAM1.Project2  Closed  TEAM1 Tasks2    Swimlane T1 Done    1   Yellow  10/4/2019 13:47 user49  Peter Deer  user49  Peter Deer  0   Reset user1  user2 passwords    10/6/2019 7:14  10/6/2019 7:14  9/4/2019 0:00   0   0   0

12/18/2019 9:52 700     TEAM1.Project3  Closed  TEAM1 Tasks3    Swimlane T1 Done    2   Blue    12/20/2019 9:52 user98  Donald Plumb    user98  Donald Plumb    0   Reviewing logs  12/18/2019 9:52 12/18/2019 9:52 12/17/2019 0:00 0   0   0   

Screenshot of above listing

Ken Ryou
  • The idea that you can use a regex for this is flawed or at least problematic. To work in all corner cases, you have to deal with escaped quotes (in many CSV dialects, this is done by doubling the quotes) and quoted newlines. There is no facility for nested structures, so it's not theoretically impossible; but Awk is not particularly suitable for this. – tripleee Dec 22 '19 at 10:14
  • Thanks tripleee. What do you recommend using in this case, then? – Ken Ryou Dec 23 '19 at 05:23
  • If you need a flexible solution, I'd go with Python; but any modern language will do. If you just need a robust tool, there are several dedicated CSV parsers. See also https://stackoverflow.com/questions/1560393/bash-shell-scripting-csv-parsing - I'd add that as a second duplicate actually. – tripleee Dec 23 '19 at 09:47
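A minimal sketch of the Python route tripleee suggests, assuming python3 is available; the filename tasks.csv and the hard-coded index 16 (the 17th field, 0-based) are assumptions for illustration. The csv module already handles quoted commas, doubled quotes, and quoted newlines, though it normalizes the output quoting:

```shell
# Hypothetical file name tasks.csv; Python's csv module does the parsing,
# so embedded commas and "" escapes in Title cannot split the record.
python3 - tasks.csv <<'PY'
import csv, sys

with open(sys.argv[1], newline='') as f:
    writer = csv.writer(sys.stdout, lineterminator='\n')
    for row in csv.reader(f):
        i = 16                      # "Creation date" is field 17 (0-based 16)
        writer.writerow([row[i]] + row[:i] + row[i + 1:])
PY
```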

1 Answer


EDIT: Taking complete inspiration from Ed sir's answer (What's the most robust way to efficiently parse CSV using awk?), and additionally adding logic for the 1st line to move that field name in the header:

awk '
BEGIN{
  FS=OFS=","
}
FNR==1{
  match($0,/"Creation date"/)
  print substr($0,RSTART,RLENGTH),substr($0,1,RSTART-1) substr($0,RSTART+RLENGTH+1)
  next
}
function buildRec(      i,orig,fpat,done) {
    $0 = PrevSeg $0
    if ( gsub(/"/,"&") % 2 ) {
        PrevSeg = $0 RS
        done = 0
    }
    else {
        PrevSeg = ""
        gsub(/@/,"@A"); gsub(/""/,"@B")            # <"x@foo""bar"> -> <"x@Afoo@Bbar">
        orig = $0; $0 = ""                         # Save $0 and empty it
        fpat = "([^" FS "]*)|(\"[^\"]+\")"         # Mimic GNU awk FPAT meaning
        while ( (orig!="") && match(orig,fpat) ) { # Find the next string matching fpat
            $(++i) = substr(orig,RSTART,RLENGTH)   # Create a field in new $0
            gsub(/@B/,"\"",$i); gsub(/@A/,"@",$i)  # <"x@Afoo@Bbar"> -> <"x@foo"bar">
            gsub(/^"|"$/,"",$i)                    # <"x@foo"bar">   -> <x@foo"bar>
            orig = substr(orig,RSTART+RLENGTH+1)   # Move past fpat+sep in orig $0
        }
        done = 1
    }
    return done
}

BEGIN { FS=OFS="," }
!buildRec() { next }                           # Keep reading lines until quotes balance
{
  $1=$17 OFS $1                                # Prepend the "Creation date" value
  $17="@@"                                     # Mark the old 17th field with a placeholder
  sub(/,@@,/,",")                              # Drop the placeholder plus one separator
  print
}'  Input_file


Could you please try the following? Fair warning: since the expected output was not given, only the OP can confirm whether this output is correct (though I tested it by the field count from the header).

awk -v FPAT='([^,]*)|("[^"]+")' '
BEGIN{
  OFS=","
}
FNR==1{
  match($0,/"Creation date"/)
  print substr($0,RSTART,RLENGTH),substr($0,1,RSTART-1) substr($0,RSTART+RLENGTH+1)
  next
}
{
  $1=$17 OFS $1
  $17="@@"
  sub(/,@@,/,",")
}
1
' Input_file

NOTE: This also assumes that your Input_file doesn't contain the string @@, since I am using it as a placeholder to mark the 17th column so it can be removed later. In case your data does contain @@, change $17="@@" (and the matching sub()) to any string that is NOT present in your Input_file.
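The placeholder trick the NOTE describes can be seen on a toy record; here field 2 of a 3-field line is marked and then squeezed out together with one of its separators:

```shell
# Assign a string assumed absent from the data to the unwanted field,
# then delete the placeholder along with one of the two commas around it.
echo 'a,b,c' | awk 'BEGIN{FS=OFS=","} {$2="@@"; sub(/,@@,/,","); print}'
# -> a,c
```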

NOTE2: To see the field numbers together with the field values, you could run the following.

awk -v FPAT='([^,]*)|("[^"]+")' 'FNR==1{for(i=1;i<=NF;i++){print i,$i}}'  Input_file
1 "Task Id"
2 Reference
3 Project
4 Status
5 Category
6 Swimlane
7 Column
8 Position
9 Color
10 "Due date"
11 Creator
12 "Creator Name"
13 "Assignee Username"
14 "Assignee Name"
15 Complexity
16 Title
17 "Creation date"
18 "Modification date"
19 "Completion date"
20 "Start date"
21 "Time estimated"
22 "Time spent"
23 Priority
24 Tags

See the fields for the 2nd line (run the same command with FNR==2):

1 703
2
3 "TEAM1.Project"1
4 Closed
5 "TEAM1 Tasks1"
6 "Swimlane T1"
7 Done
8 9
9 Yellow
10 12/21/2019 9:17
11 user97
12 "Joe Doe"
13 user97
14 "Joe Doe"
15 0
16 "Provisioning nodes ""solaris001
17  solaris002""            "
18 "12/19/2019 9:17"
19 "12/22/2019 6:10"
20 "12/22/2019 6:10"
21 "12/19/2019 9:19"
22 0
23 0
24 0
25
RavinderSingh13
  • Ravi, you're an awk ninja! Thank you! Almost there, please see my update in EDIT. – Ken Ryou Dec 23 '19 at 07:17
  • @KenRyou, Thanks for the compliment, but I am nowhere near Ed sir, honestly :) What's the problem now? Please explain it to me here. – RavinderSingh13 Dec 23 '19 at 07:26
  • I have attached screenshot in EDIT section - kindly review. – Ken Ryou Dec 23 '19 at 07:33
  • @KenRyou, I am sorry, could you please explain it more here? I am willing to help. – RavinderSingh13 Dec 23 '19 at 08:43
  • Sure! My screenshot shows the listing after running your latest snippet - it's easy to observe there what I'm talking about. In short: the "Creation date" column is moved correctly now, but everything after a comma in the "Title" field is split off and moved to the "Modification date" column. – Ken Ryou Dec 23 '19 at 09:16