Need your help !!!
I have an input file 1.txt as follows:
"row1@abc.com,venki@abc.com,vr@xyz.com",field2,field3,field4
row2r@xyz.com,field2,field3,field4
field1,"row3@abc.com,venki@abc.com, vr@xyz.com",field3,field4
field1,"1111@abc.com,row4@abc.com",,field4
field1,"1111@abc.com""1234,999,row5@abc.com",,field4
I want to identify rows that do not have exactly four(4) columns and remove them from the file before loading to a PostGres Table using the copy command.
I am using awk as follows:
awk -v FPAT='([^,]+)||("[^"]+")||(^[ ]*$)' '{printf("Row:%d,NF=%d, %s, %s, %s, %s\n",NR, NF, $1,$2,$3,$4)}' 1.txt
And the output is;
Row:1,NF=4, "row1@abc.com,venki@abc.com, vr@xyz.com", field2, field3, field4
Row:2,NF=4, row2r@xyz.com, field2, field3, field4
Row:3,NF=4, field1, "row3@abc.com,venki@abc.com, vr@xyz.com", field3, field4
Row:4,NF=4, field1, "1111@abc.com,row4@abc.com", , field4
Row:5,NF=6, field1, "1111@abc.com""1234, 999, row5@abc.com"
You can see the problem with the last row. It has both double quotes and two commas in the value and that is screwing things up. The NF for that row is 6 when it should be 4.
Can anyone suggest the regex to identify the value "1111@abc.com""1234,999,row5@abc.com"
Thanks. Venki