I'm new to AWK and need help with the following. The code below prints the value of the 9th column of a CSV file when the 8th column is "false". The 9th column value spans 7 lines, but only its first line is printed. How can I print the complete value of the 9th column?

It prints only "Test failed: text expected to equal /

FILES=$*
for f in $FILES
do
  echo "${f##*/}"
  echo "------------------------------------------------"
  awk -F "," 'BEGIN{print $f} $8 == "false" {print $9}' $f
  echo
done

My input CSV:

timeStamp,elapsed,label,responseCode,responseMessage,threadName,dataType,success,failureMessage,bytes,sentBytes,grpThreads,allThreads,Latency,IdleTime,Conne$
1583830716746,1202,HTTP Request- Authorization TC01,200,OK,ZH 1-1,text,true,,530,354,1,1,1202,0,1124
1583830717967,59,ID_001_Wrong_PNR,500,Internal Error,ZH 1-1,text,false,"Test failed: text expected to equal /

****** received  : [[[
                {
                    ""status"": ""500"",
                    ""code"": ""500"",
        ...]]]

****** comparison: [[[{""seatReservations"":[{""passengerKey"":""PAX1"",""success"":""false"",""seatCode"":""50C"",""segmentKey"":""SEG1"",""...]]]

/",322,1023,1,1,58,0,0

Output I am getting:

"Test failed: text expected to equal /

Expected output:

"Test failed: text expected to equal /

****** received  : [[[
                {
                    ""status"": ""500"",
                    ""code"": ""500"",
        ...]]]

****** comparison: [[[{""seatReservations"":[{""passengerKey"":""PAX1"",""success"":""false"",""seatCode"":""50C"",""segmentKey"":""SEG1"",""...]]]

/"
kotlo ram
  • awk parses by default based on newline character, but you are expecting awk to somehow understand a field spread over multiple lines.. you can check out https://stackoverflow.com/questions/45420535/whats-the-most-robust-way-to-efficiently-parse-csv-using-awk ... or you could use python and see if its csv module helps – Sundeep May 14 '20 at 10:54
  • Sundeep - Thank you. But my requirement in Shell not in Python. I tried in python which is printing complete value. – Ram Krishna May 14 '20 at 11:11
  • When posting CSVs with fields that contain newlines you should show us the output of `cat -ev file.csv` so we can see which newlines are LFs alone and which, if any, are CRLFs because if you have a mix of both (LFs inside quoted fields and CRLFs at the end of records) and access to GNU awk then the solution becomes much simpler than otherwise. – Ed Morton May 14 '20 at 12:10
  • I suggested Python because you have Python as a tag, for cli, see https://github.com/dbohdan/structured-text-tools – Sundeep May 14 '20 at 12:10
  • [@RamKrishna](https://stackoverflow.com/users/8878135/ram-krishna) are you the same person who posted this question, [kotloram](https://stackoverflow.com/users/13260423/kotlo-ram), but using 2 different accounts? – Ed Morton May 14 '20 at 12:19
  • Interesting: https://stackoverflow.com/questions/61105737/sonarqube-quality-gate-status-check-fail-in-jenkins-pipeline – James Brown May 14 '20 at 13:41

1 Answer

Awk has no built-in understanding of CSV, so you cannot use it for this without writing some kind of CSV parser. Several exist online. You could also put together a hack that works for your particular problem, like this one for GNU awk (it relies on FPAT, and it is really two scripts, as I was too lazy to combine them):

$ gawk '
BEGIN {
    FPAT = "([^,]*)|(\"[^\"]+\")+"  # using FPAT instead of FS, look it up.
    OFS=","
}                                   # if record has 16 fields (this is uncertain
NF==16 {                            # define the condition better to suit data)
    $0=$0 "\r\n"                    # use different newline
}1' file | gawk '                    # pipe this to another awk
BEGIN {
    FPAT = "([^,]*)|(\"[^\"]+\")+"
    RS="\r\n"                       # that uses \r\n as RS
}
$8=="false" {
    print $9
}'

Output:

"Test failed: text expected to equal /

****** received  : [[[
                {
                    ""status"": ""500"",
                    ""code"": ""500"",
        ...]]]

****** comparison: [[[{""seatReservations"":[{""passengerKey"":""PAX1"",""success"":""false"",""seatCode"":""50C"",""segmentKey"":""SEG1"",""...]]]

/"

The first awk expects the data to have \n record separators; for "whole" records it changes the newline to \r\n, whereas the "malformed data" keeps \n. The second awk then uses \r\n to separate records. The condition for detecting "good" and "bad" records is not adequate and needs a better definition; it is only a sample and probably messes up the next record.

It's a hack, treat it as one. HACK THE PLANET!
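
If counting fields turns out to be too fragile, one alternative is to join physical lines until the number of double quotes seen so far is even, and only then split the record. The following is a minimal sketch of that idea in plain POSIX awk (no FPAT needed), not a full CSV parser. The names `print_failures` and `split_csv` are made up for this illustration, and it assumes that quotes inside a field are always doubled (`""`), as they are in the sample data.

```shell
#!/bin/sh
# print_failures FILE...
# Print the raw 9th field of every CSV record whose 8th field is "false".
# Assumption: embedded quotes are doubled (""), so a record is complete
# exactly when it contains an even number of double quotes.
print_failures() {
    awk '
    # Split one complete record into f[]; commas inside quotes do not split.
    function split_csv(rec, f,    n, i, c, inq, cur, len) {
        split("", f)                    # portable way to empty the array
        n = 0; cur = ""; inq = 0; len = length(rec)
        for (i = 1; i <= len; i++) {
            c = substr(rec, i, 1)
            if (c == "\"") {
                inq = !inq              # a doubled quote toggles twice
                cur = cur c             # keep quotes: output the raw field
            } else if (c == "," && !inq) {
                f[++n] = cur; cur = ""
            } else {
                cur = cur c
            }
        }
        f[++n] = cur
        return n
    }
    {
        # Append this physical line to the record being assembled.
        rec = (pending ? rec "\n" : "") $0
        tmp = rec
        pending = (gsub(/"/, "", tmp) % 2)   # odd count: still inside quotes
        if (pending) next
        split_csv(rec, f)
        if (f[8] == "false") print f[9]
    }
    ' "$@"
}
```

Called as `print_failures file.csv`, this prints the 9th field with its surrounding quotes and embedded newlines intact. It re-counts the quotes each time a line is appended, which is fine for short fields like these but slow for very long multi-line values.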

James Brown