
Input file data:

"1","123","hh
KKK,111,ll
Jk"
"2","124","jj"

Output data:

"1","123","hh KKK,111,ll Jk"
"2","124","jj"

I tried the awk script below, but it does not produce the desired output:

BEGIN {
    FS="\",\"";
    record_lock_flag=0;
    total_feilds=3;
    tmp_field_count=0;
    tmp_rec_buff="";
    lines=0;
}
{
    if (NR>0)
    {
        if (record_lock_flag == 0 && NF == total_feilds && substr($NF,length($NF)-1,length($NF)) ~ /^"/)
        {
            print $0;
        }
        else
        {
            tmp_rec_buff=tmp_rec_buff$0;
            tmp_field_count=tmp_field_count+NF;
            if ($0 != "")
            { lines++; }
            rec_lock_flag=1;
            if (tmp_field_count==exp_fields+lines-1) {
                print tmp_rec_buff;
                record_lock_flag=0;
                tmp_field_count=0;
                tmp_rec_buff="";
                lines=0;
            }
        }
    }
}
END {
}
Cyrus
nagothu

4 Answers


Using any awk in any shell on every Unix box:

$ awk 'BEGIN{RS=ORS="\""} !(NR%2){gsub(/\n/," ")} 1' file
"1","123","hh KKK,111,ll Jk"
"2","124","jj"

See also "What's the most robust way to efficiently parse CSV using awk?".
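To see why the one-liner above tests NR%2, it can help to print the records awk produces when RS is a double quote. The sketch below uses a made-up two-field input and a hypothetical helper name, show_records; neither comes from the question. Text before the first quote becomes record 1, so the even-numbered records are the quoted field contents, and those are the only ones in which newlines get replaced.

```shell
# show_records is a hypothetical helper used only for this illustration.
# It splits stdin on double quotes and prints each record with its number.
show_records() {
    awk 'BEGIN { RS = "\"" } { printf "%d:[%s]\n", NR, $0 }'
}

# Illustrative input: two quoted fields, the second one containing a newline.
printf '"a","b\nc"' | show_records
```

Records 2 and 4 hold the field contents (a, and b followed by a newline and c), while records 1 and 3 hold the text between quoted fields.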

Ed Morton

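Using GNU awk, we can split records on a closing " followed by a newline (and then either the next opening " or the end of input), replace the newlines left inside each record with spaces, and finally restore the matched separator by setting ORS to RT (assuming there are no blank fields with opening and closing quotes on separate lines):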

awk -v RS='"\n("|$)' '{gsub(/\n/, " "); ORS=RT} 1' file

"1","123","hh KKK,111,ll Jk"
"2","124","jj"

Another version using GNU awk, if you already know the number of fields in each record, as shown in your question:

awk -v n=3 -v FPAT='"[^"]*"' 'p {$0 = p " " $0; p=""}
NF < n {p = $0; next} 1' file

"1","123","hh KKK,111,ll Jk"
"2","124","jj"
anubhava

With only the samples you have shown, you could try the following awk code. It was written and tested with GNU awk.

awk -v RS="" -v FS="\n" '
{
  for(i=1;i<=NF;i++){
    sum+=gsub(/"/,"&",$i)
    val=(val?val OFS:"")$i
    if(sum%2==0){
      print val
      sum=0
      val=""
    }
  }
}
' Input_file

Explanation: the same code with a comment on each step.

awk -v RS="" -v FS="\n" '    ##Start the awk program; RS="" enables paragraph mode and FS="\n" makes each line a field.
{
  for(i=1;i<=NF;i++){        ##Loop over all fields (lines) of the record.
    sum+=gsub(/"/,"&",$i)    ##Replace each " with itself; gsub returns the number of replacements, which is added to sum.
    val=(val?val OFS:"")$i   ##Append the current field to val, separated by OFS (a space by default).
    if(sum%2==0){            ##If the running quote count is even, the record is complete.
      print val              ##Print the joined record.
      sum=0                  ##Reset the quote count.
      val=""                 ##Clear val.
    }
  }
}
' Input_file                 ##Input file name.
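The same quote-counting idea can also be sketched line by line, without paragraph mode, so that it should not depend on GNU awk. This is a minimal sketch rather than a tested replacement: join_records is a hypothetical helper name, and it assumes that quotes inside fields are always balanced, so an odd running count means a quoted field is still open.

```shell
# join_records is a hypothetical helper: it reads CSV on stdin and joins
# lines with a space until the running count of double quotes is even.
join_records() {
    awk '
    {
        buf = (buf == "" ? $0 : buf " " $0)   # append this line to the pending record
        quotes += gsub(/"/, "&")              # gsub returns how many quotes this line has
        if (quotes % 2 == 0) {                # even count: no quoted field is left open
            print buf
            buf = ""
            quotes = 0
        }
    }'
}

# Sample input taken from the question.
printf '%s\n' '"1","123","hh' 'KKK,111,ll' 'Jk"' '"2","124","jj"' | join_records
```

Because the state is kept per line, blank lines in the input do not act as record separators here, unlike with RS="".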
RavinderSingh13

With awk setting ORS:

awk '{ORS = (!/"$/) ? " " : "\n"} 1' file
"1","123","hh KKK,111,ll Jk"
"2","124","jj"
Carlos Pascual