0

I have a file with the following records

ABC
BCD
CDE
EFG

I would like to convert this into

'ABC','BCD','CDE','EFG'

I attempted to attack this problem using Awk in the following way:

awk '/START/{if (x)print x;x="";next}{x=(!x)?$0:x","$0;}END{print x;}'

but the output is not what I expected:

ABC,BCD,CDE,EFG

Are there any suggestions on how we can achieve this?

kvantour
Vikakmis

4 Answers

2

Could you please try the following.

awk -v s1="'" 'BEGIN{OFS=","} {val=val?val OFS s1 $0 s1:s1 $0 s1} END{print val}' Input_file

Output will be as follows.

'ABC','BCD','CDE','EFG'
RavinderSingh13
  • `val=val?val OFS s1 $0 s1:s1 $0 s1` = `val=(val ? val OFS : "") s1 $0 s1`. Improved clarity, less redundancy, and improved robustness across all awks since it parenthesizes the ternary expression. – Ed Morton Sep 06 '18 at 15:31
  • @Vikakmis, you could also select any of the answers as the correct answer, to close the loop properly. – RavinderSingh13 Sep 07 '18 at 05:32
2

With GNU awk for multi-char RS:

$ awk -v RS='\n$' -F'\n' -v OFS="','" -v q="'" '{$1=$1; print q $0 q}' file
'ABC','BCD','CDE','EFG'
Ed Morton
1

There are many ways of achieving this:

with pipes:

sed "s/.*/'&'/" <file> | paste -sd, -
awk '{print "\047" $0 "\047"}' <file> | paste -sd, -

Remark: tr is not used here, because translating every newline into a comma would leave an extra , at the end.
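
The trailing comma mentioned in the remark can be seen with a small sketch. The sample data is generated inline with printf, and the variable names are only illustrative:

```shell
# tr translates every newline into a comma, including the final one.
with_tr=$(printf 'ABC\nBCD\nCDE\nEFG\n' | sed "s/.*/'&'/" | tr '\n' ',')

# paste -s only puts the delimiter between lines, never after the last one.
with_paste=$(printf 'ABC\nBCD\nCDE\nEFG\n' | sed "s/.*/'&'/" | paste -sd, -)

printf '%s\n' "$with_tr"     # 'ABC','BCD','CDE','EFG',
printf '%s\n' "$with_paste"  # 'ABC','BCD','CDE','EFG'
```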

reading the full file into memory:

sed ':a;N;$!ba;s/\n/'"','"'/g;s/.*/'"'&'"'/g' <file>  #POSIX
sed -z 's/^\|\n$/'"'"'/g;s/\n/'"','"'/g;' <file>      #GNU

and the solution of @EdMorton

without reading the full file into memory:

awk '{printf "%s\047%s\047", (NR>1?",":""), $0} END{print ""}' <file>
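
Streaming approaches like this rely on printf, whose first argument is a format string, so records that contain % are safest when passed through %s. A minimal sketch with made-up sample records (the variable name is illustrative):

```shell
# Two hypothetical records that contain percent signs.
# The data is passed as a %s argument, so it is never parsed as a format.
streamed=$(printf '50%%\n10%%s\n' |
  awk '{printf "%s\047%s\047", (NR>1?",":""), $0} END{print ""}')

printf '%s\n' "$streamed"  # '50%','10%s'
```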

and some random other attempts:

awk '(NR-1){s=s","}{s=s"\047"$0"\047"}END{print s}' <file>
awk 'BEGIN{printf s="\047";ORS=s","s}(NR>1){print t}{t=$0}END{ORS=s;print t}' <file>

So what is going on with the OP's attempt?

Writing down the OP's awk line, we have

/START/{if (x)print x;x="";next}
{x=(!x)?$0:x","$0;}
END{print x;}

What does this do? Let us analyze step by step:

  • /START/{if (x)print x;x="";next}: This reads: if the current record/line contains the string START, then do

    • if (x) print x: if x is not an empty string, print the value of x
    • x="": set x to be an empty string
    • next: skip to the next record/line

    In this code block, the OP probably assumed that /START/ means "do this before anything else". In awk, however, that is written as BEGIN; /START/ is a regular expression that matches only records containing the text START. Moreover, at the beginning all variables are empty strings or zero, so the if statement would not print anything anyway. This block could be replaced by:

    BEGIN{x=""}
    

    But again, this is not needed, so the block can be removed entirely.

  • {x=(!x)?$0:x","$0;}: concatenate the string with the correct delimiter. This is good, especially due to the usage of the ternary operator. Unfortunately, the delimiter is set to , rather than ',', which in awk is best written as \047,\047 (octal 047 is the single quote). So the line could read:

    {x=(!x)?$0:x"\047,\047"$0;}
    

    This line can be shortened once you realize that x may be an empty string. For an empty string, x=$0 is equivalent to x=x $0, so all you need to do is insert a separator that may or may not be an empty string. So you can write this as

    {x= x ((!x)?"":"\047,\047") $0}
    

    or inverting the logic to get rid of some more characters:

    {x=x(x?"\047,\047":"")$0}
    

    one could even write

    {x=x(x?"\047,\047":x)$0}
    

    but this is not optimal, as it reads the value of x one more time. However, this form can be used to finally optimize it to (per @EdMorton's comment)

    {x=(x?x"\047,\047":"")$0}
    

    This is better as it removes an extra concatenation operator.

  • END{print x}: Here the OP prints the result. This, however, omits the single quotes at the very beginning and end of the string, so they could be added:

    END{print "\047" x "\047"}
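
Going back to the first bullet, the difference between a regular-expression pattern such as /START/ and the special BEGIN pattern can be checked with a small sketch. The sample data is generated inline, and the counter n and the shell variable names are only illustrative:

```shell
# /START/ is a regular-expression pattern: its action runs only for
# records that contain the text START. The sample data has none.
regex_hits=$(printf 'ABC\nBCD\n' | awk '/START/{n++} END{print n+0}')

# BEGIN runs exactly once, before any input is read.
begin_hits=$(printf 'ABC\nBCD\n' | awk 'BEGIN{n++} END{print n+0}')

printf 'regex: %s, BEGIN: %s\n' "$regex_hits" "$begin_hits"  # regex: 0, BEGIN: 1
```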
    

So the corrected version of the OP's code would read:

awk '{x=(x?x"\047,\047":"")$0}END{print "\047" x "\047"}'
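
As a quick check, here is a sketch that runs the OP's original program and the corrected one on the sample records from the question. The input is piped in rather than read from a file, and the variable names are only illustrative:

```shell
# OP's original program: records joined by plain commas, no quotes.
original=$(printf 'ABC\nBCD\nCDE\nEFG\n' |
  awk '/START/{if (x)print x;x="";next}{x=(!x)?$0:x","$0;}END{print x;}')

# Corrected program: quote-comma-quote between records, quotes at both ends.
fixed=$(printf 'ABC\nBCD\nCDE\nEFG\n' |
  awk '{x=(x?x"\047,\047":"")$0}END{print "\047" x "\047"}')

printf '%s\n%s\n' "$original" "$fixed"
# ABC,BCD,CDE,EFG
# 'ABC','BCD','CDE','EFG'
```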
kvantour
0

awk may be better

awk '{printf fmt,$1}' fmt="'%s'\n" file | paste -sd, -

'ABC','BCD','CDE','EFG'
justaguy