1

I want to add quotes for each field if the field values is not enclosed with quotes.

Scenario 1 : Below is my input

eno;ename;address;gender
1;AaA;"mnop qrstzyx";M
2;bB;"abc";F
3;cC;"adef;mnp";ars
4;mnp;ade;M

Expected output

"eno";"ename";"address";"gender"
"1";"AaA";"mnop qrstzyx";"M"
"2";"bB";"abc";"F"
"3";"cC";"adef;mnp";"ars"
"4";"mnp";"ade";"M"

Options which i tried

sed 's/[^;]*/"&"/g'

Scenario 2 : Below is my input

eno;ename;address;gender
1;AaA;"mnop 
qrst
zyx";M
2;bB;"abc";F
3;cC;"adef;
mnp";ars
4;mnp;ade;M

Expected output

"eno";"ename";"address";"gender"
"1";"AaA";"mnop qrstzyx";"M"
"2";"bB";"abc";"F"
"3";"cC";"adef;mnp";"ars"
"4";"mnp";"ade";"M"
user1485267
  • 1,295
  • 2
  • 10
  • 19

4 Answers4

3

Use FPAT with gnu awk to handle CSV files

awk -v FPAT='([^;]*)|("[^"]+")' -v OFS=";" '{for (i=1;i<=NF;i++) {if ($i!~/^\"/) $i="\""$i"\""}} 1' file
"eno";"ename";"address";"gender"
"1";"AaA";"mnop qrstzyx";"M"
"2";"bB";"abc";"F"
"3";"cC";"adef;mnp";"ars"
"4";"mnp";"ade";"M"
Jotne
  • 40,548
  • 12
  • 51
  • 55
2

Perl one liner using the useful Text::AutoCSV module:

perl -MText::AutoCSV -e 'Text::AutoCSV->new(in_file=>"input.txt",
    out_file=>"output.txt",
    sep_char=>";",
    out_always_quote=>1)->write()'

And to squash newlines in fields:

perl -MText::AutoCSV -e 'Text::AutoCSV->new(in_file=>"input.txt",
    out_file=>"output.txt",
    sep_char=>";",
    out_always_quote=>1,
    read_post_update_hr=>sub{
        while(my ($k,$v) = each %{$_[0]}){
            ${$_[0]}{$k}=$v=~s/\n//gr
        } })->write()'

will output

"eno";"ename";"address";"gender"
"1";"AaA";"mnop qrstzyx";"M"
"2";"bB";"abc";"F"
"3";"cC";"adef;mnp";"ars"
"4";"mnp";"ade";"M"

If you have non-trivial CSV-like data like yours, it's always better to use an actual CSV parser instead of trying to hack around with regular expressions trying to get something to work.

Shawn
  • 47,241
  • 3
  • 26
  • 60
  • Is there any way outfile can be placed directly to hadoop file system instead of local file system without landing to intermediate area by using your command – user1485267 Sep 12 '19 at 08:21
  • @user1485267 Not a clue; I am completely ignorant of hadoop. – Shawn Sep 12 '19 at 08:36
  • Ok, No issues. Can we keep the output of data in memory and process output with redirection. for example perl 'command' > output.txt – user1485267 Sep 12 '19 at 09:03
2

This might work for you (GNU sed):

sed -E ':a;s/^([^"]*("[^";]*"[^"]*)*"[^";]*);/\1\n/;ta;s/"//g;s/[^;]*/"&"/g;y/\n/;/' file

Change the ;'s within double quoted strings to something else (\n). Remove double quotes from all fields. Surround all characters which are not delimiters (;) by double quotes. Finally replace newlines by ;'s.

The second example needs newlines removed from within double quoted strings. A separate solution for this is:

sed -Ez ':a;s/^([^"]*("[^"\n]*"[^"]*)*"[^"\n]*)\n/\1/;ta' file

Or, for older seds:

sed -r '1h;1!H;$!d;x;:a;s/^([^"]*("[^"\n]*"[^"]*)*"[^"\n]*)\n/\1/;ta' file

The overall solution is a combination of the above, where the file is first treated by the second solution which is piped into the first solution e.g.

sed -Ez ':a;s/^([^"]*("[^"\n]*"[^"]*)*"[^"\n]*)\n/\1/;ta' file |
sed -E ':a;s/^([^"]*("[^";]*"[^"]*)*"[^";]*);/\1\n/;ta;s/"//g;s/[^;]*/"&"/g;y/\n/;/'
potong
  • 55,640
  • 6
  • 51
  • 83
  • it didnt work with the final solution.. it is adding quotes without appending to previous line.. if my filename is input.csv – user1485267 Sep 10 '19 at 05:58
  • @user1485267 I do not understand your comment. For both scenario 1 and 2 I get the same result. – potong Sep 10 '19 at 11:46
  • when i run sed -Ez ':a;s/^([^"]*("[^"\n]*"[^"]*)*"[^"\n]*)\n/\1/;ta' input.csv | sed -E ':a;s/^([^"]*("[^";]*"[^"]*)*"[^";]*);/\1\n/;ta;s/"//g;s/[^;]*/"&"/g;y/\n/;/' like this i am getting issues syntax wrong – user1485267 Sep 10 '19 at 12:07
  • Thank u potong.. is there any way which i can run first command and save the result in one file and from that file i want to do second sed after pipe – user1485267 Sep 10 '19 at 16:35
  • 1
    @user1485267 you can use a tee command e.g. `command1 inputFile | tee saveFile1 | command2 >saveFile2` – potong Sep 10 '19 at 21:48
  • It is only because of -Ez script causing me the issue when i ran your command with -Ez it gives me invalid option --z. Can you suggest me what will be the other way to achive it without 'z' parameter second command it doesn't work – user1485267 Sep 11 '19 at 14:00
  • I tried like below way by removing z in your command it didn't worked.sed -E ':a;s/^([^"]*("[^"\n]*"[^"]*)*"[^"\n]*)\n/\1/;ta' file | sed -E ':a;s/^([^"]*("[^";]*"[^"]*)*"[^";]*);/\1\n/;ta;s/"//g;s/[^;]*/"&"/g;y/\n/;/' – user1485267 Sep 11 '19 at 14:01
  • My sed version is GNU Sed 4.2.1 – user1485267 Sep 11 '19 at 14:03
  • It throws an issue like below for the older sed's sed: -e expression #1, char 58: invalid reference \1 on `s' command's RHS. Command : sed '1h;1!H;$!d;x;:a;s/^([^"]*("[^"\n]*"[^"]*)*"[^"\n]*)\n/\1/;ta' file – user1485267 Sep 12 '19 at 05:25
  • Thank you it worked fine with adding extra -r in your last modified command i.e sed -r '1h;1!H;$!d;x;:a;s/^([^"]*("[^"\n]*"[^"]*)*"[^"\n]*)\n/\1/;ta' file – user1485267 Sep 12 '19 at 05:38
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/199348/discussion-between-user1485267-and-potong). – user1485267 Sep 12 '19 at 05:52
  • when the filesize is big the command gets hanged around 150MB file I am trying to replace it. – user1485267 Sep 12 '19 at 14:25
  • Are there any suggestions which can be provided – user1485267 Sep 12 '19 at 14:25
0

I suggest a simple awk (standard Linux gawk) script that update each line and print it

awk '{
     gsub(/;/, "\";\""); # input line substitute all ; with ";"
     print "\"" $0 "\""; # print input line wrapped with "
}' input.txt

or one liner

awk '{gsub(/;/, "\";\"");print "\"" $0 "\"";}' input.txt
Dudi Boy
  • 4,551
  • 1
  • 15
  • 30