I am looking for below input based on the sample provided below

Sample

eno~ename~address~zip
123~abc~~560000~"a~b~c"
245~"abc ~ def"~hyd~560102
333~"ghi~jkl"~pub~560103
444~ramdev "abc def"~ram~10000

Expected Output

"eno"~"ename"~"address"~"zip"
"123"~"abc"~""~"560000"~"a~b~c"
"245"~"abc ~ def"~"hyd"~"560102"
"333"~"ghi~jkl"~"pub"~"560103"
"444"~"ramdev ""abc def"""~"ram"~"10000"

Current Code :

awk 'BEGIN{s1="\"";FS=OFS="~"} {for(i=1;i<=NF;i++){if($i!~/^\"|\"$/){$i=s1 $i s1}}} 1' sample

The current code doesn't work for the last line. This is an enhancement of my earlier question, "insert quotes for each field using awk".

joanis
user1485267
  • Your question is not very clear, but if I understand correctly, what you want is quotes around each field, with existing quotes that are in the middle of the field doubled up, is that right? – joanis Aug 26 '19 at 13:44
  • Your last line makes a little less sense. On what basis is "ramdev" getting quoted? Even if I treat a single space as a separator like the tilde (~), why did `"abc def"` get an additional quote? – Mihir Luthra Aug 26 '19 at 13:58
  • See my previous comment - https://stackoverflow.com/questions/57655449/insert-quotes-for-each-field-using-awk#comment101765127_57655837. If you can't get the solution posted at [whats-the-most-robust-way-to-efficiently-parse-csv-using-awk](https://stackoverflow.com/questions/45420535/whats-the-most-robust-way-to-efficiently-parse-csv-using-awk) to work for you then ask a question with THAT script as a starting point, not some other script that can't work and can't be enhanced to work as in your question. – Ed Morton Aug 26 '19 at 14:05
  • What you are trying to do is a simple task with the Text::CSV Perl module. http://metacpan.org/pod/Text::CSV – lordadmira Aug 26 '19 at 14:09
  • In your previous question, after you got some answers then you modified your input to include a case where a field contained a newline and so invalidated the answers you had already received. If you do need to ask a new question, then make sure your sample input/output includes fields with newlines (and any other non-trivial cases) if they can occur in your real data. – Ed Morton Aug 26 '19 at 14:13
  • The fourth row contains data with quotes but no delimiter inside them; in that case I want the expected output shown above. If quotes are present in the data without a delimiter inside them, the value has to be enclosed in quotes. – user1485267 Aug 26 '19 at 17:19
  • If this can be done with a one-liner, that would reduce the complexity and make it easier for us to understand. – user1485267 Aug 26 '19 at 17:26

1 Answer

This might work for you (GNU sed):

cat <<\! | sed -Ef - file
:a;s/^([^"~][^~]*~+("[^~"]*"~+[^"~][^~]*~+)*[^"]*"[^"~]*)~/\1\n/;ta; #1
s/.*/~&/                                                             #2
s/~"([^"]*)"/~\1/g                                                   #3
s/"/""/g                                                             #4
s/.//                                                                #5
s/[^~]*/"&"/g                                                        #6
y/\n/~/;                                                             #7
!

This sed script works as follows:

  1. ~ within strings can be confused with field delimiters. They need to be replaced by a unique character which is not present in the current line. As sed uses newlines to delimit its input, a newline cannot be present in the pattern space and is therefore the perfect choice for such a character. Fields consist of three types of strings:

    a) Strings which do not start and end with double quotes and have no quoted strings within them.

    b) Double quoted strings

    c) Strings which do not start and end with double quotes and have quoted strings within them.

    The latter strings need any ~'s within them to be replaced by \n's. This can be achieved by looping through the current line, leaving fields of type a, b or c that do not contain ~'s and only replacing ~'s in the latter strings.

  2. To make it easier for the next step, we introduce a field delimiter for the first string.

  3. Remove all double quotes enclosing fields (see 1b).

  4. All remaining double quotes are within strings of type 1c and can be escaped by prefixing each with another ".

  5. Now remove the initial field delimiter introduced in step 2.

  6. Surround all fields by double quotes.

  7. Replace newlines introduced in step 1 by their original value i.e. ~.
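For comparison, the same field rules can be sketched in plain awk, which is what the question originally asked for. This is a minimal sketch, not the method used in this answer: instead of swapping ~ for newlines, it walks each line character by character and treats a ~ as a delimiter only when it falls outside double quotes. The function name `quote_fields` is hypothetical, and the sketch assumes that a field which already starts and ends with a double quote is correctly quoted and can be passed through unchanged.

```shell
# quote_fields: hypothetical helper; reads ~-delimited lines on stdin and
# prints them with every field enclosed in double quotes.
quote_fields() {
  awk '
  {
    n = 0; field = ""; inq = 0
    len = length($0)
    for (i = 1; i <= len; i++) {
      c = substr($0, i, 1)
      if (c == "\"") inq = !inq        # every " toggles the quoted state
      if (c == "~" && !inq) {          # a ~ outside quotes ends the field
        out[++n] = field; field = ""
      } else {
        field = field c
      }
    }
    out[++n] = field                   # last field on the line

    line = ""
    for (j = 1; j <= n; j++) {
      f = out[j]
      quoted = (length(f) >= 2 && substr(f, 1, 1) == "\"" && substr(f, length(f), 1) == "\"")
      if (!quoted) {                   # assumption: quoted fields are kept as is
        gsub(/"/, "\"\"", f)           # double any embedded quotes
        f = "\"" f "\""                # then wrap the whole field
      }
      line = line (j > 1 ? "~" : "") f
    }
    print line
  }'
}
```

Assuming the input is in a file called sample, as in the question, `quote_fields < sample` should reproduce the expected output shown there. Like the sed script, it works one line at a time, so fields containing newlines are not handled.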

N.B. It appears that GNU sed has a bug whereby if the translate command (y/../../) is the last command within a script or a one line command, it needs to be suffixed by a ;.

The above solution can be entered on one long line:

sed -E ':a;s/^([^"~][^~]*~+("[^~"]*"~+[^"~][^~]*~+)*[^"]*"[^"~]*)~/\1\n/;ta;s/.*/~&/;s/~"([^"]*)"/~\1/g;s/"/""/g;s/.//;s/[^~]*/"&"/g;y/\n/~/;' file
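To run the one-liner against an input file and keep the result in a separate file, the input file goes after the script and the output is redirected with `>`. The following is a sketch assuming GNU sed; the file names sample.txt and quoted.txt are placeholders, and the here-document only recreates the sample from the question so that the example is self-contained.

```shell
# Recreate the sample input from the question (placeholder file name).
cat > sample.txt <<'EOF'
eno~ename~address~zip
123~abc~~560000~"a~b~c"
245~"abc ~ def"~hyd~560102
333~"ghi~jkl"~pub~560103
444~ramdev "abc def"~ram~10000
EOF

# Same script as the one-liner above: input file last, output redirected.
sed -E ':a;s/^([^"~][^~]*~+("[^~"]*"~+[^"~][^~]*~+)*[^"]*"[^"~]*)~/\1\n/;ta;s/.*/~&/;s/~"([^"]*)"/~\1/g;s/"/""/g;s/.//;s/[^~]*/"&"/g;y/\n/~/;' sample.txt > quoted.txt

cat quoted.txt
```

The script is kept inside single quotes so that the shell passes the backslashes and double quotes to sed untouched.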
potong
  • I am not able to execute the above command; I get a syntax error. Is there an online sed editor where you have tried it? – user1485267 Aug 28 '19 at 11:25
  • It says cat --invalid option --f – user1485267 Aug 28 '19 at 11:34
  • @user1485267 the example above uses the bash shell and pipes the output of a here-document to the sed command, which accepts the sed commands as a file from stdin via the `-f -` option. Another way to do this is to put the sed commands in a file e.g. `sedFile` and then call them using the following command: `sed -Ef sedFile file`. To make the `sedFile`, copy the lines from the one beginning `:a` to the one beginning `y/\n/~/;`. HTH – potong Aug 28 '19 at 12:14
  • It works as expected if I save the commands in a file and run the command you suggested. Can you help me run it as a single command instead of saving the commands in a file? The command above fails. – user1485267 Aug 28 '19 at 12:43
  • In the above command, where do I pass the input file, and how do I store the result in a different output file? I tried putting sample.txt after cat and after the closing !, but it doesn't work. – user1485267 Aug 28 '19 at 13:02
  • I suspect this command won't work if there are newlines in the data, because you are substituting newlines into the data. – user1485267 Aug 29 '19 at 10:19
  • @user1485267 Your question states *"I am looking for below input based on the sample provided below"*. There are no newlines in the example other than at the end of a record/line. – potong Aug 29 '19 at 12:14
  • Yes, agreed. I am thinking about what can be done if that kind of newline or carriage-return pattern turns up in the future. – user1485267 Aug 29 '19 at 12:50