
I am looking for the output below, based on the sample provided.

Sample :

eno~ename~address~zip
123~abc~~560000~"a~b~c"
245~"abc ~ def"~hyd~560102
333~"ghi~jkl"~pub~560103

Expected output :

"eno"~"ename"~"address"~"zip"
"123"~"abc"~""~"560000"~"a~b~c"
"245"~"abc ~ def"~"hyd"~"560102"
"333"~"ghi~jkl"~"pub"~"560103"

The awk command I tried does not work when the delimiter also appears inside the data. If there are alternative approaches using perl/sed/awk, please suggest them.

Below is the command: awk '{for (i=1;i<=NF;i++) $i="\""$i"\""}1' FS="~" OFS="~" sample

Jotne
user1485267
  • Possible duplicate of [What's the most robust way to efficiently parse CSV using awk?](https://stackoverflow.com/questions/45420535/whats-the-most-robust-way-to-efficiently-parse-csv-using-awk). The only difference is the field separator. – oguz ismail Aug 26 '19 at 10:00
  • This is not really a duplicate, but it is closely related. – kvantour Aug 26 '19 at 10:14
  • If you have a different input, post a new question. Make sure you google before posting as well. – Jotne Aug 26 '19 at 11:54

2 Answers


Could you please try the following (tested with the provided samples only).

awk 'BEGIN{s1="\"";FS=OFS="~"} {for(i=1;i<=NF;i++){if($i!~/^\"|\"$/){$i=s1 $i s1}}} 1' Input_file

Output will be as follows.

"eno"~"ename"~"address"~"zip"
"123"~"abc"~""~"560000"
"245"~"abc ~ def"~"hyd"~"560102"
"333"~"ghi~jkl"~"pub"~"560103"

Explanation: a commented version of the code above.

awk '                       ## Start of the awk program.
BEGIN{                      ## BEGIN block: runs once, before any input is read.
  s1="\""                   ## s1 holds a double-quote character.
  FS=OFS="~"                ## Use ~ as both the input and output field separator.
}                           ## End of the BEGIN block.
{                           ## Main block: runs for every input line.
  for(i=1;i<=NF;i++){       ## Loop over the fields, from 1 to NF.
    if($i!~/^\"|\"$/){      ## True when the field neither starts nor ends with a double quote.
      $i=s1 $i s1           ## Wrap the field in double quotes.
    }                       ## End of the if block.
  }                         ## End of the for loop.
}                           ## End of the main block.
1                           ## An always-true pattern: prints the (possibly modified) line.
'  Input_file               ## Name of the input file.
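
As a quick check of the logic above, the sketch below wraps the same command in a shell function (the name `quote_naive` is made up for illustration) and feeds it two records. It suggests why a quoted field with one embedded `~` survives, while a field such as `"a~b~c"`, raised in the comments, does not: the middle piece touches neither quote and gets wrapped on its own.

```shell
# quote_naive: hypothetical wrapper around the command from this answer.
quote_naive() {
  awk 'BEGIN { q = "\""; FS = OFS = "~" }
       { for (i = 1; i <= NF; i++) if ($i !~ /^"|"$/) $i = q $i q } 1'
}

# One embedded delimiter: each half touches a quote, so neither half is wrapped.
printf '%s\n' '245~"abc ~ def"~hyd~560102' | quote_naive
# → "245"~"abc ~ def"~"hyd"~"560102"

# Two embedded delimiters: the middle piece (b) touches no quote and is wrapped.
printf '%s\n' '123~"a~b~c"' | quote_naive
# → "123"~"a~"b"~c"
```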


RavinderSingh13
  • What if there is a field like `"a~b~c"`? – oguz ismail Aug 26 '19 at 09:58
  • @oguzismail, OK, I didn't consider it since it was NOT in the OP's samples; let me fix it now. – RavinderSingh13 Aug 26 '19 at 10:00
  • I wouldn't fix it, this is clearly a duplicate. See my comment on the question. Upvoted since the logic is clever though. – oguz ismail Aug 26 '19 at 10:01
  • `123~abc~~560000"a~b~c"` should not be valid input: double quotes are only used around a field, not within one, and since `~` is used as FS, the `560000"a~b~c"` part will break everything. The correct form would be `560000~"a~b~c"`. – Jotne Aug 26 '19 at 10:17
  • I think it's better to test whether the field does not start with a double quote, `if($i!~/^\"/)`, rather than whether it does not contain one, `if($i !~ s1)`. – Jotne Aug 26 '19 at 10:27
  • @Jotne, sure, added it now, thanks for letting me know. – RavinderSingh13 Aug 26 '19 at 10:29
  • @RavinderSingh13: It didn't work correctly for the second and third lines. It gave an invalid result. – user1485267 Aug 26 '19 at 11:26
  • @user1485267, please check now, edited the solution. – RavinderSingh13 Aug 26 '19 at 11:32
  • Should I test the first command or the one in the explanation? – user1485267 Aug 26 '19 at 11:35
  • @Jotne, we should actually also add a condition for checking fields ending with `"`, which I have added now. – RavinderSingh13 Aug 26 '19 at 11:35
  • @user1485267, refresh my answer. Yes, you could try either of them; I have edited both. – RavinderSingh13 Aug 26 '19 at 11:35
  • Thanks Ravinder, it works as expected. I have two questions based on the above solution. 1) If my OFS value is different, it replaces all FS values with the OFS value, but that should not happen inside the data. 2) If the data contains Japanese/Chinese characters, that should not create a problem. – user1485267 Aug 26 '19 at 11:40
  • It doesn't work if there is a newline character inside quoted data. – user1485267 Aug 26 '19 at 11:45
  • @user1485267, you can set the output field separator `OFS` in the `BEGIN` section. I haven't tested it with characters from other languages; you could test that once. – RavinderSingh13 Aug 26 '19 at 11:47
  • @user1485267, you haven't mentioned newline samples, so this code was not written for them. – RavinderSingh13 Aug 26 '19 at 11:47
  • For OFS I modified your code as below, but it doesn't work. awk 'BEGIN{s1="\"";FS="~";OFS="~"} {for(i=1;i<=NF;i++){if($i!~/^\"|\"$/){$i=s1 $i s1}}} 1' Input_file – user1485267 Aug 26 '19 at 11:50
  • @user1485267, my code already has `FS=OFS="~"`; not sure which modification you made to it? – RavinderSingh13 Aug 26 '19 at 11:52
  • Sorry, copy-paste issue. I am trying to rewrite the sample output with a comma delimiter: awk 'BEGIN{s1="\"";FS="~";OFS=","} {for(i=1;i<=NF;i++){if($i!~/^\"|\"$/){$i=s1 $i s1}}} 1' Input_file – user1485267 Aug 26 '19 at 11:53
  • I have added my input for newline data. – user1485267 Aug 26 '19 at 11:54
  • @ravinder: Can I know what `if($i!~/^\"|\"$/)` does? Is the tilde in the if condition the delimiter value? If yes, it fails when my delimiter is a semicolon. – user1485267 Aug 29 '19 at 13:13
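
On the last question: in `$i !~ /regex/`, the `!~` is awk's "does not match" operator, so the tilde there is unrelated to the data delimiter. A minimal sketch, assuming a hypothetical wrapper `quote_with_sep` that takes the delimiter as its first argument, shows that switching to a semicolon only changes `FS`/`OFS` while the condition stays the same:

```shell
# quote_with_sep SEP: hypothetical helper; SEP is used as both FS and OFS.
quote_with_sep() {
  awk -v sep="$1" 'BEGIN { q = "\""; FS = OFS = sep }
       # !~ means "does not match"; the regex only looks for a quote at either end.
       { for (i = 1; i <= NF; i++) if ($i !~ /^"|"$/) $i = q $i q } 1'
}

printf '%s\n' '245;"abc ; def";hyd' | quote_with_sep ';'
# → "245";"abc ; def";"hyd"
```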

Here you can use FPAT with GNU awk:

awk -v FPAT='([^~]*)|("[^"]+")' -v OFS="~" '{for (i=1;i<=NF;i++) if ($i!~/^\"/) $i="\""$i"\""} 1' file
"eno"~"ename"~"address"~"zip"
"123"~"abc"~""~"560000"
"245"~"abc ~ def"~"hyd"~"560102"
"333"~"ghi~jkl"~"pub"~"560103"

Instead of describing what the field separator looks like, we describe what a field looks like. Then we test whether each field starts with a double quote and, if it does not, wrap it in quotes.
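
To see what that means in practice, the sketch below prints each field that the same `FPAT` pattern produces for one sample record. `FPAT` is a gawk extension, so the example assumes `gawk` is installed and checks for it first; the function name `show_fields` is made up for illustration.

```shell
# show_fields: hypothetical helper that lists the fields FPAT finds, one per line.
show_fields() {
  gawk -v FPAT='([^~]*)|("[^"]+")' \
    '{ for (i = 1; i <= NF; i++) printf "%d: %s\n", i, $i }'
}

# FPAT is gawk-only, so only run the demo when gawk is available.
if command -v gawk >/dev/null 2>&1; then
  printf '%s\n' '333~"ghi~jkl"~pub~560103' | show_fields
fi
```

The quoted value `"ghi~jkl"` should come out as a single field, because the second alternative of the pattern matches the whole quoted string.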

You can then easily change the output field separator if you like:

awk -v FPAT='([^~]*)|("[^"]+")' -v OFS="," '{for (i=1;i<=NF;i++) if ($i!~/^\"/) $i="\""$i"\""} 1' file
"eno","ename","address","zip"
"123","abc","","560000"
"245","abc ~ def","hyd","560102"
"333","ghi~jkl","pub","560103"
Jotne
  • It is not working as expected for the last two lines. – user1485267 Aug 26 '19 at 11:33
  • @user1485267 Your example did not show that. I rarely see that occur, but if you have such data, you need a new post with a new question for a new solution. – Jotne Aug 26 '19 at 11:48
  • @user1485267, I totally agree with Jotne here. Please always post correct samples that are close to your actual data; when you keep changing your samples, it is hard for all of us to help you. – RavinderSingh13 Aug 26 '19 at 11:57
  • Sorry for changing inputs. I agree with your point. I will add it in a new post. Thanks for your solutions, Jotne & Ravinder. – user1485267 Aug 26 '19 at 12:15
  • You don't need to create a new post, [the one your question was closed as a dup of](https://stackoverflow.com/q/45420535/1745001) shows you how to solve your problem whether your input contains newlines or not. Just do what it shows there to separate the input into fields and then `gsub(/^"|"$/,"",$i); $i="\""$i"\""` in a loop to wrap every field in double quotes before printing the record. – Ed Morton Aug 26 '19 at 12:20
  • I didn't understand where I should add that statement; I am new to awk and exploring it based on these scenarios. – user1485267 Aug 26 '19 at 17:21
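
One way to read the suggestion in the comment above: first split the record into fields, then strip any existing outer quotes from each field and re-wrap it. The sketch below is only an illustration under some assumptions. It does the splitting with a plain character loop rather than the method from the linked question, so it should run in any POSIX awk; the names `quote_all`, `sep` and `osep` are made up; and it treats each line as one record, so newlines inside quoted data are still not handled.

```shell
# quote_all SEP [OSEP]: hypothetical helper; reads records on stdin.
quote_all() {
  awk -v sep="$1" -v osep="${2:-$1}" '
  {
    n = 0; field = ""; inq = 0
    len = length($0)
    for (p = 1; p <= len; p++) {
      c = substr($0, p, 1)
      if (c == "\"") inq = !inq            # entering or leaving a quoted field
      if (c == sep && !inq) {              # delimiter outside quotes ends the field
        f[++n] = field; field = ""
      } else {
        field = field c
      }
    }
    f[++n] = field                         # last field on the line
    line = ""
    for (i = 1; i <= n; i++) {
      gsub(/^"|"$/, "", f[i])              # strip existing outer quotes, if any
      line = line (i > 1 ? osep : "") "\"" f[i] "\""
    }
    print line
  }'
}

printf '%s\n' '123~abc~~560000~"a~b~c"' | quote_all '~'
# → "123"~"abc"~""~"560000"~"a~b~c"
printf '%s\n' '245~"abc ~ def"~hyd~560102' | quote_all '~' ','
# → "245","abc ~ def","hyd","560102"
```

Because the output line is built by hand instead of by reassigning `$i`, a different output separator does not touch a `~` that sits inside quoted data.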