I am creating a search-and-replace function for my application and running a test scenario with 3 files: array, script, and test.

I am trying to escape double quotation marks, but it won't work.

script file contains

variableName=$1
sed "s#data\-field\=\"${variableName}\.name\"#data\-field\=${variableName}\.name data\-type\=dropdown data\-dropdown\-type\=${variableName}#g" test

test file contains

data-field=“fee_category.name”
data-field=“tax_type.name”

array file contains

fee_category
tax_type

There is no error; the output is identical to the input because the sed command could not find what it was looking for. If I don't use double quotes around ${variableName}.name in the pattern and remove them from the test file, the function works fine.

Zaki Ahmed
  • Your sample input contains [non-ASCII double quotes](http://www.fileformat.info/info/unicode/char/201c/index.htm), which the ASCII `"` in the `sed` command won't match. – mklement0 Apr 17 '17 at 20:09
  • Do you want to search for a literal string or a regexp? If it's a regexp do you want capture groups to be enabled or not? Do you want backreference metacharacters (e.g. `&` or `\1`) to be enabled in the replacement text or not? – Ed Morton Apr 17 '17 at 20:21
  • I am searching for the literal string and a replacement for it using regexp – Zaki Ahmed Apr 17 '17 at 20:25
  • Maybe I wasn't being clear. If `variableName` has the value `foo.bar`, do you want to search for the literal string `foo.bar` or for the regexp `foo.bar`? For example, the former would NOT match `foo8bar` while the latter WOULD match it. – Ed Morton Apr 17 '17 at 20:28
  • Double quotes in sed usually do not require escaping. If these characters are not ASCII double quotes (hex 22), we need to identify what they are. What is the output of `head -n2 testfile | od -t x1c`? – George Vasiliou Apr 17 '17 at 20:30
  • In reference to the non-ASCII double quotes, can you elaborate? I don't understand. – Zaki Ahmed Apr 17 '17 at 20:56
  • 0000000 64 61 74 61 2d 66 69 65 6c 64 3d e2 80 9c 66 65 d a t a - f i e l d = “ ** ** f e 0000020 65 5f 63 61 74 65 67 6f 72 79 2e 6e 61 6d 65 e2 e _ c a t e g o r y . n a m e ” 0000040 80 9d 0a 64 61 74 61 2d 66 69 65 6c 64 3d e2 80 ** ** \n d a t a - f i e l d = “ ** 0000060 9c 74 61 78 5f 74 79 70 65 2e 6e 61 6d 65 e2 80 ** t a x _ t y p e . n a m e ” ** – Zaki Ahmed Apr 17 '17 at 21:01
  • `“` is different from `"`, the first belongs to the group of non-ASCII double quotes – Pedro Lobito Apr 17 '17 at 21:05
  • YES^ THIS WAS INDEED THE ISSUE, WOW IM BAMBOOZLED BUT THANK YOU SO MUCH HAS BEEN A LONG 2 HOURS! – Zaki Ahmed Apr 17 '17 at 21:10
  • @PedroLobito Sorry, but how someone can type `“` ? – George Vasiliou Apr 17 '17 at 21:19
  • I copy+pasted ;) – Pedro Lobito Apr 17 '17 at 21:33

2 Answers

Following mklement0's comment, I am writing this answer only to share some of my findings in case you need a literal match of your special double quotes. It might be useful to other users.

Your quoted text fee_category.name has a Unicode Left Double Quotation Mark (U+201C) on the left side and a Unicode Right Double Quotation Mark (U+201D) on the right side.

These non-standard quotation marks have the following encodings in UTF-8 and UTF-16:

Unicode Left Double Quotation Mark U+201c
UTF-8 (hex) 0xE2 0x80 0x9C (e2809c)
UTF-16 (hex) 0x201C (201c)

Unicode Right Double Quotation Mark U+201d
UTF-8 (hex) 0xE2 0x80 0x9D (e2809d)
UTF-16 (hex) 0x201D (201d)
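
The byte values above can be checked from the shell. The following is only a minimal sketch: the variable names are made up for illustration, and octal `printf` escapes are used on the assumption that they are more portable than `\x` or `\u` escapes.

```shell
# Emit each quotation mark as raw UTF-8 bytes (octal escapes) and dump them as hex.
# hex_left, hex_right and hex_ascii are hypothetical names used only in this sketch.
hex_left=$(printf '\342\200\234' | od -An -t x1 | tr -d ' \n')    # U+201C
hex_right=$(printf '\342\200\235' | od -An -t x1 | tr -d ' \n')   # U+201D
hex_ascii=$(printf '"' | od -An -t x1 | tr -d ' \n')              # plain ASCII quote

echo "U+201C -> $hex_left"
echo "U+201D -> $hex_right"
echo "ASCII  -> $hex_ascii"
```

The ASCII quote is a single byte, while each curly quote is three bytes, which is why a pattern containing `"` cannot match them.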

Analyzing your data with the od utility confirms the presence of the above UTF-8 hex sequences:

$ echo data-field=“fee_category.name” |od -w40 -t x1c
0000000  64  61  74  61  2d  66  69  65  6c  64  3d  e2  80  9c  66  65  65  5f  63  61  74  65  67  6f  72  79  2e  6e  61  6d  65  e2  80  9d  0a
          d   a   t   a   -   f   i   e   l   d   = 342 200 234   f   e   e   _   c   a   t   e   g   o   r   y   .   n   a   m   e 342 200 235  \n
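
If reading an od dump by eye is inconvenient, a quick check for non-ASCII bytes could look like the sketch below. It assumes a sample line rebuilt from the bytes shown above; `sample` and `non_ascii` are names invented for this example.

```shell
# Rebuild one line of the test file, with the curly quotes given as UTF-8 bytes.
lq=$(printf '\342\200\234')   # U+201C
rq=$(printf '\342\200\235')   # U+201D
sample=$(printf 'data-field=%sfee_category.name%s' "$lq" "$rq")

# In the C locale, delete every ASCII byte (octal 000-177) and count what is left.
non_ascii=$(printf '%s\n' "$sample" | LC_ALL=C tr -d '\000-\177' | wc -c | tr -d ' ')

echo "non-ASCII bytes: $non_ascii"
```

Any count above zero means the input contains bytes that a plain ASCII pattern will not match.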

Interestingly, we can print these Unicode characters in bash either by their Unicode code point or by their UTF-8 hex sequence:

$ echo -e "\u201c test \u201d"
“ test ”
$ echo -e "\xe2\x80\x9c test \xe2\x80\x9d"
“ test ”

Accordingly, we can make sed match these special characters like this:

$ string=$(echo -e "\u201c test \u201d");echo "$string"
“ test ”
$ lq=$(echo -ne "\u201c");rq=$(echo -ne "\u201d")
$ sed -E "s/($lq)(.+)($rq)/**\2**/" <<<"$string"
** test **

This also seems to work fine, without the need for "helper" variables:

$ sed -E "s/(\xe2\x80\x9c)(.+)(\xe2\x80\x9d)/**\2**/" <<<"$string"
** test **

This means that the hex sequence \xe2\x80\x9c (or \xe2\x80\x9d for the right quote) can be used directly in sed to get a literal match on these special quotes.

You could also pre-process your files and convert all those non-standard quotes to standard quotes using something like the following. The alternation matches each complete three-byte sequence, so other characters such as commas are left untouched:

$ sed -E "s/(\xe2\x80\x9c|\xe2\x80\x9d)/\x22/g" <<<"$string"
" test "   #Special quotes replaced with classic ascii quotes.

The above tests were done on Debian Testing with Bash 4.4 and GNU sed 4.4; these techniques may not work with other sed flavors.
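
Putting the pieces together, the pre-processing idea could be combined with the substitution from the question roughly as follows. This is a sketch under assumptions, not a tested replacement for the original script: the input is rebuilt inline instead of being read from the test and array files, the quote characters are produced with octal `printf` escapes so that no sed-specific `\x` support is needed, and `text` and `result` are hypothetical names.

```shell
# Curly quotes as raw UTF-8 bytes (octal escapes keep this independent of sed's \x support).
lq=$(printf '\342\200\234')   # U+201C
rq=$(printf '\342\200\235')   # U+201D

# Stand-in for the contents of the "test" file from the question.
text=$(printf 'data-field=%sfee_category.name%s\ndata-field=%stax_type.name%s\n' \
  "$lq" "$rq" "$lq" "$rq")

# Step 1: normalize both curly quotes to the ASCII double quote.
text=$(printf '%s\n' "$text" | sed -e "s/$lq/\"/g" -e "s/$rq/\"/g")

# Step 2: apply the substitution once per name (stand-in for the "array" file).
for variableName in fee_category tax_type; do
  text=$(printf '%s\n' "$text" |
    sed "s#data-field=\"${variableName}\.name\"#data-field=${variableName}.name data-type=dropdown data-dropdown-type=${variableName}#g")
done

result=$text
printf '%s\n' "$result"
```

Normalizing first keeps the main pattern identical to the one that would be used for files that already contain ASCII quotes.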

George Vasiliou

When in doubt, you can match the quotes with a wildcard (`.` matches any single character):

variableName="fee_category"
sed "s#data-field=.${variableName}\.name.#& data-type=dropdown data-dropdown-type=${variableName}#g" test

# Or, when you do not want those quotes back in your output
sed "s#\(data-field=\).\(${variableName}\)\(\.name\).#\1\2\3 data-type=dropdown data-dropdown-type=\2#g" test
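
One caveat with the wildcard: `.` matches a single character, so it relies on sed decoding the three-byte curly quote as one character, which is the case in a UTF-8 locale but not in the C locale. A possible alternative, sketched here under the assumption that only ASCII and curly double quotes occur, is to list the accepted quotes explicitly. The inline sample input and the variable `out` are made up for this example.

```shell
# Curly quotes as raw UTF-8 bytes.
lq=$(printf '\342\200\234')   # U+201C
rq=$(printf '\342\200\235')   # U+201D
variableName="fee_category"

# One line with curly quotes and one with ASCII quotes; both should be rewritten.
out=$(printf 'data-field=%sfee_category.name%s\ndata-field="fee_category.name"\n' "$lq" "$rq" |
  sed -E "s#(data-field=)(\"|$lq)(${variableName}\.name)(\"|$rq)#\1\3 data-type=dropdown data-dropdown-type=${variableName}#g")

printf '%s\n' "$out"
```

Because the alternatives are spelled out as byte sequences, the match does not depend on how the locale groups bytes into characters.
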
Walter A