
I am dealing with some multilingual data (English and Arabic) in a JSON file that contains a weird character I am not able to parse. I am not sure what the character is. I tried getting its ASCII value via vim, and this is what I got:

"38 0x26"

This is the status line I used in vim to get the value (http://vim.wikia.com/wiki/Showing_the_ASCII_value_of_the_current_character):

:set statusline=%<%f%h%m%r%=%b\ 0x%B\ \ %l,%c%V\ %P

This is how the character looks in vim: [image: the character in question]

I tried to replace this character with 'sed' and '.gsub', without success.

Is there a way I can replace this character (preferably with Ruby's .gsub) with '&' or something else?

Thanks

Prashanth

2 Answers


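Try something like

sed 's/[^][:alnum:][:space:][{}().*\\\/_(AllAsciiVariationYouWant)]/?/g' YourFile

where (AllAsciiVariationYouWant) stands for every additional character that you want to keep as is (written without the surrounding "()"). Every character that is not listed inside the brackets is replaced with "?". What [:alnum:] matches depends on the current locale, so Arabic letters are typically kept only when sed runs under a UTF-8 locale.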
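Since the question prefers .gsub, the same keep-list idea can be sketched in Ruby. This is only a minimal sketch under assumptions: the names DISALLOWED and replace_disallowed are hypothetical, and the keep-list (printable ASCII, ASCII whitespace, and characters whose Unicode script is Arabic) is a guess at what the data should contain. Punctuation shared between scripts may not be covered by \p{Arabic} and may need to be added to the list.

```ruby
# Hypothetical keep-list: printable ASCII, ASCII whitespace, Arabic script.
# Anything outside this set is treated as unwanted. Adjust to the real data.
DISALLOWED = /[^\x20-\x7E\s\p{Arabic}]/

# Replace every character outside the keep-list with `replacement`.
def replace_disallowed(text, replacement = "?")
  text.gsub(DISALLOWED, replacement)
end
```

For example, replace_disallowed(line, "&") would substitute "&" instead of "?".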

NeronLeVelu

JSON is normally encoded in UTF-8 (a Unicode encoding). If you're seeing funky-looking characters in your file, it's probably because your editor is not treating Unicode characters properly. That could be caused by a terminal emulator that doesn't support Unicode, an incorrect $LANG setting, vim not being able to correctly determine the encoding of the file, and likely other reasons.

What terminal program are you using? What's your $LANG environment variable set to (echo $LANG)? If you're certain your terminal supports Unicode, try:

LANG=en_US.utf-8 vim your_file_here.json

(The above example assumes that U.S. English is appropriate for the file, which it may not be.)

As for replacing characters in the file, vim's substitution command can be used:

:%s/old text/new text/g

The above command will run the substitute command on all lines in the file (%), replacing every instance of "old text" with "new text". (The g at the end tells vim to replace every instance on a line, not just the first it finds.)
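If the terminal cannot display the character, it may be easier to identify it by its Unicode code point and substitute on that instead of typing it. The following Ruby sketch assumes the text is valid UTF-8 and that anything outside ASCII and the Arabic script is unexpected. The method names codepoint_labels and replace_codepoint are hypothetical, chosen only for illustration.

```ruby
# List the distinct characters that are neither ASCII nor Arabic script,
# formatted as "U+XXXX" labels so they can be identified without a
# terminal that renders them correctly.
def codepoint_labels(text)
  text.scan(/[^\x00-\x7F\p{Arabic}]/).uniq.map { |ch| format("U+%04X", ch.ord) }
end

# Replace every occurrence of the character with the given code point.
# The pattern is a plain String, so it is matched literally.
def replace_codepoint(text, codepoint, replacement)
  text.gsub([codepoint].pack("U"), replacement)
end
```

A possible workflow is to read the file with File.read(path, encoding: "UTF-8"), inspect the output of codepoint_labels, and then call replace_codepoint with the offending code point and "&". If the file contains invalid byte sequences, String#scrub can be applied first.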

Paul M.
  • echo $LANG -> en_IN. I have already tried substitution and it's of no use – Prashanth Dec 02 '13 at 09:49
  • Have you tried `LANG=en_IN.UTF-8`? And I'd be surprised if vim's substitution command wasn't up to the task. The problem, it seems, is that your terminal can't display the character correctly, making it difficult to know which character to include in the substitution regexp. – Paul M. Dec 03 '13 at 22:27
  • No, I have not tried LANG=en_IN.UTF-8. If I can somehow get a regexp for the character, it would solve my problem. Let me try UTF-8 – Prashanth Dec 04 '13 at 06:27