Remove a non-ASCII special character from a CSV file

Question

While I am editing a CSV file on Linux, special characters appear like Â£stackoverflow, Â£unixbox,Â£query. My question is how to remove the Â from the CSV file.

Input: Â£stackoverflow, Â£unixbox,Â£query

Output: £stackoverflow, £unixbox,£query

Observations on the Linux box: the terminal window's translation setting is currently ISO-8859-1. When I change the window setting (Translation → UTF-8) and open the same file in the vi editor, the Â character disappears. I have also tried the iconv command, but it didn't work. The reason may be that I am converting the file from ISO-8859-1 to UTF-8 while the default setting of the Linux box is ISO-8859-1, so the Â is still displayed instead of being removed. How can I remove it?

Possible duplicate of [Using grep and sed to find and replace a string](https://stackoverflow.com/q/6178498/608639) — jww, Jan 18 '19 at 18:58
I think you should circle back to fix what's actually wrong. What are the original bytes and which character encoding was used to write them? — Tom Blodget, Jan 18 '19 at 23:27
Thank you, but I am not able to copy the Â character on the Linux box: when I copy it, it pastes as a space or reverses the other characters, so this is not helpful as of now — boby1234, Jan 19 '19 at 05:00

stack0114106 · Answer 1 · 2019-01-19T10:25:14.107

0

You can try the Perl solution below. It removes every byte whose ordinal value is outside the range 32 to 127 (the ASCII text range), while keeping newlines so that the lines of the CSV file are not joined together. Note that this strips the £ as well as the Â.

$ echo "Â£stackoverflow, Â£unixbox,Â£query Output: £stackoverflow, £unixbox,£query" | perl -pe ' s/[^\n\x20-\x7f]//g '
stackoverflow, unixbox,query Output: stackoverflow, unixbox,query
$

EDIT:

To remove just Â, use

$ echo "Â" | perl -pe ' s/./sprintf("%x |",ord($&))/eg '  # Find the underlying ordinal values for Â 
c3 |82 |

$ echo "Â£stackoverflow, Â£unixbox,Â£query" | perl -pe ' s/\xc3\x82//g ' #removing it using s///
£stackoverflow, £unixbox,£query

$
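If the file itself stores each £ as the UTF-8 byte pair C2 A3 (octal 302 243, as reported in the comments) and only the ISO-8859-1 terminal renders it as Â£, then the `\xc3\x82` pattern above will not match the file's bytes. A minimal sketch under that assumption, using a temporary sample file as a stand-in for the real CSV, is to check the bytes with `od` and then re-encode with `iconv` from UTF-8 to ISO-8859-1:

```shell
# Assumption: the CSV stores the pound sign as UTF-8 (octal 302 243).
# The temporary files below are hypothetical stand-ins for the real CSV.
sample=$(mktemp)
printf '\302\243stackoverflow, \302\243unixbox,\302\243query\n' > "$sample"

# Show the raw bytes: each pound sign appears as the pair c2 a3.
od -An -tx1 "$sample" | head -n 2

# Re-encode for an ISO-8859-1 terminal: c2 a3 becomes the single byte a3.
latin1=$(mktemp)
iconv -f UTF-8 -t ISO-8859-1 "$sample" > "$latin1"

# Confirm that the converted file now starts with the single byte a3.
first_byte=$(head -c 1 "$latin1" | od -An -tx1 | tr -d ' \n')
echo "first byte after conversion: $first_byte"
```

On the real file the equivalent would be `iconv -f UTF-8 -t ISO-8859-1 input.csv > output.csv`, where both file names are placeholders. Alternatively, leaving the file untouched and switching the terminal translation to UTF-8 displays the same bytes correctly.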

edited Jan 19 '19 at 10:25

answered Jan 19 '19 at 02:29


Thanks for the help, but what I asked for was: input Â£stackoverflow, Â£unixbox,Â£query, and the result should be £stackoverflow, £unixbox,£query – boby1234 Jan 19 '19 at 04:56
It didn't work in my case, because I can only see Â£ in the vi editor; everywhere else on Linux it looks like £ only. What I am asking is that it should show the actual value £ instead of Â£ in the vi editor as well. Also, could you share the syntax to convert $ to £ inside a file on Linux? Currently Linux is not able to read the £ character – boby1234 Jan 29 '19 at 17:39
It might be some other character. Can you collect a hexdump of the file? – stack0114106 Jan 29 '19 at 17:56
````hexdump ```` – stack0114106 Jan 29 '19 at 18:01
````echo "$" | sed 's/\$/£/g'```` – stack0114106 Jan 29 '19 at 18:02
Thanks mate, but I am not able to copy the pound sign on the Linux box, so I could not use the sed command. – boby1234 Jan 30 '19 at 17:02
The octal code of the £ sign is 243, and the special character looks like Â£: the octal codes are 302 for Â and 243 for £. The issue is that I am not able to copy the Â character on the Linux box; it pastes as a space and reverses the words. E.g., if it looks like "stack Â£overflow" in the vi editor on Linux, then after I copy and paste it into the Linux terminal it looks like "overflow stack ". How can I handle this? – boby1234 Feb 06 '19 at 03:29
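Since the pound sign cannot be typed or pasted on that box, one possible workaround for the `$` to `£` replacement is to build the character from its octal code with `printf` and pass it to `sed` through a variable. This is only a sketch: the sample line `price,$10` is made up, and `\243` assumes an ISO-8859-1 file (a UTF-8 file would need `\302\243` instead).

```shell
# Build the pound sign from its octal code, so it never has to be typed or pasted.
# \243 is the single ISO-8859-1 byte; for a UTF-8 file use '\302\243' instead.
pound=$(printf '\243')

# Replace every literal dollar sign; LC_ALL=C makes sed treat the data as raw bytes.
converted=$(printf 'price,$10\n' | LC_ALL=C sed "s/\\\$/$pound/g")

# Dump the result as hex to verify the byte a3 now sits where the $ was.
printf '%s\n' "$converted" | od -An -tx1
```

Applied to a real file, the same `sed` expression would take the file name as its argument instead of reading from the pipe.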
