"Illegal byte sequence" error while using shell commands in the Mac bash terminal

Question

I'm getting an "illegal byte sequence" error while trying to extract non-English characters from a large file in the bash shell on macOS. This is the script I am trying to use:

sed 's/[][a-z,0-9,A-Z,!@#\$%^&*(){}":/_-|. -][\;''=?]*//g' < "$1" > Abhineet_extract1.txt
sed 's/\(.\)/\1\
/g' < Abhineet_extract1.txt | sort | uniq | tr -d '\n'
rm Abhineet_extract1.txt

And here is the error I am getting:

uniq: stdin: Illegal byte sequence

'+?

score 14 · Accepted Answer · answered Sep 23 '13 at 07:29


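It seems that a UTF-8 locale is causing the "Illegal byte sequence" error. In a UTF-8 locale, tools such as uniq report an error when their input contains bytes that do not form valid UTF-8 characters. In the C locale every byte counts as one character, so no byte sequence is illegal.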
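A quick way to confirm this diagnosis is to check whether the input really is valid UTF-8. The sketch below assumes the `iconv` command is available (it is on macOS and on most Linux systems); converting UTF-8 to UTF-8 leaves valid input unchanged and fails on an invalid byte sequence. The function name `is_valid_utf8` is hypothetical.

```shell
# Hypothetical helper: succeeds (exit status 0) if the named files, or
# stdin when no file is given, contain only valid UTF-8.
# Converting UTF-8 to UTF-8 leaves valid input unchanged, but iconv exits
# with a non-zero status when it meets an invalid byte sequence.
is_valid_utf8() {
  iconv -f UTF-8 -t UTF-8 "$@" > /dev/null 2>&1
}
```

For example, `is_valid_utf8 input.txt || echo "input.txt is not valid UTF-8"`. If the check fails, the data is in some other encoding or contains raw binary bytes.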

To avoid the error, run the command with the C locale for character handling:

LC_CTYPE=C your_command
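Applied to the second half of the script in the question (listing each distinct character once), a minimal sketch could look like this. The function name `unique_chars` is hypothetical. It sets `LC_ALL=C` rather than `LC_CTYPE=C` because `LC_ALL`, when set, overrides every `LC_*` variable and `LANG`, so `LC_CTYPE=C` alone has no effect in an environment that exports `LC_ALL`. The body runs in a subshell so the setting does not leak into the calling shell, and it also covers `tr`, which is locale-aware too.

```shell
# Hypothetical helper: print each distinct character of a file once.
# The ( ... ) body runs in a subshell, so exporting LC_ALL=C here affects
# only sed, sort and tr, not the caller's environment.
unique_chars() (
  export LC_ALL=C
  # Put every character on its own line (backslash-newline in the
  # replacement inserts a newline), keep one copy of each, then join them.
  sed 's/\(.\)/\1\
/g' < "$1" | sort -u | tr -d '\n'
)
```

One caveat: in the C locale each byte is a separate character, so the bytes of a multibyte UTF-8 character are split onto separate lines and sorted independently. The output of such input is therefore not guaranteed to be valid UTF-8.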

man locale says:

   These environment variables affect each locale categories for all
   locale-aware programs:

   LC_CTYPE

           Character classification and case conversion.
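To see what "character classification" means in practice, compare how many characters `wc -m` counts in the two-byte UTF-8 encoding of "é" (bytes 0xC3 0xA9). This is a sketch; the function name `count_chars` is hypothetical, and the second call assumes a locale named `en_US.UTF-8` is installed (locale names vary between systems).

```shell
# Hypothetical helper: count the characters in the UTF-8 encoding of "é"
# under the locale given as $1. tr strips the padding some wc versions add.
count_chars() {
  printf '\303\251' | LC_ALL=$1 wc -m | tr -d ' '
}

count_chars C            # each byte is one character: prints 2
count_chars en_US.UTF-8  # prints 1 when that UTF-8 locale is installed
```

Because the C locale treats every byte as a complete character, there is no sequence of bytes it can reject, which is why `LC_CTYPE=C` makes the error go away.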


devnull



Thanks for your help. The error is gone, but the output now contains only '+? characters. I was feeding the output of an SQLite query to the script. I formatted the output as CSV, and then my script started working. – Abhineet Prasad Sep 23 '13 at 07:54
It's not very clear what you're saying. Please update your question instead. – devnull Sep 23 '13 at 07:56
I ran into this issue while deleting some sensitive data from my git history using `git filter-branch --tree-filter "find . -type f -exec sed -i -e 's/originalpassword/newpassword/g' {} \;"` and it worked like a charm – TMin Jun 11 '18 at 22:20

So, is this an issue with sort? Does FreeBSD still suffer from this problem? Or should I just file a bug with Apple and tell them to update their ancient ass bin/utils? – MarcusJ Nov 23 '18 at 18:28
