Using TextEdit on Mac OS X, I created two files with the same contents but different encodings, then ran:
grep xxx filename_UTF-16
nothing
grep xxx filename_UTF-8
xxxxxxx xxxxxxyyyyyy
Does grep not support UTF-16?
You could always try converting first to utf-8:
iconv -f utf-16 -t utf-8 filename | grep xxxxx
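To see why the conversion helps, here is a minimal sketch. It assumes iconv is installed, and the file name sample_utf16.txt is made up for illustration. In UTF-16 every ASCII character is stored as two bytes, one of them NUL, so the byte sequence "xxx" never appears in the file and plain grep finds nothing.

```shell
# Build a small UTF-16 sample from UTF-8 text (hypothetical file name).
printf 'xxxxxxx xxxxxxyyyyyy\n' | iconv -f UTF-8 -t UTF-16 > sample_utf16.txt

# Plain grep compares bytes; the x's are separated by NUL bytes, so no match.
grep -q xxx sample_utf16.txt || echo "no match on the raw UTF-16 bytes"

# Converting on the fly hands grep UTF-8 text that it can match.
iconv -f UTF-16 -t UTF-8 sample_utf16.txt | grep xxx
```

Because the conversion happens in the pipe, the original file is left untouched.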
Use the ripgrep utility instead of grep; it supports searching UTF-16 files. Install it with: brew install ripgrep
Then run:
rg xxx filename_UTF-16
ripgrep supports searching files in text encodings other than UTF-8, such as UTF-16, latin-1, GBK, EUC-JP, Shift_JIS and more. (Some support for automatically detecting UTF-16 is provided. Other text encodings must be specified explicitly with the -E/--encoding flag.)
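ripgrep's automatic UTF-16 detection relies on a byte order mark, so a file saved without one may still come up empty. A hedged sketch of naming the encoding explicitly follows; it assumes rg and iconv are installed, and the file name sample_utf16le.txt is made up for illustration.

```shell
# Build a UTF-16LE sample without a BOM (hypothetical file name).
printf 'xxxxxxx xxxxxxyyyyyy\n' | iconv -f UTF-8 -t UTF-16LE > sample_utf16le.txt

# There is no BOM to detect, so tell ripgrep the encoding with -E.
rg -E utf-16le xxx sample_utf16le.txt
```

For a big-endian file, the corresponding label would be utf-16be.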
You could also use ugrep, which supports UTF-8, UTF-16, UTF-32 and other file formats, according to its readme:
ugrep searches UTF-encoded input when a UTF BOM (byte order mark) is present. Option --encoding permits many other file formats to be searched, such as ISO-8859-1, EBCDIC, and code pages 437, 850, 858, 1250 to 1258. ugrep matches Unicode patterns by default (disabled with option -U). The regular expression syntax is POSIX ERE compliant, extended with Unicode character classes, lazy quantifiers, and negative patterns to skip unwanted pattern matches to produce more precise results.
Define the following shell function, which calls Ruby:
grep16() { ruby -e "puts File.open('$2', mode:'rb:BOM|UTF-16LE').readlines.grep(Regexp.new '$1'.encode(Encoding::UTF_16LE))"; }
Then use it as:
grep16 xxx filename_UTF-16
See: How to use Ruby's readlines.grep for UTF-16 files?
For more suggestions, check: grepping binary files and UTF16