
I used TextEdit on macOS to create two files with the same contents but different encodings, then ran:

grep xxx filename_UTF-16

nothing

grep xxx filename_UTF-8

xxxxxxx xxxxxxyyyyyy

Does grep not support UTF-16?
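Why the UTF-16 file prints nothing: grep compares raw bytes, and UTF-16 stores each ASCII character in two bytes, one of which is NUL, so the byte sequence `xxx` never appears in the file. A minimal sketch that makes this visible, assuming `iconv` and `od` are installed; the file name `sample_utf16.txt` is hypothetical, and UTF-16LE without a BOM is used only for illustration:

```shell
#!/bin/sh
# Sketch: show why byte-oriented grep misses ASCII text stored as UTF-16.
# Assumes iconv and od are available (they are on macOS and most Linux systems).

# Encode "xxx" as UTF-16LE (chosen for illustration; no BOM is written).
printf 'xxx' | iconv -f UTF-8 -t UTF-16LE > sample_utf16.txt

# Dump the raw bytes: each "x" (0x78) is followed by a NUL byte (0x00).
od -An -tx1 sample_utf16.txt

# The three bytes "xxx" never occur next to each other, so grep finds nothing.
if grep -q xxx sample_utf16.txt; then
  echo "match"
else
  echo "no match"
fi
```

Run as is, it prints the bytes `78 00 78 00 78 00` followed by `no match`.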
kenorb
toughtalker
  • This should be moved to unix.stackexchange.com – Warren Young Aug 01 '11 at 13:44
  • Possible duplicate of [grepping binary files and UTF16](https://stackoverflow.com/questions/3752913/grepping-binary-files-and-utf16) – kenorb Jan 17 '19 at 13:00
  • I would like to also add that you could probably write a C program to search the files for strings in the time it took to post and look for answers. – Bayleef Jan 17 '19 at 13:00

5 Answers

iconv -f UTF-16 -t UTF-8 yourfile | grep xxx
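The pipeline above handles one file at a time. A hypothetical wrapper (the name `grep_utf16` is invented for this sketch) can loop over several UTF-16 files and prefix each match with its file name, the way `grep -H` does. It assumes your `grep` supports `--label` (GNU grep does; check `man grep` on macOS) and that each file starts with a BOM so `iconv -f UTF-16` can tell the byte order:

```shell
#!/bin/sh
# Hypothetical helper (not from the answer): convert each UTF-16 file to
# UTF-8 with iconv, then grep it, prefixing matches with the file name.
# Returns 0 if any file matched, 1 otherwise (like grep).
grep_utf16() {
  pattern=$1
  shift
  status=1
  for file in "$@"; do
    # iconv -f UTF-16 uses the byte order mark to pick LE or BE.
    # --label makes grep report the real file name instead of "(standard input)".
    if iconv -f UTF-16 -t UTF-8 "$file" | grep -H --label="$file" -e "$pattern"; then
      status=0
    fi
  done
  return "$status"
}
```

For the question's file, `grep_utf16 xxx filename_UTF-16` should print `filename_UTF-16:xxxxxxx xxxxxxyyyyyy`.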
hmontoliu

You could always try converting to UTF-8 first:

iconv -f utf-16 -t utf-8 filename | grep xxxxx
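Since the question has one UTF-8 file and one UTF-16 file, a hypothetical helper (`grep_either` is an invented name) can inspect the first two bytes for a UTF-16 byte order mark and convert only when one is present. This sketch assumes any file without a UTF-16 BOM is UTF-8 or plain ASCII; a BOM-less UTF-16 file would still be grepped as raw bytes and miss:

```shell
#!/bin/sh
# Hypothetical helper (not part of the answer): grep a file that may be
# UTF-8 or UTF-16, choosing by its byte order mark (BOM).
# Assumption: a file without a UTF-16 BOM is UTF-8/ASCII and is grepped as-is.
grep_either() {
  pattern=$1
  file=$2
  # First two bytes as hex: "fffe" = UTF-16LE BOM, "feff" = UTF-16BE BOM.
  bom=$(od -An -tx1 -N 2 "$file" | tr -d ' \n')
  case $bom in
    fffe|feff)
      # iconv's plain "UTF-16" reads the BOM to learn the byte order.
      iconv -f UTF-16 -t UTF-8 "$file" | grep -e "$pattern"
      ;;
    *)
      grep -e "$pattern" "$file"
      ;;
  esac
}
```

With that defined, `grep_either xxx filename_UTF-8` and `grep_either xxx filename_UTF-16` should both print the matching line, provided TextEdit wrote a BOM into the UTF-16 file.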
ninjalj

Use the ripgrep utility instead of grep; it can search UTF-16 files. Install it with: brew install ripgrep

Then run:

rg xxx filename_UTF-16

ripgrep supports searching files in text encodings other than UTF-8, such as UTF-16, latin-1, GBK, EUC-JP, Shift_JIS and more. (Some support for automatically detecting UTF-16 is provided. Other text encodings must be specified explicitly with the -E/--encoding flag.)
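As far as I understand ripgrep's UTF-16 auto-detection, it relies on a byte order mark, so a UTF-16 file without one needs its encoding named with `-E`/`--encoding`. A sketch assuming `rg` and `iconv` are installed; the file name `no_bom_utf16le.txt` is hypothetical:

```shell
#!/bin/sh
# Sketch, assuming ripgrep (rg) and iconv are installed.
# Create a UTF-16LE file *without* a BOM; the file name is illustrative.
printf 'xxxxxxx xxxxxxyyyyyy\n' | iconv -f UTF-8 -t UTF-16LE > no_bom_utf16le.txt

# Name the encoding explicitly instead of relying on BOM-based detection;
# rg decodes the file and prints the matching line as UTF-8.
rg --encoding utf-16le xxx no_bom_utf16le.txt
```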

kenorb

You could also use ugrep, which supports UTF-8, UTF-16, UTF-32 and other file formats. According to its README:

ugrep searches UTF-encoded input when a UTF BOM (byte order mark) is present. Option --encoding permits many other file formats to be searched, such as ISO-8859-1, EBCDIC, and code pages 437, 850, 858, 1250 to 1258.

ugrep matches Unicode patterns by default (disabled with option -U). The regular expression syntax is POSIX ERE compliant, extended with Unicode character classes, lazy quantifiers, and negative patterns to skip unwanted pattern matches to produce more precise results.
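A sketch of the two cases described above, assuming `ugrep` and `iconv` are installed (file names are hypothetical): a UTF-16 file with a BOM is searched without extra options, while a BOM-less UTF-16LE file is given `--encoding=UTF-16LE`, which, as far as I know, is one of the values ugrep's `--encoding` accepts:

```shell
#!/bin/sh
# Sketch, assuming ugrep and iconv are installed; file names are illustrative.

# UTF-16LE with a BOM (bytes FF FE): ugrep detects the encoding by itself.
{ printf '\377\376'; printf 'xxxxxxx xxxxxxyyyyyy\n' | iconv -f UTF-8 -t UTF-16LE; } > bom_utf16.txt
ugrep xxx bom_utf16.txt

# UTF-16LE without a BOM: the encoding has to be named explicitly.
printf 'xxxxxxx xxxxxxyyyyyy\n' | iconv -f UTF-8 -t UTF-16LE > no_bom_utf16le.txt
ugrep --encoding=UTF-16LE xxx no_bom_utf16le.txt
```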

Dr. Alex RE

Define the following shell function, which uses Ruby:

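# grep16 PATTERN FILE: pass arguments via ARGV (safe quoting) and print matching lines re-encoded as UTF-8
grep16() { ruby -e 'File.open(ARGV[1], mode: "rb:BOM|UTF-16LE") { |f| f.readlines.grep(Regexp.new(ARGV[0].encode(Encoding::UTF_16LE))).each { |l| puts l.encode(Encoding::UTF_8) } }' "$1" "$2"; }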

Then use it as:

grep16 xxx filename_UTF-16

See: How to use Ruby's readlines.grep for UTF-16 files?

For more suggestions, check: grepping binary files and UTF16

kenorb