grep unicode 16 support

Question

I used TextEdit on Mac OS X to create two files with the same contents but different encodings, then ran:

grep xxx filename_UTF-16

nothing

grep xxx filename_UTF-8

xxxxxxx xxxxxxyyyyyy

Does grep not support UTF-16?

Possible duplicate of [grepping binary files and UTF16](https://stackoverflow.com/questions/3752913/grepping-binary-files-and-utf16) — kenorb, Jan 17 '19 at 13:00
I would like to also add that you could probably write a C program to search the files for strings in the time it took to post and look for answers. — Bayleef, Jan 17 '19 at 13:00

score 5 · Accepted Answer · answered Jul 30 '11 at 08:50

iconv -f UTF-16 -t UTF-8 yourfile | grep xxx
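To search several UTF-16 files this way and still see which file each match came from, the pipeline can be wrapped in a small function. This is only a sketch: `grep_utf16` is a hypothetical name, and it assumes each file starts with a byte order mark so that `iconv -f UTF-16` can detect the byte order.

```shell
# Hypothetical helper: grep_utf16 PATTERN FILE...
# Assumes every FILE is UTF-16 with a BOM.
grep_utf16() {
  pattern=$1
  shift
  for f in "$@"; do
    # Convert to UTF-8 on the fly, search, then prefix each match with the file name.
    iconv -f UTF-16 -t UTF-8 "$f" | grep -- "$pattern" |
      while IFS= read -r line; do
        printf '%s:%s\n' "$f" "$line"
      done
  done
}

# Usage: grep_utf16 xxx filename_UTF-16 another_UTF-16
```

The original files are left untouched; only the stream passed to grep is converted.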

hmontoliu

score 4 · Answer 2 · answered Jul 30 '11 at 08:49

You could always try converting first to utf-8:

iconv -f utf-16 -t utf-8 filename | grep xxxxx
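Converting helps because UTF-16 stores each ASCII character as two bytes, so the bytes of the pattern `xxx` never sit next to each other in the file. The sketch below illustrates this by dumping the UTF-16LE bytes of a short string; `utf16_bytes` is a hypothetical helper name, and the exact spacing of the `od` output varies between systems.

```shell
# Hypothetical helper: print the UTF-16LE bytes of a string in hex.
utf16_bytes() {
  printf '%s' "$1" | iconv -f UTF-8 -t UTF-16LE | od -An -tx1
}

# Each 78 ("x") is followed by a 00 byte, so a byte-wise search for "xxx" finds nothing.
utf16_bytes xxx
```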

ninjalj

score 1 · Answer 3 · answered Jan 17 '19 at 13:00

Use ripgrep instead of grep; it can search UTF-16 files. Install it with: brew install ripgrep

Then run:

rg xxx filename_UTF-16

ripgrep supports searching files in text encodings other than UTF-8, such as UTF-16, latin-1, GBK, EUC-JP, Shift_JIS and more. (Some support for automatically detecting UTF-16 is provided. Other text encodings must be specifically specified with the -E/--encoding flag.)

kenorb

This is the best solution if you need to recursively search a directory: `rg -E UTF-16 ` – jasxun Mar 10 '21 at 08:15

Dr. Alex RE · Answer 4 · 2020-05-29T17:29:58.590

You could also use ugrep which supports UTF-8, UTF-16, UTF-32 and other file formats according to its readme:

ugrep searches UTF-encoded input when a UTF BOM (byte order mark) is present. Option --encoding permits many other file formats to be searched, such as ISO-8859-1, EBCDIC, and code pages 437, 850, 858, 1250 to 1258.

ugrep matches Unicode patterns by default (disabled with option -U). The regular expression syntax is POSIX ERE compliant, extended with Unicode character classes, lazy quantifiers, and negative patterns to skip unwanted pattern matches to produce more precise results.

score -1 · Answer 5 · answered May 20 '19 at 23:25

Define the following shell function, which calls Ruby:

grep16() { ruby -e "puts File.open('$2', mode:'rb:BOM|UTF-16LE').readlines.grep(Regexp.new '$1'.encode(Encoding::UTF_16LE))"; }

Then use it as:

grep16 xxx filename_UTF-16

See: How to use Ruby's readlines.grep for UTF-16 files?

For more suggestions, check: grepping binary files and UTF16
