grep to only match whole words with non-english characters

Question

I want to grep for whole words only. The problem is that the file contains non-English characters, so grep -w doesn't work (e.g. it matches "aąbcć" when searching for "bc"). I can't write a working regex with lookarounds either. Can anybody help?

FYI: http://stackoverflow.com/questions/9618647/allowing-non-latin-characters-with-regex — mcsilvio, Jan 27 '14 at 19:43
`LC_MESSAGES=pl_PL.UTF-8 LC_COLLATE=pl_PL.UTF-8 LANG=pl_PL.UTF-8 LANGUAGE=pl_PL:en LC_CTYPE=pl_PL.UTF-8 ` — kszl, Jan 27 '14 at 22:07

score 0 · Answer 1 · answered Jan 27 '14 at 19:49

Try using word boundaries in grep:

grep "\<bc\>" file

anubhava

`echo -e "aąbcć\nbc" | grep "\<bc\>"` gives me two matches. One thing changed: the first line isn't colored anymore. – kszl Jan 27 '14 at 20:04
Very strange since I am getting only 1 match from this command. – anubhava Jan 27 '14 at 20:06
But I am testing it on `Mac OSX`. – anubhava Jan 27 '14 at 20:08
See this bug report on this matter: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=720482 – anubhava Jan 27 '14 at 20:32

glenn jackman · Answer 2 · 2014-01-27T22:56:45.120

0

With GNU grep (required for -P):

grep -P '(^|\s)\Kbc(?=$|\s)' file
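
The pattern above accepts only whitespace or a line edge around the word, so "bc," would not match. A possible variant is sketched below; it is not a tested fix for every grep build. Forcing the C locale makes grep -P compare byte by byte, and since every byte of a UTF-8 multibyte character lies in the range 0x80-0xff, those bytes can be listed as word characters in the lookarounds. The helper name grep_word is hypothetical. The sketch assumes UTF-8 input, a search word without regex metacharacters, and a GNU grep built with PCRE support.

```shell
# Hypothetical helper: print lines that contain "$1" as a whole word.
# Assumes UTF-8 input and a GNU grep built with -P (PCRE) support.
grep_word() {
  word=$1
  shift
  # LC_ALL=C forces byte-wise matching; any byte >= 0x80 is part of a
  # UTF-8 multibyte character and is treated here as a word character.
  LC_ALL=C grep -P '(?<![\w\x80-\xff])'"$word"'(?![\w\x80-\xff])' "$@"
}

# Reads standard input when no file name is given after the word.
printf 'aąbcć\nbc\nfoo bc, bar\n' | grep_word bc
```

This treats every non-ASCII character as part of a word, including non-ASCII punctuation such as typographic quotes, so it may be too strict for some texts.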

Using awk, I wonder if this would work:

awk -v word="bc" '{for (i=1; i<=NF; i++) if ($i == word) {print; break}}' file

edited Jan 27 '14 at 22:56

answered Jan 27 '14 at 21:36

I need whole lines where word exists. – kszl Jan 27 '14 at 22:07
Unfortunately, "bc," doesn't pass the test in awk solution. – kszl Jan 27 '14 at 23:16
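
One way to address the "bc," case in the awk approach is to trim punctuation from both ends of each field before comparing. This is a minimal sketch, not the answerer's code: the helper name match_word and the list of punctuation characters are assumptions chosen for illustration. The punctuation list is ASCII only, so the bytes of characters such as ą are never trimmed, whatever the locale.

```shell
# Hypothetical helper: print whole lines that contain "$1" as a
# whitespace-separated token, ignoring punctuation around the token.
match_word() {
  awk -v word="$1" '{
    for (i = 1; i <= NF; i++) {
      token = $i
      # Trim common ASCII punctuation from both ends of the field.
      gsub(/^[.,;:!?()]+|[.,;:!?()]+$/, "", token)
      if (token == word) { print; break }
    }
  }'
}

# Prints the lines "bc" and "foo bc, bar", but not "aąbcć".
printf 'aąbcć\nbc\nfoo bc, bar\n' | match_word bc
```

Words joined by characters outside the list (for example a hyphen, as in "bc-de") still count as a single token, so the list would need extending for such input.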