I want to grep for a whole word only. The problem is that the file contains non-English characters, so grep -w
doesn't work (for example, it matches "aąbcć" when searching for "bc"). I can't write a working regex with lookaround either. Can anybody help me?

kszl
-
FYI: http://stackoverflow.com/questions/9618647/allowing-non-latin-characters-with-regex – mcsilvio Jan 27 '14 at 19:43
-
What is your locale? `env | grep 'LC\|LANG'` – glenn jackman Jan 27 '14 at 21:33
-
`LC_MESSAGES=pl_PL.UTF-8 LC_COLLATE=pl_PL.UTF-8 LANG=pl_PL.UTF-8 LANGUAGE=pl_PL:en LC_CTYPE=pl_PL.UTF-8 ` – kszl Jan 27 '14 at 22:07
2 Answers
Try using word boundaries in grep:
grep "\<bc\>" file

anubhava
-
`echo -e "aąbcć\nbc" | grep "\<bc\>"` gives me two matches. One thing changed: the first line isn't colored anymore. – kszl Jan 27 '14 at 20:04
-
See this bug report on this matter: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=720482 – anubhava Jan 27 '14 at 20:32
Requiring GNU grep: grep -P '(^|\s)\Kbc(?=$|\s)' file
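If `-P` isn't available, a rough portable equivalent can be written with POSIX ERE, assuming words are delimited only by whitespace or line ends. This is just a sketch; the sample input is made up for illustration:

```shell
# Hypothetical sample input: only the 2nd and 3rd lines contain "bc" as a standalone word.
matches=$(printf '%s\n' 'aąbcć' 'bc' 'foo bc bar' 'abc' |
  grep -E '(^|[[:space:]])bc($|[[:space:]])')

# Show the lines that matched.
printf '%s\n' "$matches"
```

Unlike the `\K` version, the match here includes the surrounding whitespace, so it is fine for selecting lines but less useful with `-o` or `--color`.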
Using awk, I wonder if this would work:
awk -v word="bc" '{for (i=1; i<=NF; i++) if ($i == word) {print; break}}' file
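
A quick way to check the awk one-liner, as a sketch with made-up sample input. It compares whole whitespace-separated fields, so it should not depend on how the locale classifies "ą" or "ć", but a field like `bc,` with trailing punctuation would not match:

```shell
# Hypothetical sample input piped in instead of reading from "file".
awk_matches=$(printf '%s\n' 'aąbcć' 'bc' 'foo bc bar' 'abc' |
  awk -v word="bc" '{for (i=1; i<=NF; i++) if ($i == word) {print; break}}')

# Show the lines where some field is exactly "bc".
printf '%s\n' "$awk_matches"
```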

glenn jackman