12

I am having trouble writing a grep command that prints only those lines containing a word that consists solely of capital letters.

For example, I have this file, file1.txt:

Abc AAA
ADFSD
F
AAAAx

And the output should be:

Abc AAA
ADFSD
F

Thanks for any advice.

Ulrich Eckhardt
Tempus

7 Answers

15

You can just use:

grep -E '\b[[:upper:]]+\b' file1.txt

That is, look for whole words composed of only uppercase letters.
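To make the behaviour concrete, here is a minimal sketch that recreates the question's sample file in a temporary directory and runs the command on it. It assumes GNU grep (`\b` is an extension, not POSIX), and `LC_ALL=C` is only there to pin `[[:upper:]]` to ASCII so the result does not depend on the locale.

```shell
# Recreate the sample file from the question in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'Abc AAA' 'ADFSD' 'F' 'AAAAx' > "$tmpdir/file1.txt"

# \b marks a word boundary, so the run of uppercase letters
# must make up an entire word for the line to be selected.
result=$(LC_ALL=C grep -E '\b[[:upper:]]+\b' "$tmpdir/file1.txt")
printf '%s\n' "$result"

rm -rf "$tmpdir"
```

The line `AAAAx` is dropped because there is no word boundary between the last `A` and the `x`.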

Carl Norum
10

This egrep should work:

egrep '\b[A-Z]+\b' file
anubhava
  • This doesn't work when `file` contains a capitalized word with `_` (e.g. `HELLO_WORLD`) – alhelal Dec 30 '17 at 17:15
  • `_` is not considered a word boundary so `HELLO_WORLD` is not really a word that consists of only capital letters. – anubhava Dec 30 '17 at 17:24
  • I thought `_` was a word boundary, just not a meaningful one. If I am wrong, could you give me a reference so that I can learn something new? Thank you for the interesting information. – alhelal Dec 30 '17 at 17:28
  • See this Q&A: https://stackoverflow.com/questions/1324676/what-is-a-word-boundary-in-regexes – anubhava Dec 30 '17 at 17:31
  • 2
    There they say **...with a word character ([0-9A-Za-z_])**. Thank you very much for the link. This idea is new to me. – alhelal Dec 30 '17 at 17:40
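The point made in these comments can be checked with a short sketch. The sample words below are hypothetical, GNU grep is assumed for `\b`, and `LC_ALL=C` keeps `[A-Z]` limited to ASCII uppercase letters.

```shell
# Hypothetical sample input: one plain word and two containing an underscore.
sample=$(printf '%s\n' 'HELLO' 'HELLO_WORLD' 'hello_WORLD')

# '_' is a word character, so there is no \b on either side of it:
# neither HELLO nor WORLD inside HELLO_WORLD counts as a whole word.
strict=$(printf '%s\n' "$sample" | LC_ALL=C grep -E '\b[A-Z]+\b')

# Adding '_' to the bracket expression accepts HELLO_WORLD as one word.
relaxed=$(printf '%s\n' "$sample" | LC_ALL=C grep -E '\b[A-Z_]+\b')

printf 'strict:\n%s\nrelaxed:\n%s\n' "$strict" "$relaxed"
```

Here `strict` contains only `HELLO`, while `relaxed` contains `HELLO` and `HELLO_WORLD`; `hello_WORLD` is rejected by both patterns.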
3

This will produce the desired results:

egrep '\b[A-Z]+\b' file1.txt

The results are:

Abc AAA
ADFSD
F
CS Pei
1

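GNU grep supports POSIX character classes, so you can simply do:

grep -wE '[[:upper:]]+' file1.txt

The -w option restricts matches to whole words; a bare '[[:upper:]]' would select every line that contains at least one uppercase letter, including AAAAx.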
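A POSIX character class on its own matches any line that contains an uppercase letter anywhere, so it helps to compare it with a whole-word match. The sketch below assumes a grep that supports `-w` (GNU and BSD grep both do) and uses the question's sample file; `LC_ALL=C` only pins the class to ASCII.

```shell
# Recreate the sample file from the question in a temporary directory.
tmpdir=$(mktemp -d)
printf '%s\n' 'Abc AAA' 'ADFSD' 'F' 'AAAAx' > "$tmpdir/file1.txt"

# Unanchored: count the lines containing at least one uppercase letter.
any_upper=$(LC_ALL=C grep -c '[[:upper:]]' "$tmpdir/file1.txt")

# -w: the matched text must be a whole word, so AAAAx is rejected.
whole_word=$(LC_ALL=C grep -wE '[[:upper:]]+' "$tmpdir/file1.txt")

printf 'lines with any uppercase letter: %s\n' "$any_upper"
printf '%s\n' "$whole_word"

rm -rf "$tmpdir"
```

The unanchored pattern selects all four lines, while the `-w` version selects only the three lines the question asks for.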

Elias Probst
1

If your input contains non-ASCII characters, you may want to use \p{Lu} instead of [A-Z]:

grep -P '\b\p{Lu}+\b' file

For

LONDON 
Paris
MÜNCHEN Berlin

this will return

LONDON
MÜNCHEN Berlin

You could probably list most of these characters manually, and as @Skippy-le-grand-gourou says, egrep extends [A-Z] to accented letters, but with \p{Lu} you do not need to deal with cases like "Since June 2017, however, capital ẞ is accepted as an alternative in the all-caps style".

Jirka
1
grep -oP '\b[A-Z0-9_]+\b' file1.txt

This prints the words consisting only of uppercase letters, digits, and underscores (e.g. HELLO, NUMBER10, RLIMIT_DATA).

However, this also accepts eDw.
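The effect of `-o` is easiest to see next to the same pattern without it. This sketch uses hypothetical sample lines and swaps `-P` for `-E`, on the assumption that GNU grep is in use (it accepts `\b` in extended regular expressions too), so it also runs on builds without PCRE support.

```shell
# Hypothetical sample input mixing matching words with ordinary text.
sample3=$(printf '%s\n' 'set HELLO now' 'NUMBER10' 'RLIMIT_DATA limit' 'nothing here')

# Without -o, grep prints each selected line in full.
lines=$(printf '%s\n' "$sample3" | LC_ALL=C grep -E '\b[A-Z0-9_]+\b')

# With -o, grep prints only the matched words, one per output line.
words=$(printf '%s\n' "$sample3" | LC_ALL=C grep -oE '\b[A-Z0-9_]+\b')

printf 'lines:\n%s\nwords:\n%s\n' "$lines" "$words"
```

Because digits and `_` are in the bracket expression, a purely numeric word such as `42` would be printed as well.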

alhelal
0

grep '\<[A-Z][A-Z]*\>' file1.txt
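With both anchors escaped, `\<` and `\>` match the start and end of a word, and this works in basic regular expressions without `-E`. A minimal sketch on the question's sample data, assuming GNU grep (the anchors are an extension) and `LC_ALL=C` so that `[A-Z]` covers only ASCII uppercase letters:

```shell
# The four sample lines from the question.
sample4=$(printf '%s\n' 'Abc AAA' 'ADFSD' 'F' 'AAAAx')

# \< anchors at the start of a word and \> at its end;
# [A-Z][A-Z]* requires at least one uppercase letter in between.
anchored=$(printf '%s\n' "$sample4" | LC_ALL=C grep '\<[A-Z][A-Z]*\>')

printf '%s\n' "$anchored"
```

Without the backslash, `>` is an ordinary character, so the pattern would look for a literal `>` after the letters and match none of the sample lines.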

dono