In grep on Ubuntu, how can I display only the string that matched the regular expression?

Question

I am grepping with a regular expression, and in the output I would like to see only the strings that match it.

In a bunch of XML files (mostly they are single-line files with huge amounts of data in a line), I would like to get all the words that start with MAIL_.

Also, I would like the grep command on the shell to give only the words that matched and not the entire line (which is the entire file in this case).

How do I do this?

I have tried

grep -Gril MAIL_* .
grep -Grio MAIL_* .
grep -Gro MAIL_* .

score 18 · Accepted Answer · answered Aug 06 '10 at 12:41

First of all, with the GNU grep that ships with Ubuntu, the -G flag (basic regular expressions) is the default, so you can omit it. Better still, use extended regular expressions with -E.

The -r flag searches recursively through the files of a directory, which is what you need.

You are also right to use the -o flag to print only the matching part of a line. To omit file names as well, add the -h flag.

The only mistake is the regular expression itself: there is no character specification before the *. In a regular expression, * repeats the preceding item, so MAIL_* means "MAIL followed by zero or more underscores". Quote the pattern as well, so that the shell does not try to expand it as a file glob. Your command should look like this:

grep -Ehro 'MAIL_[^[:space:]]*' .

Sample output (not recursive):

$ echo "Some garbage MAIL_OPTION comes MAIL_VALUE here" | grep -Eho 'MAIL_[^[:space:]]*'
MAIL_OPTION
MAIL_VALUE
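
To make the difference concrete, here is a small sketch that runs the original pattern and the corrected one on the same sample line. The variable names (line, broken, fixed) are only for illustration.

```shell
# Same sample line as in the answer above.
line='Some garbage MAIL_OPTION comes MAIL_VALUE here'

# 'MAIL_*' as a regex: "MAIL" followed by zero or more underscores.
broken=$(printf '%s\n' "$line" | grep -o 'MAIL_*')

# A character class before '*' lets the match extend over the whole word.
fixed=$(printf '%s\n' "$line" | grep -Eo 'MAIL_[^[:space:]]*')

printf 'broken:\n%s\n' "$broken"
printf 'fixed:\n%s\n' "$fixed"
```

The first pattern prints MAIL_ twice, because the match stops as soon as the underscores run out. The second prints MAIL_OPTION and MAIL_VALUE.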

thor

Great, that works, but one quick question: what do I do if I know the MAIL_* strings are present either as type="MAIL_*" or as >MAIL_*< in the files? Any help on that one? – AMM Aug 06 '10 at 12:48
I don't get it. Could you rephrase your question? You want to see surrounding characters around your MAIL_XXX stuff? Like, you want to see " and <> in output of grep command? – thor Aug 06 '10 at 12:51
If your MAIL_* strings can contain only alphabetic characters (a-z), you can change the regexp to 'MAIL_[[:alpha:]]*' – thor Aug 06 '10 at 13:02

score 6 · Answer 2 · answered Aug 06 '10 at 12:57

Try the following command:

grep -Eo 'MAIL_[[:alnum:]_]*'
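
This character class matters when the word sits inside an attribute or between tags, as described in the comments. A minimal sketch, assuming a made-up one-line XML fragment (the xml variable below is hypothetical, not taken from the asker's files):

```shell
# Hypothetical single-line XML fragment, modelled on the comments in this thread.
xml='<a type="MAIL_ABC_CDE">MAIL_XXX_AAA_AAA</a>'

# [^[:space:]]* keeps going until whitespace, so it runs past the closing quote.
loose=$(printf '%s\n' "$xml" | grep -Eo 'MAIL_[^[:space:]]*')

# [[:alnum:]_]* stops at the first character that is not a letter, digit or underscore.
tight=$(printf '%s\n' "$xml" | grep -Eo 'MAIL_[[:alnum:]_]*')

printf 'loose:\n%s\n' "$loose"
printf 'tight:\n%s\n' "$tight"
```

Because the fragment contains no whitespace after the first MAIL_, the loose pattern returns one long match that includes the quote, the tags and the second word. The tight pattern returns MAIL_ABC_CDE and MAIL_XXX_AAA_AAA on separate lines.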

banx

score 2 · Answer 3 · answered Aug 06 '10 at 12:37

grep -o (or --only-matching) outputs only the matching text instead of complete lines. The problem could be your regex, which may not be restrictive or greedy enough and may actually match the whole file.
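
As a rough illustration of -o in a recursive search, the sketch below builds a throwaway directory with two made-up XML files and lists each distinct MAIL_ word once. The file names and contents are assumptions for the demo, and piping through sort -u is an extra step that was not part of the question.

```shell
# Throwaway directory with two hypothetical single-line XML files.
dir=$(mktemp -d)
printf '<a type="MAIL_ONE">MAIL_TWO</a>\n' > "$dir/a.xml"
printf '<b>MAIL_TWO</b><c>MAIL_THREE</c>\n' > "$dir/b.xml"

# -r recurses, -h drops file names, -o prints only the matched text;
# sort -u collapses repeated matches (LC_ALL=C keeps the order predictable).
words=$(grep -rhoE 'MAIL_[[:alnum:]_]*' "$dir" | LC_ALL=C sort -u)
printf '%s\n' "$words"

# Clean up the temporary directory.
rm -r "$dir"
```

MAIL_TWO occurs in both files but is listed only once, alongside MAIL_ONE and MAIL_THREE.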

chocolate_jesus

Now, the words I want are present in the file like this: type="MAIL_ABC_CDE" type="MAIL_XXX_AAA_AAA" etc. There can be any number of _'s. What regexp should I use? Any idea on that? – AMM Aug 06 '10 at 12:42

score 0 · Answer 4 · edited May 23 '17 at 12:09

From your comment on thor's answer, it seems you also want to distinguish whether the MAIL_.* text is in a text node or in an attribute, not just to isolate it wherever it appears in the XML document. Grep cannot parse XML; you need a proper XML parser for that.

xmlstarlet is a command-line XML parser, and it is packaged in Ubuntu.

Using it on this example file:

$ cat test.xml 
<some_root>
    <test a="MAIL_as_attribute">will be printed if you want matching attributes</test>
    <bar>MAIL_as_text will be printed if you want matching text nodes</bar>
    <MAIL_will_not_be_printed>abc</MAIL_will_not_be_printed>
</some_root>

For selecting text nodes you can use:

$ xmlstarlet sel -t -m '//*' -v 'text()' -n test.xml | grep -Eo 'MAIL_[^[:space:]]*'
MAIL_as_text

And for selecting attributes:

$ xmlstarlet sel -t -m '//*[@*]' -v '@*' -n test.xml | grep -Eo 'MAIL_[^[:space:]]*'
MAIL_as_attribute

Brief explanations:

//* is an XPath expression that selects all elements in the document, and text() outputs the value of their child text nodes, so everything except text nodes gets filtered out.
//*[@*] is an XPath expression that selects all elements that have at least one attribute, and @* then outputs the attribute values.
