47

I am trying to find a way for grep to output only the content of a capturing group. For instance, if I have the following file:

hello1, please match me
hello2, please do not match me

I would like

grep -Eo '(hello[0-9]+), please match me' file

to output hello1. However, it outputs the whole match: hello1, please match me.

I know that grep -Po 'hello[0-9]+(?=, please match me)' will do the trick, but I think there must be a way to simply return a capturing group. I couldn't find any information about it, either online or in man grep.

Is it possible, or are capturing groups only meant to be backreferenced? It would seem weird to me if there was no way of doing that.

Thank you for your time, and feel free to critique the way this post is constructed!

Alice
  • 2
    as far as I know, `GNU grep` doesn't support getting only the captured groups, unless you use lookarounds with PCRE option... `ripgrep` (an alternate implementation) does support what you are asking, but in spirit that is more like the search and replacement functionality provided by `sed`... so, if you need to manipulate capture groups, `sed` would be better choice – Sundeep Oct 14 '19 at 14:51
  • 1
    The non-consuming `(?=)` group with `-P` allows a sort of AND function in regexes. The other way to AND your regexes with grep is to pipe grep to grep. So what's wrong with piping grep to grep here? – stevesliva Oct 14 '19 at 19:41

6 Answers

44

This question was asked ten years ago, so I won't mark it as a duplicate. I also noticed that no sed solution was given there, since the OP asked for an answer without it:

sed -nr 's/(hello[0-9]+), please match me/\1/p' test.txt
  • -n stands for quiet (won't print anything unless explicitly asked)
  • -r enables extended regular expressions, so the parentheses need no backslash (-E is the more portable spelling of the same option)
  • the s/reg/repl/p command means "if the regexp reg matches the current line, replace the matched text with repl and print the line (/p)"
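The command above prints the whole line with only the matched part replaced, so any text around the match would survive. As a minimal sketch, assuming the sample line may carry extra text before and after the match (the words "say" and "now" are made up for illustration), wrapping the pattern in .* keeps only the group:

```shell
# Sample input modelled on the question; the surrounding words are hypothetical.
tmp=$(mktemp)
printf '%s\n' 'say hello1, please match me now' \
              'hello2, please do not match me' > "$tmp"

# -n suppresses automatic printing, -E enables extended regexes,
# and the p flag prints only lines where the substitution happened.
# The leading and trailing .* consume everything outside the group.
result=$(sed -nE 's/.*(hello[0-9]+), please match me.*/\1/p' "$tmp")
echo "$result"

rm -f "$tmp"
```

Only lines on which the substitution succeeds are printed, so the second line produces no output.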
Amessihel
  • 16
    It comes now full circle, since `g/re/p` means "globally search a regular expression and print" – karakfa Oct 14 '19 at 15:40
  • @Amessihel, I'll choose Rici's answer because it is the closest to what I asked. Your answer is really good too though, thank you. – Alice Oct 17 '19 at 07:05
35

You can use ripgrep, which generally seems to be superior to grep, like this:

rg '(hello[0-9]+), please match me' -or '$1' <file>

Here ripgrep uses -o (or --only-matching) together with -r (or --replace) to output only the first capture group, referenced as $1 (quoted to avoid interpretation as a variable by the shell).

Patrick Häcker
  • Beware of a bug if you combine `--replace` with `--multiline` in the currently released versions (v13.0.0 is the latest stable version at the time of writing). The output can contain (partially) duplicated results. The code has already been fixed, but no update has been released: https://github.com/BurntSushi/ripgrep/issues/2438#issuecomment-1451111793 – CodeManX Mar 08 '23 at 19:02
14

If you have either pcregrep or pcre2grep you can use the -o1 command-line flag to request that only capture group 1 is output. (Or change 1 to some other number if there are more captures in the regex.)

You can use the -oN command more than once if you want to output more than one capture group.

As far as I know, grep -P does not implement this extension. You'll find pcre2grep in Debian/Ubuntu package pcre2-utils. pcregrep is in package pcregrep.

rici
  • Why doesn't `apt` find PCRE2 as `pcre2grep` on Ubuntu 19.04? `$ sudo apt-get install pcre2grep; Reading package lists... Done; Building dependency tree; Reading state information... Done; E: Unable to locate package pcre2grep` – vstepaniuk Jan 01 '20 at 11:55
  • @vstepaniuk: As I said above, "You'll find pcre2grep in Debian/Ubuntu package **pcre2-utils**." – rici Jan 01 '20 at 13:45
  • **Why** doesn't apt find PCRE2 as pcre2grep on Ubuntu 19.04? – vstepaniuk Jan 01 '20 at 14:18
  • 1
    @vstepaniuk: because whoever built the package which contains various utilities using libpcre2 chose not to build a separate package only containing pcre2grep. I have no idea what went into that decision. You would have to ask them. But `apt` only knows about package names. – rici Jan 01 '20 at 14:33
  • "You can use the -oN command more than once if you want to output more than one capture group." Which pcregrep supports this? I have pcregrep version 8.21 2011-12-12. When specifying multiple groups, only the last one gets returned. `echo 'Name:"John" Email:"john@example.com"' | pcregrep -o1 '(?=Name:"(\w+?)".*Email:"(\w+@\w+\.\w+)")'` prints `John`; the same pipeline with `-o2` prints `john@example.com`; and with `-o1 -o2` it also prints only `john@example.com`. – Zongjun Dec 26 '22 at 22:25
  • @zongjun: I have pcregrep v8.39 and pcre2grep v10.31, and both of them produce `Johnjohn@example.com` for your last command. – rici Dec 28 '22 at 02:11
  • @rici, thanks for confirming. Version does matter. I aligned with your versions and can see your result now. Do you know if there is a way to put a separator between -o1 and -o2, say a tab or a space, so that the result is "John john@example.com"? – Zongjun Dec 29 '22 at 03:28
  • @Zongjun: `--om-separator=' '`. If you installed those programs, you should also have manpages installed, so you can get complete documentation by typing `man pcregrep` or `man pcre2grep`. I just found that option by reading through the manpage. – rici Dec 29 '22 at 04:02
6

grep, sed and awk have ancient regular expression engines that don't support any modern regex features. I don't really think they're fit for purpose anymore.

One thing Perl is still good for is as a replacement for those in pretty much all one-liners, as it has a very nice, modern regex engine, and a couple of handy command line switches, -ne and -pe.

The switches cause Perl to automatically apply your expression to each line of the input and either unconditionally print the result, or let you control printing of the result.

For instance, to print the first hello followed by a digit (hello\d) for all lines that have hello\d followed by please match me, you can do:

perl -ne 'm/(hello\d), please match me/ && print "$1\n"' <file>

There are many nice sites out there that list common tasks you can do with a Perl one-liner, such as this one.

I also think that ripgrep should be in everyone's toolbox.

Roger Dahl
3

Just an awk version.

awk -F, '/hello[0-9]+, please match me/ {print $1}' file
hello1
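The field-splitting approach relies on the wanted text being the whole first comma-separated field. When it is not, one possible variant is awk's match() function, which sets RSTART and RLENGTH to the position and length of the match. This is only a sketch; the "say " prefix in the sample line is an assumption for illustration:

```shell
capture=$(printf '%s\n' 'say hello1, please match me' \
                        'hello2, please do not match me' |
  awk 'match($0, /hello[0-9]+, please match me/) {
         m = substr($0, RSTART, RLENGTH)    # the whole matched text
         sub(/, please match me$/, "", m)   # strip the trailing context
         print m
       }')
echo "$capture"
```

Here print $1 with -F, would have produced "say hello1", while the match()/substr() pair isolates the matched part first.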
Jotne
  • I'm not sure I understand how to use this— I don't see a capturing group in your regex. – Slipp D. Thompson Mar 30 '23 at 22:20
  • 1
    @SlippD.Thompson It outputs everything before the comma if the line has hello with a number followed by please match me. You can use `awk -F, '/, please match me/ {print $1}' file` to get anything in front of "please match me" – Jotne Mar 31 '23 at 08:25
0

There is a tricky way with Perl mode

$ echo "hello1, please match me" | rev | grep -oP 'em hctam esaelp ,\K[0-9]olleh' | rev
hello1

This essentially uses \K as a lookbehind by reversing both the input and the search term.

You can outsource reversing the search term to rev as well.

$ echo hello1, please match me | 
  rev | 
  grep -oP "$(echo hello1K\\, please match me | rev)" | 
  rev
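For comparison, without any reversing, \K and a lookahead can be combined to emulate a capture group in the middle of a pattern. This is a sketch that assumes GNU grep built with PCRE support (-P); the "say " prefix is made up for illustration:

```shell
# \K discards everything matched so far from the reported match,
# and (?=...) requires the trailing text without including it.
middle=$(echo 'say hello1, please match me' |
  grep -oP 'say \Khello[0-9]+(?=, please match me)')
echo "$middle"
```

Only the text between \K and the lookahead is reported by -o.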
karakfa
  • 4
    What's the benefit of this solution since OP asked of a simple way to return a captured group? – Amessihel Oct 14 '19 at 15:08
  • @Amessihel This doesn't really meet my requirement, sure, but I do find that this is a really funny solution – Alice Oct 14 '19 at 15:15
  • looking back is easier than looking ahead, most likely that's why there is a `\K` but not one for lookahead. Not really for practical use... – karakfa Oct 14 '19 at 15:34