
I have written a regular expression in Python to match hyphenated words:

regexp = r"[a-z]+(?:-[a-z]+)*"

It matches words with zero or more hyphens, e.g. abc, acd-def, x-y-y. However, I can't find the grouping operator ?: for the shell (for instance, when using grep). It seems to me that this is a feature of Python regex only, not of standard regex.

Can anyone please tell me how to write the same regex in shell?

Max
  • 2
    It is a standard [non-capturing group](http://www.regular-expressions.info/brackets.html). If you can't use it in bash, use a capturing group: `[a-z]+(-[a-z]+)*`. – Wiktor Stribiżew Aug 05 '15 at 10:01
  • 1
    Remove the non-capturing group or use the `-P` flag for grep. `grep -Po "[a-z]+(?:-[a-z]+)*" file` – bro Aug 05 '15 at 10:03
  • 1
    I think this is a duplicate of [How to use non-capturing groups in grep?](http://stackoverflow.com/questions/15136366/how-to-use-non-capturing-groups-in-grep), but have a slight doubt... – Wiktor Stribiżew Aug 05 '15 at 10:15

2 Answers


(?:pattern) matches pattern without capturing the contents of the match. It is used with the following * to allow you to specify zero or more matches of the contents of the ( ) without creating a capture group. This affects the result in Python if you use something like re.search(), as the groups of the returned match object would not include the part matched by (?: ). In grep, the result isn't returned in the same way, so you can just remove the ?: to use a normal group:

grep -E '[a-z]+(-[a-z]+)*' file

Here I'm using the -E switch to enable extended regular expression support. This will output each line matching the pattern - you can add the -o switch to only print the matching parts.

As mentioned in the comments (thanks), it is possible to use back-references (like \1) with grep to refer to previous capture groups inside the pattern, so technically the behaviour is being changed slightly by removing the ?:, although this isn't something that you're doing at the moment so it doesn't really matter.
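To make the difference concrete on the Python side, here is a small sketch comparing the two forms. The sample strings are made up for illustration; only the standard re module is used.

```python
import re

non_capturing = re.compile(r"[a-z]+(?:-[a-z]+)*")
capturing = re.compile(r"[a-z]+(-[a-z]+)*")

# Both forms match the same overall text.
m1 = non_capturing.search("x-y-z")
m2 = capturing.search("x-y-z")
whole_1 = m1.group(0)
whole_2 = m2.group(0)

# Only the capturing form records a group; a repeated group keeps
# the text from its last iteration.
groups_1 = m1.groups()
groups_2 = m2.groups()

# re.findall returns group contents when the pattern has a group,
# so here the two forms give different results.
found_1 = non_capturing.findall("abc x-y-z")
found_2 = capturing.findall("abc x-y-z")

print(whole_1, whole_2)
print(groups_1, groups_2)
print(found_1, found_2)
```

So dropping the ?: is harmless for grep, which only reports the overall match, but in Python it can change what functions like re.findall hand back.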

Tom Fenech

Your regular expression doesn't "match hyphenated word" - it matches words made up of [-a-z] where the first and last characters must be in [a-z] and hyphens are never adjacent. Setting the adjacency rule aside, that is [a-z] (one-letter words) or [a-z][-a-z]*[a-z].

Your question is ambiguous - bash normally deals with wildcard expressions; grep can process regular expressions.

  • Bash

    This cannot be done with wildcards. You may use the =~ operator inside [[ ]] brackets: [[ $string =~ [a-z]|[a-z][-a-z]*[a-z] ]].

  • Grep

    You can combine two regexes with | (using grep -E) like so: [a-z]|[a-z][-a-z]*[a-z].
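One thing to watch for in both cases: =~ and grep look for the pattern anywhere in the string or line, so without anchors the test succeeds as soon as a single lowercase letter is present. The sketch below uses Python's re.search and re.fullmatch as stand-ins for the unanchored and anchored behaviour; the sample strings are invented for illustration.

```python
import re

pattern = r"[a-z]|[a-z][-a-z]*[a-z]"

# Unanchored search: one lowercase letter anywhere is enough.
unanchored_hit = bool(re.search(pattern, "123-a-456"))

# Anchored match: the whole string has to fit the pattern.
anchored_hit = bool(re.fullmatch(pattern, "123-a-456"))
anchored_word = bool(re.fullmatch(pattern, "x-y-y"))

print(unanchored_hit, anchored_hit, anchored_word)
```

In the shell, the usual way to get the anchored behaviour is to wrap the alternation in a group between ^ and $, for example ^([a-z]|[a-z][-a-z]*[a-z])$, or to pass -x to grep so that only whole-line matches count.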

Reading between the lines of your question - "to match hyphenated word" sounds more like you want a regexp like [a-z]+(-[a-z]+)+ so that there's at least one - in your match.
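A quick way to see how the three patterns discussed here differ is to run them against a few whole words. This is only a sketch: the sample words and the helper name accepted are made up for illustration, and re.fullmatch is used so that each word has to match in its entirety.

```python
import re

patterns = {
    "original": r"[a-z]+(?:-[a-z]+)*",
    "alternation": r"[a-z]|[a-z][-a-z]*[a-z]",
    "at_least_one_hyphen": r"[a-z]+(?:-[a-z]+)+",
}

samples = ["abc", "x-y-y", "a--b", "-abc", "abc-"]

def accepted(pattern, words):
    """Return the words that the pattern matches in full."""
    return [w for w in words if re.fullmatch(pattern, w)]

results = {name: accepted(p, samples) for name, p in patterns.items()}

for name, words in results.items():
    print(name, words)
```

Of these samples, only the alternation form accepts a--b, since [-a-z]* allows adjacent hyphens, and only the last pattern rejects the hyphen-free abc.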

Toby Speight