
I want a regular expression that matches every occurrence of +(other), together with everything that follows it up to (but not including) the next comma.

With

(word),+(other)(word),(code),(word),(other)(code),(example)
(code),+(other),+(other)(code)(word)(example),(example),+(example)
+(code),(other)(code)(word),(code),(word)

I want to return

+(other)(word)
+(other)
+(other)(code)(word)(example)

The command I would use looks something like `egrep -o '\+\(other\).*,'`. The only problem is that the comma in this regex isn't necessarily the next comma; it can be any later comma on the line. Right now the command returns

+(other)(word),(code),(word),(other)(code),
+(other),+(other)(code)(word)(example),(example),
Benjamin W.
Sam

2 Answers

1

The `.*,` part of your pattern is greedy: it consumes zero or more characters, as many as possible, up to and including the last `,` on the line.

To avoid matching `,` and only match up to the first `,`, use a negated bracket expression `[^,]` and apply the `*` quantifier to it:

 egrep -o '\+\(other\)[^,]*'

The `[^,]*` pattern will match any 0+ characters other than `,`.
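
A quick way to check this against the sample lines from the question is the sketch below. The `input` and `matches` variable names are only illustrative, and `grep -E` is used here as the current spelling of `egrep`.

```shell
# Sample lines copied from the question.
input='(word),+(other)(word),(code),(word),(other)(code),(example)
(code),+(other),+(other)(code)(word)(example),(example),+(example)
+(code),(other)(code)(word),(code),(word)'

# [^,]* cannot cross a comma, so every match ends just before the next one.
matches=$(printf '%s\n' "$input" | grep -Eo '\+\(other\)[^,]*')
printf '%s\n' "$matches"
```

This should print the three strings listed in the question, each without a trailing comma.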

Wiktor Stribiżew
  • Thanks! I have one more question that isn't covered above: if I want the letter p and a - symbol to appear after \(other\) but before the next comma, how can I incorporate that? This is all I can think of right now: '\+\(other\).*p.*\-.*[^,]*', but my .* are capturing everything again – Sam Jan 02 '17 at 07:43
  • It would mean using the same "tempering" negated bracket expression: `\+\(other\)[^,]*p[^,]*-[^,]*` ([demo](https://regex101.com/r/syNSbc/1)) – Wiktor Stribiżew Jan 02 '17 at 07:44
  • What exactly is `[^,]*`? To me it looks like it means any number of commas at the beginning of the line, which doesn't make sense in this context – Sam Jan 02 '17 at 08:00
  • As I said, it is a negated bracket expression. `^` (caret) has several meanings in regexps that depend on its position. When outside of brackets, it means the beginning of the string. When inside brackets, but not in the first position, it means a literal `^`. When it is inside brackets and in the first position, it *negates* all the chars/character sets/ranges in the bracket expression. `[abc]` matches 1 char: `a`, `b` or `c`. `[^abc]` will match `^`, `&`, `*`, `z`, `1`, etc., i.e. any char other than the ones listed in the bracket expression. – Wiktor Stribiżew Jan 02 '17 at 08:02
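
To make the difference between `.*` and the negated bracket expression in the follow-up pattern concrete, here is a small sketch. The two input lines are made up for illustration and are not from the question, and the `-` is left unescaped because it needs no backslash outside a bracket expression.

```shell
# Hypothetical input: only the first line has "p" and "-" in the same
# comma-separated field as +(other).
input='(word),+(other)(p-1)(word),(code)
(code),+(other)(word),(p-2)'

# With .* the match is free to run across commas.
greedy=$(printf '%s\n' "$input" | grep -Eo '\+\(other\).*p.*-.*[^,]*')

# With [^,]* every part of the match must stay inside one field.
tempered=$(printf '%s\n' "$input" | grep -Eo '\+\(other\)[^,]*p[^,]*-[^,]*')

printf 'greedy:\n%s\n' "$greedy"
printf 'tempered:\n%s\n' "$tempered"
```

The greedy pattern matches on both lines and swallows the commas, while the tempered one only reports `+(other)(p-1)(word)` from the first line.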
0

If your grep supports Perl compatible regular expressions (PCRE), you can use non-greedy matching:

$ grep -Po '\+\(other\).*?,' infile
+(other)(word),
+(other),
+(other)(code)(word)(example),
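
Note that the non-greedy version keeps the trailing comma in each match. If the comma should be left out, one possible variation (an assumption on my part, not part of the answer above) is to replace the final `,` with a lookahead, which also lets a match end at the end of the line. The sketch first checks whether the local `grep` accepts `-P`, since not every build does; the variable names are illustrative.

```shell
# Sample lines copied from the question.
input='(word),+(other)(word),(code),(word),(other)(code),(example)
(code),+(other),+(other)(code)(word)(example),(example),+(example)
+(code),(other)(code)(word),(code),(word)'

if printf 'a\n' | grep -qP 'a' 2>/dev/null; then
  # .*? is lazy; (?=,|$) requires a comma or end of line next
  # without making it part of the match.
  pcre_matches=$(printf '%s\n' "$input" | grep -Po '\+\(other\).*?(?=,|$)')
  printf '%s\n' "$pcre_matches"
else
  # No PCRE support in this grep build; nothing to show.
  pcre_matches=
  echo 'this grep does not support -P' >&2
fi
```

Where `-P` is available, this should give the same three strings as the negated bracket expression.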
Benjamin W.