4

I'm trying to write a regular expression that matches all words inside a specific string, but skips words inside brackets. I currently have one regex that matches all words:

/[a-z0-9]+(-[a-z0-9]+)*/i

I also have a regex that matches all words inside brackets:

/\[(.*)\]/i

I basically want to match everything that the first regex matches, but without everything the second regex matches.

Sample input text: http://gist.github.com/222857

It should match every word separately, except the ones inside brackets.

Any help is appreciated. Thanks!

Marc

6 Answers

3

Perhaps you could do it in two steps:

  1. Remove all the text within brackets.
  2. Use a regular expression to match the remaining words.

Using a single regular expression to try to do both these things will end up being more complicated than it needs to be.
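
The two steps above could look roughly like this in Ruby. This is only a sketch: the sample string is made up (the gist's contents are not reproduced here), and `[^\]]*` is one assumed way of matching the bracketed text that keeps separate bracket pairs on the same line from being merged.

```ruby
# Hypothetical sample text, not the input from the question's gist.
text = "foo [skip me] bar-baz [nope] qux"

# Step 1: remove every bracketed span. [^\]]* stops at the first closing
# bracket, so each bracket pair is removed on its own.
without_brackets = text.gsub(/\[[^\]]*\]/, "")

# Step 2: match the remaining words with the question's word pattern.
# The inner group is non-capturing so that scan returns whole matches.
words = without_brackets.scan(/[a-z0-9]+(?:-[a-z0-9]+)*/i)

p words  # => ["foo", "bar-baz", "qux"]
```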

Greg Hewgill
1

How 'bout this:

your_text.scan(/\[.*\]|([a-z0-9]+(?:-[a-z0-9]+)*)/i) - [[nil]]
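
To spell out what happens here: the left side of the alternation swallows bracketed text without capturing anything, so only real words end up in group 1, and the bracket matches show up as `nil`. A minimal sketch on a made-up sample string follows. The bracket part is written as `\[[^\]]*\]`, which is a variation on the pattern above (an assumption, not the original), so that two bracket pairs on one line are not treated as a single span.

```ruby
# Hypothetical sample text for illustration.
text = "foo [skip me] bar-baz [nope] qux"

# Left alternative: consume a bracketed span, capturing nothing.
# Right alternative: capture a word in group 1.
pattern = /\[[^\]]*\]|([a-z0-9]+(?:-[a-z0-9]+)*)/i

# With one capture group, scan returns one single-element array per match:
# [word] for a word, [nil] for a bracketed span.
raw = text.scan(pattern)

# Flatten and drop the nils to get a plain list of words.
words = raw.flatten.compact

p words  # => ["foo", "bar-baz", "qux"]
```

Using `flatten.compact` instead of `- [[nil]]` just returns plain strings rather than one-element arrays.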
glenn mcdonald
  • Hey Glenn, you mean and then look at Group 1? That's a cool simple technique that for some reason very few people seem to be using. +1! :) I just used it on a [regex bounty quest](http://stackoverflow.com/q/23589174) and found your answer while researching if anyone is using the technique. – zx81 May 13 '14 at 21:35
1

Which Ruby version are you using? If it's 1.9 or later, this should do what you want:

/(?<![\[a-z0-9-])[a-z0-9]+(-[a-z0-9]+)*(?![\]a-z0-9-])/i
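
For illustration, here is a small sketch of how this lookaround pattern behaves with `scan`. The sample strings are made up, and the inner group is changed to non-capturing (`(?:...)`) so that `scan` returns the whole match instead of the group's contents.

```ruby
# Same pattern as above, with the inner group made non-capturing.
pattern = /(?<![\[a-z0-9-])[a-z0-9]+(?:-[a-z0-9]+)*(?![\]a-z0-9-])/i

# Hypothetical sample text: one bracketed word between two normal words.
text = "foo [bar] baz-qux"
words = text.scan(pattern)
p words  # => ["foo", "baz-qux"]

# The lookarounds only inspect the characters directly next to a word,
# so a word in the middle of a longer bracketed phrase still matches.
inner = "[a b c]".scan(pattern)
p inner  # => ["b"]
```

So this works when each bracket pair holds a single word (or two); for longer bracketed phrases, removing the bracketed text first is the safer route.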
Alan Moore
0

I don't think I understand the question properly. Why not just make a new string that does not contain anything the second regex matches, like so:

string1 =~ s/\[(.*)\]//g

Off the top of my head, won't that delete whatever the second regex matches while storing the result in string1? I have not tested this yet, though. I might test it later.

Robert Massaioli
0

I agree with Shhnap. Without more info, it sounds like the easiest way is to remove what you don't want. However, the regex needs to be non-greedy: /\[(.*?)\]/ instead. After that you can split on \s.

If you are trying to iterate through each word, and you want each word to match, maybe you can cheat a little with string.split(/\W+/). You will lose the quotation marks and whatnot, but you get each word.
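
A quick sketch of why the non-greedy version matters, and of what the `\W+` shortcut does to hyphenated words. The sample string is made up for illustration.

```ruby
# Hypothetical sample text with two bracket pairs on one line.
text = "foo [one] bar-baz [two] qux"

# Greedy: .* runs to the LAST closing bracket, so everything between
# the first "[" and the last "]" is removed, including "bar-baz".
greedy = text.gsub(/\[(.*)\]/, "")

# Non-greedy: .*? stops at the first closing bracket, so each bracket
# pair is removed separately.
lazy = text.gsub(/\[(.*?)\]/, "")

p greedy.split(/\s+/)  # => ["foo", "qux"]
p lazy.split(/\s+/)    # => ["foo", "bar-baz", "qux"]

# Splitting on \W+ also splits at the hyphen, since "-" is a non-word character.
p lazy.split(/\W+/)    # => ["foo", "bar", "baz", "qux"]
```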

cgr
-1

This seems to work:

[^\[][a-z0-9]+(-[a-z0-9]+)*

If the first letter of a word is an opening bracket, it doesn't match it.

Btw, is there a reason why you are capturing the words with dashes in them? If there is no need for that, your regex could be simplified.

AleB