
I'd like to be able to use a regular expression to match a given string, but not a specific longer word which contains it. Here's an example to better explain:

Given the text:

String bellsringing = "The bells are ringing is a String";

I want to find all occurrences of "ring" that are not part of the word String. The match is not limited to whole words; it can appear inside one. So the only matches would be "bells(ring)ing" and "(ring)ing".

I am aware that a program can be used for such a task, but I have come across the need to find specific strings in large libraries. If the sought string is a substring of a common keyword or literal, I have a lot of digging to do and would benefit from the IDE's regex search :)

Thanks for any input on this.

Adrian B.
  • This is probably helpful: http://stackoverflow.com/questions/15130309/how-to-use-regex-in-string-contains-method-in-java/15130382#15130382 – nhahtdh Nov 11 '14 at 15:14
  • thanks, but that does not solve my problem. – Adrian B. Nov 11 '14 at 15:16
  • what exactly would be the ouput? – vks Nov 11 '14 at 15:17
  • I bolded what the output would be :) – Adrian B. Nov 11 '14 at 15:18
  • Is "String" the only text you want to exclude? Or something like "bring", too? You can use [word boundaries](http://www.regular-expressions.info/wordboundaries.html) to match the start or end of a word, but differentiating between _in the middle of some words but not others_ could be really hairy. – Wiseguy Nov 11 '14 at 15:20
  • Let's say I only want to exclude String, as excluding multiple words can get even hairier :) But yes, the sought string can occur inside the word. Searching for a full word, or one ending or starting with the text, I can already do. – Adrian B. Nov 11 '14 at 15:22
  • Look at **[this answer](http://stackoverflow.com/a/23589204/2684660)** about how to capture a string except in contexts A, B or C. Regexes can't understand whether a group of letters within a word has retained its meaning, so you'll have to somehow make a list of what to accept and what not to accept with this "contexts A, B or C" technique. It's the long version of Fede's answer. – asontu Nov 11 '14 at 15:29
  • Wow... thanks for the link. It made my day. Now I understand @Fede's answer as well :) – Adrian B. Nov 11 '14 at 15:36

3 Answers

PCRE (Perl Compatible Regular Expressions)

If you are using PCRE regex then you could use a regex like this:

String(*SKIP)(*FAIL)|ring

The idea of this regex is to match String first and then force that match to fail, so the engine skips it and keeps only ring. By the way, if you want to grab the complete word, you could use this regex:

String(*SKIP)(*FAIL)|(\w*ring\w*)

The match information is:

MATCH 1
1.  [14-21] `ringing`
MATCH 2
1.  [64-71] `ringing`

Other engines

On the other hand, if you are not using PCRE, you could use the discard pattern, which is a really nice regex trick:

String|(\w*ring\w*)

In this case, you match what you don't want on the left side of the alternation, while you keep what you want in the rightmost alternative using a capturing group. The discard pattern follows this rule:

discard patt1 | discard patt2 | ... | discard pattN | (KEEP THIS PATTERN)

Then you have to read capturing group 1 (\1 or $1) to grab the saved string. For this case, it is:

MATCH 1
1.  [14-21] `ringing`
MATCH 2
1.  [64-71] `ringing`
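
For readers working in Java, the discard pattern above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name DiscardPatternSketch and the method keptWords are hypothetical, and the input is the line from the question. The loop keeps a match only when group 1 took part in it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this illustration.
class DiscardPatternSketch {
    // Left alternative consumes "String" so it is discarded;
    // group 1 on the right keeps whole words that contain "ring".
    private static final Pattern DISCARD = Pattern.compile("String|(\\w*ring\\w*)");

    static List<String> keptWords(String text) {
        List<String> kept = new ArrayList<>();
        Matcher m = DISCARD.matcher(text);
        while (m.find()) {
            // group(1) is null whenever the discarded alternative matched
            if (m.group(1) != null) {
                kept.add(m.group(1));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        String text = "String bellsringing = \"The bells are ringing is a String\";";
        System.out.println(keptWords(text)); // prints [bellsringing, ringing]
    }
}
```

The null check on group 1 is what separates kept matches from discarded ones. A plain editor search has no such filtering step, so it reports every overall match, including String.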

Debuggex does a good job of displaying this technique graphically.

Federico Piazza
  • Thanks to @funkwurm's comment, I understand your answer and it's quite nice. However, this doesn't help inside a find-and-replace feature of a text editor or an IDE, as they look at all the matched groups, and therefore it also matches String. – Adrian B. Nov 11 '14 at 15:38
  • Combined with @OnlineCop's comment, this actually works :) Thanks, guys – Adrian B. Nov 11 '14 at 15:43
  • @AdrianB. yes, that is the disadvantage of the discard technique. I focused on programming instead of text editors where this technique is a weakness. – Federico Piazza Nov 11 '14 at 15:44
  • I'm getting 4 matches in the demo: 2 for ringing and 2 for String – Eduardo EPF Mar 10 '20 at 20:18

Building off of @Fede's answer, use a negative look-ahead:

\b(?!String)\w*ring\w*\b

This starts at a word boundary, uses the look-ahead to ensure that the word does not begin with String, and then matches any word that contains ring.
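
A minimal Java sketch of the look-ahead pattern above, assuming the same input line as in the question. The class name LookaheadSketch and the method matches are hypothetical names used only for illustration. Here every overall match is already a wanted result, so no capturing group has to be inspected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named only for this illustration.
class LookaheadSketch {
    // \b anchors at a word start; (?!String) rejects words that begin with "String";
    // the rest matches any word containing "ring".
    private static final Pattern NOT_STRING =
            Pattern.compile("\\b(?!String)\\w*ring\\w*\\b");

    static List<String> matches(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = NOT_STRING.matcher(text);
        while (m.find()) {
            found.add(m.group()); // the whole match is the wanted word
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "String bellsringing = \"The bells are ringing is a String\";";
        System.out.println(matches(text)); // prints [bellsringing, ringing]
    }
}
```

One caveat of this sketch: the look-ahead only checks the start of a word, so an identifier such as MyString would still be reported, while Stringent would be skipped.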

OnlineCop

String|\b(\w*?ring\w*)\b

Try this. Grab the capture. See the demo. Apply the i flag.

http://regex101.com/r/tF5fT5/39

vks