In answer your original question, I would have offered this regex:
\b[A-Z][a-z]+\b(?=[^{}]*})
The last part is a positive lookahead; it notes the current match position, tries to match the enclosed subexpression, then returns the match position to where it started. In this case, it starts at the end of the word that was just matched and gobbles up as many characters it can as long as they're not {
or }
. If the next character after that is }
, it means the word is inside a pair of braces, so the lookahead succeeds. If the next character is {
, or if there's no next character because it's at the end of the string, the lookahead fails and the regex engine moves on to try the next word.
Unfortunately, that won't work because (as you mentioned in a comment) the braces may be nested. Matching any kind of nested or recursive structure is fundamentally incompatible with the way regexes work. Many regex flavors offer that capability anyway, but they tend to go about it in wildly different ways, and it's always ugly. Here's how I would do this in C#, using Balanced Groups:
Regex r = new Regex(@"
\b[A-Z][a-z]+\b
(?!
(?>
[^{}]+
|
{ (?<Open>)
|
} (?<-Open>)
)*
$
(?(Open)(?!))
)", RegexOptions.ExplicitCapture | RegexOptions.IgnorePatternWhitespace);
string s = "testa Testb { Test1 Testc testd 1Test } Teste { Testf {testg Testh} testi } Testj";
foreach (Match m in r.Matches(s))
{
Console.WriteLine(m.Value);
}
output:
Testc
Testf
Testh
I'm still using a lookahead, but this time I'm using the group named Open
as a counter to keep track of the number of opening braces relative to the number of closing braces. If the word currently under consideration is not enclosed in braces, then by the time the lookahead reaches the end of the string ($
), the value of Open
will be zero. Otherwise, whether it's positive or negative, the conditional construct - (?(Open)(?!))
- will interpret it as "true" and try to try to match (?!)
. That's a negative lookahead for nothing, which is guaranteed to fail; it's always possible to match nothing.
Nested or not, there's no need to use a lookbehind; a lookahead is sufficient. Most flavors place such severe restrictions on lookbehinds that nobody would even think to try using them for a job like this. .NET has no such restrictions, so you could do this in a lookbehind, but it wouldn't make much sense. Why do all that work when the other conditions--uppercase first letter, no digits, etc--are so much cheaper to test?