Is it possible to limit the scope/range of a lookahead?

Question

We can check to see if a digit is in a password, for example, by doing something like:

(?=.*\d)

Or if there's a digit and lowercase with:

(?=.*\d)(?=.*[a-z])

Each lookahead scans from the current position "until the end" of the string to check whether a digit (or a lowercase letter) appears anywhere in it.
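As a minimal sketch of that behavior, assuming Python's standard `re` module (the variable name `has_digit_and_lower` is made up for illustration), the two lookaheads can be tried against a few sample strings:

```python
import re

# Both lookaheads start at position 0 and may scan to the end of the string.
# Neither one consumes any characters.
has_digit_and_lower = re.compile(r"(?=.*\d)(?=.*[a-z])")

results = {
    s: has_digit_and_lower.match(s) is not None
    for s in ("abc123", "ABC123", "abcdef")
}

for sample, ok in results.items():
    print(sample, ok)
```

Only `abc123` passes, since it is the only sample containing both a digit and a lowercase letter somewhere after the starting position.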

However, I was wondering if it's possible in some sort of generic way to limit the scope of a lookahead. Here's a basic example which I'm hoping will demonstrate the point:

start_of_string;
middle_of_string;
end_of_string;

I want to use a single regular expression to match against start_of_string + middle_of_string + end_of_string.

Is it possible to use a lookahead/lookbehind in the middle_of_string section WITHOUT KNOWING WHAT COMES BEFORE OR AFTER IT? That is, not knowing the size or contents of the preceding/succeeding string component. And limit the scope of the lookahead to only what is contained in that portion of the string?

Let's take one example:

start_of_string = 'start'
middle_of_string = '123'
end_of_string = 'ABC'

Would it be possible to check the contents of each part but limit its scope like this?

string = 'start123ABC'
# Check to make sure the first part has a letter, the second part has a number and the third part has a capital


((?=.*[a-z]).*) # limit scope to the first part only!!
((?=.*[0-9]).*) # limit scope to only the second part.
((?=.*[A-Z]).*) # limit scope to only the last part.

In other words, can lookaheads/lookbehinds be "chained" with other components of a regex without it screwing up the entire regex?

UPDATE:

Here is an example that hopefully makes the question clearer:

START_OF_STRING = 'abc'

Does 'x' exist in it? (?=.*x) ==> False

END_OF_STRING = 'cdxoy'

Does 'y' exist in it? (?=.*y) ==> True

FULL_STRING = START_OF_STRING + END_OF_STRING
'abcdxoy'

Is it possible to chain the two regexes together in any sort of way so that each only works on its 'substring' component? For example, now (?=.*x) in the first part of the string would return True, but it should not.

`((?=.*x)(?=.*y)).*`

I think the short answer to this is "No, it's not possible.", but am looking to hear from someone who understands this to tell why it is or isn't.

Replace the dot with a character class that excludes the underscore. — Casimir et Hippolyte, Nov 07 '19 at 21:30
I think you could also do that without a lookahead `something_[^_]+_something` using a negated character class — The fourth bird, Nov 07 '19 at 21:33
@Thefourthbird could you please see the updated question? I've tried to clarify the "scoping" I'm trying to accomplish. — samuelbrody1249, Nov 07 '19 at 21:38
Still no idea what you mean. Could you provide a real life problem? — Wiktor Stribiżew, Nov 07 '19 at 21:41
Perhaps you could make use of a negated character class if you know that you have for example 2 underscores and you want to make sure there is an `s` between the first and the second. https://regex101.com/r/bKud2v/1 — The fourth bird, Nov 07 '19 at 21:53
Or this page about contrast might be helpful https://www.rexegg.com/regex-style.html#contrast — The fourth bird, Nov 07 '19 at 22:02
How do you define "middle_of_string" if you don't know the size of "start_of_string"? — Nick, Nov 07 '19 at 22:19
@Nick `full_string = a + b + c`, where `b` is `middle_of_string` — samuelbrody1249, Nov 07 '19 at 22:42
@Nick `a = 'hello' b = 'arrow' c = 'xyz' full_string = a + b + c` ? — samuelbrody1249, Nov 07 '19 at 22:46
My point is that if I look at the string `helloarrowxyz` I have no way of knowing which is `start`, which is `middle` and which is `end`. So that makes it impossible to "scope" a regex to only look at a specific part of the string. But if you have the string in separate parts to begin with, why not apply the conditions to each string individually? — Nick, Nov 07 '19 at 22:50
@Nick exactly, that's the essence of the question. Is it possible to chain multiple individual regexes into one where a positive lookahead is used in one of the individual regex parts. — samuelbrody1249, Nov 07 '19 at 22:52
First of all you do not need that ever. Then, `^.{0,}x.{0,}y` could work, but if you do not know those, it has no solution. There is no solution because there is no such a problem you describe. — Wiktor Stribiżew, Nov 07 '19 at 23:13
@samuelbrody1249 I really highly recommend that you don't do this if you're thinking about implementing it for password validation. See [Reference - Password Validation](https://stackoverflow.com/q/48345922/3600709) — ctwheels, Nov 08 '19 at 04:42
@ctwheels thanks, but it's unrelated to a password. Really I'm interested in the theory of if it's possible to chain regexes together, and what limitations there might be with that. — samuelbrody1249, Nov 08 '19 at 04:45
@samuelbrody1249 how are the substrings determined? The easiest way to do this is by applying a regex to the substring directly. Modifying the regex is also possible but a lot of unnecessary work. — ctwheels, Nov 08 '19 at 04:48
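Several comments suggest applying the conditions to each part separately when the parts are already available. A minimal sketch of that idea in Python follows, assuming the parts from the `start123ABC` example are known up front. The names `checks` and `parts_ok` are made up for illustration:

```python
import re

# One pattern per part: lowercase letter, digit, capital letter.
checks = [r"[a-z]", r"[0-9]", r"[A-Z]"]


def parts_ok(parts):
    """Return True if every part satisfies its own pattern."""
    return all(
        re.search(pattern, part) is not None
        for pattern, part in zip(checks, parts)
    )


good = parts_ok(["start", "123", "ABC"])
bad = parts_ok(["start", "abc", "ABC"])  # the middle part has no digit

print(good, bad)
```

Because each pattern only ever sees its own substring, no lookahead is needed and nothing can leak across the boundaries.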

samuelbrody1249 · Answer 1 · 2019-11-08T00:19:30.890

In .NET and JavaScript, which support variable-width lookbehind, you could use a positive lookahead at the start of your string component and a positive lookbehind at the end of it to "constrain" the match. Example:

.*(?=.*arrow)(?<middle>.*)(?<=.*arrow).*
helloarrowxyz
{'middle': 'arrow'}

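In PCRE, Python, or other engines without variable-width lookbehind, you would instead need a bounded-width lookahead to keep it from going too far forward, along the lines of what Wiktor Stribiżew says above (note that Python's `re` spells the named group `(?P<middle>...)`):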

.*(?=.{0,5}arrow)(?<middle>.{0,5}).*

Otherwise, it wouldn't be possible without either a bounded-width lookahead or a variable-width lookbehind.
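Both points can be checked in Python with a rough sketch, assuming the standard `re` module and the `helloarrowxyz` example. The group syntax is changed to `(?P<middle>...)` because that is how `re` writes named groups:

```python
import re

# 1) The variable-width lookbehind variant is rejected by Python's re module.
try:
    re.compile(r".*(?=.*arrow)(?P<middle>.*)(?<=.*arrow).*")
    lookbehind_supported = True
except re.error:
    lookbehind_supported = False

# 2) The bounded lookahead variant compiles and constrains the middle part
#    to at most five characters.
bounded = re.compile(r".*(?=.{0,5}arrow)(?P<middle>.{0,5}).*")
match = bounded.fullmatch("helloarrowxyz")
middle = match.group("middle") if match else None

no_match = bounded.fullmatch("helloxyz")  # no 'arrow' anywhere

print(lookbehind_supported, middle, no_match)
```

The bound of 5 works here only because the length of the middle part is assumed to be known, which is exactly the limitation described above.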
