We can check whether a password contains a digit, for example, with a lookahead like:

(?=.*\d)

Or whether it contains both a digit and a lowercase letter with:

(?=.*\d)(?=.*[a-z])

Each lookahead scans from the current position all the way "until the end" of the string to check whether the required character appears anywhere in it.
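As a minimal sketch of that behaviour in Python's `re` module (the helper name `has_digit_and_lower` and the sample inputs are made up for illustration):

```python
import re

# Both lookaheads start at position 0 and scan to the end of the string.
HAS_DIGIT_AND_LOWER = re.compile(r"(?=.*\d)(?=.*[a-z])")

def has_digit_and_lower(text: str) -> bool:
    # match() only tries the pattern at the start of the string
    return HAS_DIGIT_AND_LOWER.match(text) is not None

print(has_digit_and_lower("abc1"))  # True
print(has_digit_and_lower("abcd"))  # False (no digit)
print(has_digit_and_lower("1234"))  # False (no lowercase letter)
```

The pattern consumes nothing; it only asserts that both characters appear somewhere after the starting position.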

However, I was wondering if it's possible in some sort of generic way to limit the scope of a lookahead. Here's a basic example which I'm hoping will demonstrate the point:

start_of_string;
middle_of_string;
end_of_string;

I want to use a single regular expression to match against start_of_string + middle_of_string + end_of_string.

Is it possible to use a lookahead/lookbehind in the middle_of_string section WITHOUT KNOWING WHAT COMES BEFORE OR AFTER IT? That is, without knowing the size or contents of the preceding or following component, can the scope of the lookahead be limited to only what is contained in that portion of the string?

Let's take one example:

start_of_string = 'start'
middle_of_string = '123'
end_of_string = 'ABC' 

Would it be possible to check the contents of each part but limit its scope like this?

string = 'start123ABC'
# Check to make sure the first part has a letter, the second part has a number and the third part has a capital


((?=.*[a-z]).*) # limit scope to the first part only!!
((?=.*[0-9]).*) # limit scope to only the second part.
((?=.*[A-Z]).*) # limit scope to only the last part.

In other words, can lookaheads/lookbehinds be "chained" with other components of a regex without it screwing up the entire regex?

UPDATE:

Here is an example that hopefully clarifies the question:

START_OF_STRING = 'abc'

Does 'x' exist in it? (?=.*x) ==> False

END_OF_STRING = 'cdxoy'

Does 'y' exist in it? (?=.*y) ==> True

FULL_STRING = START_OF_STRING + END_OF_STRING
'abcdxoy'

Is it possible to chain the two regexes together in some way so that each one only works on its own 'substring' component? For example, now (?=.*x) in the first part of the string would return True, but it should not.

`((?=.*x)(?=.*y)).*`

I think the short answer to this is "No, it's not possible," but I'm hoping to hear from someone who understands this well enough to explain why it is or isn't.

samuelbrody1249
  • Replace the dot with a character class that excludes the underscore. – Casimir et Hippolyte Nov 07 '19 at 21:30
  • I think you could also do that without a lookahead `something_[^_]+_something` using a negated character class – The fourth bird Nov 07 '19 at 21:33
  • @Thefourthbird could you please see the updated question? I've tried to clarify the "scoping" I'm trying to accomplish. – samuelbrody1249 Nov 07 '19 at 21:38
  • Still no idea what you mean. Could you provide a real life problem? – Wiktor Stribiżew Nov 07 '19 at 21:41
  • Perhaps you could make use of a negated character class if you know that you have for example 2 underscores and you want to make sure there is an `s` between the first and the second. https://regex101.com/r/bKud2v/1 – The fourth bird Nov 07 '19 at 21:53
  • Or this page about contrast might be helpful https://www.rexegg.com/regex-style.html#contrast – The fourth bird Nov 07 '19 at 22:02
  • How do you define "middle_of_string" if you don't know the size of "start_of_string"? – Nick Nov 07 '19 at 22:19
  • @Nick `full_string = a + b + c`, where `b` is `middle_of_string` – samuelbrody1249 Nov 07 '19 at 22:42
  • But what defines where `a` ends and `b` starts? – Nick Nov 07 '19 at 22:44
  • @Nick `a = 'hello' b = 'arrow' c = 'xyz' full_string = a + b + c` ? – samuelbrody1249 Nov 07 '19 at 22:46
  • My point is that if I look at the string `helloarrowxyz` I have no way of knowing which is `start`, which is `middle` and which is `end`. So that makes it impossible to "scope" a regex to only look at a specific part of the string. But if you have the string in separate parts to begin with, why not apply the conditions to each string individually? – Nick Nov 07 '19 at 22:50
  • @Nick exactly, that's the essence of the question. Is it possible to chain multiple individual regexes into one where a positive lookahead is used in one of the individual regex parts. – samuelbrody1249 Nov 07 '19 at 22:52
  • First of all, you do not ever need that. Then, `^.{0,}x.{0,}y` could work, but if you do not know those, it has no solution. There is no solution because the problem you describe does not exist. – Wiktor Stribiżew Nov 07 '19 at 23:13
  • @samuelbrody1249 I really highly recommend that you don't do this if you're thinking about implementing it for password validation. See [Reference - Password Validation](https://stackoverflow.com/q/48345922/3600709) – ctwheels Nov 08 '19 at 04:42
  • @ctwheels thanks, but it's unrelated to a password. Really I'm interested in the theory of if it's possible to chain regexes together, and what limitations there might be with that. – samuelbrody1249 Nov 08 '19 at 04:45
  • @samuelbrody1249 how are the substrings determined? The easiest way to do this is by applying a regex to the substring directly. Modifying the regex is also possible but a lot of unnecessary work. – ctwheels Nov 08 '19 at 04:48
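The negated-character-class suggestion from the comments can be sketched as follows. This assumes the components are joined by a known delimiter (an underscore here); the sample strings and the helper name `parts_ok` are hypothetical:

```python
import re

# Assumed layout: two components joined by '_'.
# [^_]* cannot cross the delimiter, so each lookahead only sees
# the component it starts in.
SCOPED = re.compile(r"^(?=[^_]*x)[^_]*_(?=[^_]*y)[^_]*$")

def parts_ok(text: str) -> bool:
    # True only if the first part contains 'x' and the second contains 'y'
    return SCOPED.match(text) is not None

print(parts_ok("abx_cdoy"))   # True
print(parts_ok("abc_cdxoy"))  # False: the only 'x' is in the second part
```

This only works because the delimiter marks where one component ends; without such a marker the regex has nothing to stop at.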

1 Answer

In .NET and JavaScript, which support variable-width lookbehind, you could use a positive lookahead at the start of your string component and a positive lookbehind at the end of it to "constrain" the match. Example:

.*(?=.*arrow)(?<middle>.*)(?<=.*arrow).*
helloarrowxyz
{'middle': 'arrow'}

In PCRE, Python, or other engines where a lookbehind must be fixed-width, you would instead need a bounded-width lookahead to constrain it from going too far forward, along the lines of what Wiktor Stribiżew suggests in the comments above:

.*(?=.{0,5}arrow)(?<middle>.{0,5}).* 

Without either a bounded-width lookahead or a variable-width lookbehind, it wouldn't be possible.
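A rough Python version of the bounded-lookahead pattern, for illustration only. Python's `re` spells named groups as `(?P<name>...)`, and the `{0,5}` bound is an assumed maximum length for the middle component:

```python
import re

# The lookahead may look at most 5 characters ahead before 'arrow' must
# start, and the middle group may capture at most 5 characters.
pattern = re.compile(r".*(?=.{0,5}arrow)(?P<middle>.{0,5}).*")

m = pattern.fullmatch("helloarrowxyz")
print(m.groupdict())  # {'middle': 'arrow'}
```

The bound only limits how far ahead the check can look. It does not tell the regex where the component really ends, so an 'arrow' that starts within five characters of the component's start but actually belongs to the next component would still satisfy the lookahead.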

samuelbrody1249