Ideally, I'd like to avoid capturing groups and instead assert that the string starts/ends with some sequence, then use the value matched by the regex directly.

Input:

    map_Ks     ./CarbonFiber_T.tga

Input definition:

  • start of line
  • maybe some spaces
  • the string map_Ks (this is the class field I want to assign value to)
  • one or more spaces
  • a valid file path, anything but 0x00-0x1F, 0x7C (this is the value I want to assign to the field)
  • maybe some spaces
  • end of line

Attempt 1: it works, but the result is in a capture group

(?:^\s+map_K.\s+)([^\x00-\x1F\x7C]+)$

  map_Ks     ./CarbonFiber_T.tga
./CarbonFiber_T.tga
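
For reference, a minimal sketch of how Attempt 1 is consumed through a capture group. The question targets .NET; TypeScript is assumed here purely for illustration (the pattern itself is unchanged, and engine details such as `$` and `\s` can differ slightly). The variable names are made up.

```typescript
// Attempt 1 from the question. The pattern is built from a string,
// hence the doubled backslashes.
const attempt1 = new RegExp("(?:^\\s+map_K.\\s+)([^\\x00-\\x1F\\x7C]+)$");

const input1 = "  map_Ks     ./CarbonFiber_T.tga";
const result1 = attempt1.exec(input1);

// Index 0 holds the whole match (here the entire line);
// index 1 holds the text captured by the only capture group.
const wholeMatch = result1 ? result1[0] : null;
const capturedPath = result1 ? result1[1] : null;

console.log(wholeMatch === input1); // true
console.log(capturedPath);          // ./CarbonFiber_T.tga
```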

Attempt 2: it matches and uses no groups (the usage I'd ideally like), but the match is the entire line

(?=^\s+map_K.\s+)[^\x00-\x1F\x7C]+$

  map_Ks     ./CarbonFiber_T.tga
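
A likely explanation for the whole-line match: a lookahead is zero-width, so once it succeeds at position 0 the character class starts consuming from position 0 as well, and spaces and letters all fall inside `[^\x00-\x1F\x7C]`. The sketch below checks this under the assumption that TypeScript stands in for .NET; the variable names are hypothetical.

```typescript
// Attempt 2 from the question, built from a string (doubled backslashes).
const attempt2 = new RegExp("(?=^\\s+map_K.\\s+)[^\\x00-\\x1F\\x7C]+$");

const input2 = "  map_Ks     ./CarbonFiber_T.tga";
const result2 = attempt2.exec(input2);

// The lookahead only checks what follows position 0; it consumes nothing,
// so the match itself still begins at index 0.
const matchStart = result2 ? result2.index : -1;
const matchText = result2 ? result2[0] : null;

console.log(matchStart);           // 0
console.log(matchText === input2); // true
```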

Question:

Is this possible at all, or am I asking too much of the regex engine and should simply use capture groups?

aybe
  • You want a lookbehind, `(?<=^\s+map_K.\s+)(?=\S)[^\x00-\x1F\x7C]+$` (see [.NET regex demo](http://regexstorm.net/tester?p=%28%3f%3c%3d%5e%5cs%2bmap_K.%5cs%2b%29%28%3f%3d%5cS%29%5b%5e%5cx00-%5cx1F%5cx7C%5d%2b%24&i=++map_Ks+++++.%2fCarbonFiber_T.tga)). Do not use regex101 to test .NET regex validity. – Wiktor Stribiżew Nov 22 '21 at 09:41
  • It almost works, except that it captures leading and trailing whitespace in the path. Do you mean I should trim the resulting value? `[start of match] ./CarbonFiber_T.tga [end of match]` – aybe Nov 22 '21 at 09:43
  • What exactly do you want to achieve? Do you just need to verify that the entire line matches your requirement? Then what's wrong with attempt 2? Do you need a specific part of the line for further processing? Then you need either capture groups or a lookbehind – derpirscher Nov 22 '21 at 09:45
  • @derpirscher Updated, I'd like to capture the file path for a line starting with `map_K.` – aybe Nov 22 '21 at 09:47
  • @derpirscher You mean attempt #1, right? Because that one actually works, though it uses groups. – aybe Nov 22 '21 at 09:49
  • No, I meant attempt #2 because, as you said, it works (i.e. verifies the correctness of the line) but matches the whole line. I wrote this comment before you made clear that you actually want to extract the path ... If you need to extract only the path, I refer you to @WiktorStribiżew's answer with a lookbehind, or you can use attempt #1 with capturing groups ... – derpirscher Nov 22 '21 at 09:59
  • *"anything but 0x00-0x1F, 0x7C"* : do you mean anything in the ASCII range or anything among all the UNICODE code points? Do you really need to check the path syntax since you already know that it is a path? – Casimir et Hippolyte Nov 22 '21 at 11:26
  • Well, all of your comments made me think I'm betting too much on regex, so I decided on another approach: if the string starts with map_Ks, grab everything after it and the spaces that follow it. Whether that path is valid is the job of the reader that will try to load the file, as simple as that. I realized that I may have to reinterpret the path further anyway, since it may be a relative one, so I thought: why not delegate more to the reader and keep a simple expression? So far it's effective, simpler, and gives more control in the end. – aybe Nov 22 '21 at 19:41
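
The simpler approach from the last comment could look roughly like the sketch below. It is written in TypeScript as an assumption (the thread itself is about .NET). `valueAfterKey` is a hypothetical helper, and trimming whitespace on both ends of the value is a choice made for this example, not something the comment specifies.

```typescript
// Hypothetical helper: returns the text that follows `key` on a line,
// or null when the line does not start with that key.
function valueAfterKey(line: string, key: string): string | null {
  const trimmed = line.trim();
  if (trimmed.indexOf(key) !== 0) return null;

  // At least one whitespace character must separate the key from the value,
  // so "map_Ksx ..." is not mistaken for "map_Ks".
  const rest = trimmed.slice(key.length);
  if (!/^\s/.test(rest)) return null;

  // Path validation is deliberately left to whatever loads the file.
  return rest.trim();
}

console.log(valueAfterKey("  map_Ks     ./CarbonFiber_T.tga   ", "map_Ks")); // ./CarbonFiber_T.tga
console.log(valueAfterKey("  map_Kd     ./Other.tga", "map_Ks"));            // null
```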

1 Answer

You need to replace the lookahead with a lookbehind and require the first char of the consumed pattern to be a non-whitespace char.

You can use

(?<=^\s+map_K.\s+)(?=\S)[^\x00-\x1F\x7C]*(?<=\S)(?=\s*$)
(?<=^\s+map_K.\s+)[^\x00-\x1F\x7C\s](?:[^\x00-\x1F\x7C]*[^\x00-\x1F\x7C\s])?(?=\s*$)

See the regex demo (or this regex demo). Details:

  • (?<=^\s+map_K.\s+) - a positive lookbehind that matches a location that is immediately preceded with start of string, one or more whitespaces, map_K, any one char other than LF char, one or more whitespaces
  • (?=\S) - a positive lookahead that requires the next char to be a non-whitespace char
  • [^\x00-\x1F\x7C]* - zero or more chars other than ASCII control chars (0x00-0x1F) and | (0x7C); the surrounding lookarounds force at least one char to be matched
  • (?<=\S) - the previous char must be a non-whitespace char
  • (?=\s*$) - a positive lookahead requiring zero or more whitespaces at the end of string immediately on the right.

The [^\x00-\x1F\x7C\s](?:[^\x00-\x1F\x7C]*[^\x00-\x1F\x7C\s])? regex part matches one char that is not a whitespace, an ASCII control char or |, and then an optional sequence of zero or more chars other than ASCII control chars and |, followed by a single char that is again not a whitespace, an ASCII control char or |.
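
A sketch comparing the two patterns on a few sample lines. It assumes TypeScript in place of .NET, relying on the fact that ES2018 regex engines also accept variable-length lookbehind. `extractPath` and every sample line other than the question's are hypothetical.

```typescript
// Both patterns from the answer, built from strings (doubled backslashes).
const withLookarounds = new RegExp(
  "(?<=^\\s+map_K.\\s+)(?=\\S)[^\\x00-\\x1F\\x7C]*(?<=\\S)(?=\\s*$)"
);
const withCharClasses = new RegExp(
  "(?<=^\\s+map_K.\\s+)[^\\x00-\\x1F\\x7C\\s](?:[^\\x00-\\x1F\\x7C]*[^\\x00-\\x1F\\x7C\\s])?(?=\\s*$)"
);

// Hypothetical helper: returns the whole match (no groups involved) or null.
function extractPath(pattern: RegExp, line: string): string | null {
  const found = pattern.exec(line);
  return found ? found[0] : null;
}

const samples = [
  "  map_Ks     ./CarbonFiber_T.tga",    // the question's input
  "  map_Ks     ./CarbonFiber_T.tga   ", // trailing spaces are not matched
  "  map_Kd  my dir/a b.tga  ",          // inner spaces stay in the match
  "  bump  ./CarbonFiber_T.tga",         // different key: no match
];

for (const sample of samples) {
  console.log(
    JSON.stringify(sample),
    extractPath(withLookarounds, sample),
    extractPath(withCharClasses, sample)
  );
}
```

Because both patterns begin with `^\s+`, a line with no leading whitespace does not match; `^\s*` would be needed if the leading spaces are truly optional.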

Just in case you want to adjust the file path regex part, please refer to "What characters are forbidden in Windows and Linux directory names?"

Wiktor Stribiżew
  • This is nearly perfect, except that it captures trailing space in the path. I've tried to change it to `(?<=^\s+map_K.\s+)(?=\S)[^\x00-\x1F\x7C]+(?=\s*?)$` but that failed. – aybe Nov 22 '21 at 09:57
  • As for the valid path chars, you're definitely right, I should upgrade that expression! – aybe Nov 22 '21 at 09:58
  • Thanks for your help Wiktor, you nailed it once again! Still need to digest it to fully understand. Going to upgrade it with your valid chars suggestion and hopefully succeed in doing so! – aybe Nov 22 '21 at 10:05
  • @aybe I am not sure if it is for Windows or Linux, etc. Here is [a possible update](http://regexstorm.net/tester?p=%28%3f%3c%3d%5e%5cs%2bmap_K.%5cs%2b%29%28%3f%3d%5cS%29%28%3f%3a%5b%5e%5cx00-%5cx1F%5cx7C%3c%3e%3a%22%2f%5c%5c%7c%3f*%5d%2b%2f%29*%5b%5e%5cx00-%5cx1F%5cx7C%3c%3e%3a%22%2f%5c%5c%7c%3f*%5d%2b%28%3f%3c%3d%5cS%29%28%3f%3d%5cs*%24%29&i=++map_Ks+++++.%2fCarbonFiber_T.tga+++). – Wiktor Stribiżew Nov 22 '21 at 10:07