
I have a regex string and I want to classify whether the regex has a fixed prefix or not.

For example:

abcdef.*g[0abc]{0,5}hi has prefix abcdef

]1234vac.*12345 has prefix ]1234vac

(abc)+123 has prefix abc

but

[A-z]+12345 doesn't have a fixed prefix (it starts with an unknown number of symbols from the set A-z)

Am I right in understanding that this problem cannot be solved in general form?

dgabriel
  • Do you want to get the prefix from those strings? – Shafizadeh Mar 28 '16 at 13:02
  • and the prefix is whatever that ends with '.'? – Divisadero Mar 28 '16 at 13:04
  • @Divisadero, no, I think the prefix is whatever ends at the first non-escaped regex special symbol. For `12345[abcde]+12345` the prefix is 12345, which ends at `[`. This example shows an even more interesting case: for `123?456` the prefix is `12`, but `123\?456` has the prefix `123?456`. – dgabriel Mar 28 '16 at 13:07
  • ... I'd say you *have* found a general solution. The prefix is 'everything fixed'. – Jongware Mar 28 '16 at 13:07
  • I'm not exactly sure if I'm following, but in the case of `[a-z]+`, it's guaranteed to have at least one character from the class. In the case of `*`, presence is not guaranteed. – Saleem Mar 28 '16 at 13:13
  • @Saleem, of course, but I need to know this prefix as a constant string. In the case of `[a-z]+` I'll only know some general features of the string. – dgabriel Mar 28 '16 at 13:15
  • Complicated question! Well, in that case you'll have to walk through the expression tree and see whether a group or class is repeated anywhere, or whether there are optional constructs, i.e. `+`, `*`, `?`, etc. It's sort of like writing your own regex parser. – Saleem Mar 28 '16 at 13:24
  • @Saleem, yes, I've thought about it. But it's not simple code and it has a lot of special cases... I thought somebody might already have solved the same problem. – dgabriel Mar 28 '16 at 13:27
  • Sounds like compiler's thing. – Til Mar 28 '16 at 14:21
  • **Warning:** The range `[A-z]` is not the same as `[A-Za-z]`. If that wasn't just a typo, you should have a look at [this question](http://stackoverflow.com/q/4923380/20938). – Alan Moore Mar 28 '16 at 15:02
  • @AlanMoore, yes, of course I understand it, but it's not an important detail regarding this issue. – dgabriel Mar 28 '16 at 15:05
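
The tree-walking idea from the comments above can be sketched in Python. This is only a rough illustration under stated assumptions, not a general solution: it relies on Python's private regex parser (`re._parser` on 3.11+, `sre_parse` before that), whose internals are undocumented and may change, and the names `fixed_prefix` and `_walk` are made up for this example. It assumes Python regex syntax, ignores flags such as case-insensitivity, and simply stops at anchors, alternation, classes and anything else it does not recognise.

```python
try:  # Python 3.11+ keeps the parser in private submodules of `re`
    from re import _parser as sre_parse, _constants as sre_constants
except ImportError:  # older Python versions
    import sre_parse
    import sre_constants


def _walk(items):
    """Return (literal_prefix, is_entirely_literal) for a parsed sequence."""
    out = []
    for op, av in items:
        if op == sre_constants.LITERAL:
            out.append(chr(av))  # av is the character code
            continue
        if op == sre_constants.SUBPATTERN:
            # The last element of av is the group's body.
            text, whole = _walk(av[-1])
            out.append(text)
            if whole:
                continue  # a fully literal group, keep going
            return "".join(out), False
        if op in (sre_constants.MAX_REPEAT, sre_constants.MIN_REPEAT):
            lo, hi, sub = av
            text, whole = _walk(sub)
            if lo >= 1:
                # At least `lo` copies are guaranteed if the body is
                # fully literal; otherwise only the body's own prefix.
                out.append(text * lo if whole else text)
                if whole and lo == hi:
                    continue  # exact count such as x{3}
            return "".join(out), False
        # Any other construct (class, dot, branch, anchor...) ends the prefix.
        return "".join(out), False
    return "".join(out), True


def fixed_prefix(pattern):
    """Return the constant string every match must start with ('' if none)."""
    text, _ = _walk(sre_parse.parse(pattern))
    return text


for rx in ["abcdef.*g[0abc]{0,5}hi", "(abc)+123", "[A-z]+12345", "123?456"]:
    print(repr(rx), "->", repr(fixed_prefix(rx)))
```

An empty result means "no fixed prefix". Stopping at the first unrecognised construct is deliberately conservative: the function may return a shorter prefix than the true one, but for flag-free patterns it should not return characters that are not guaranteed.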

1 Answer


Try this RegEx:

^(
  (                     # GENERAL before . (Dot)
    (?!\w+\?)               # DO NOT MATCH if contains ?
    [\w\]\)]+               # Word, ] or ) characters 1 or more times
  )|
  (?:\((\w+)\))|        # Words in between BRACKETS ()
  (                     # BEFORE . (Dot) with ?, * or +
      [\w\]\)]+             # Select Characters
      (?![?*+])             # DO NOT select last character if there is ?, * or + after it
  )
)

Live Demo on Regex101

Tell me about any other examples that do not work and I'll change this. I have, however, tested it on all the examples in your question and comments.

Also, how is it even possible to come up with a question this complicated! ;)

Kaspar Lee
  • Great job! Your solution doesn't cover all cases, but it looks like what I want. – dgabriel Mar 28 '16 at 14:42
  • @DenisGavrus If you tell me what it does not cover, I can change it. Also, if it answered your question, would you mind accepting it? – Kaspar Lee Mar 28 '16 at 14:48
  • I don't think there is any chance of solving this problem in general with a regex, because there are too many cases: escaped special characters and so on. (It would be perfect to get the prefix `http:\/\/stackoverflow\.com` from the regex `http:\/\/stackoverflow\.com\/.*?\/[\d]+`, but I understand that this is almost impossible with a regex; my task is not URL parsing, it's only an example.) Your solution, for example, takes the prefix `abc` from the regex `(abc)*`, but that is not a prefix, because `*` means {0,} repeats; the same goes for `(abc)?`. But thanks anyway, I'll accept your answer later if I don't get any other answers. – dgabriel Mar 28 '16 at 14:55
  • @DenisGavrus It *is* very hard to do with RegEx, thanks! – Kaspar Lee Mar 28 '16 at 16:21