I'd like to convert a regular expression into a glob.

I looked at Jakarta ORO, but I can't find a method that suits my needs: one that compiles a regular expression and returns its glob equivalent.

Both describe Type-3 (regular) languages, so in theory it should be possible.

Unfortunately, I am limited to JDK 5.

Stephan
Rob
  • http://stackoverflow.com/questions/1247772/is-there-an-equivalent-of-java-util-regex-for-glob-type-patterns – sol4me Sep 25 '14 at 15:47
  • @sol4me That's glob -> regex – Rob Sep 25 '14 at 15:48
  • I don't think that glob has any support for, for example, lookaround. So this is not possible in the general case. If you want details for a specific case you'll have to be more...specific. – Boris the Spider Sep 25 '14 at 15:50
  • I don't know the details of formal grammar theory, but if by "glob" you mean shell patterns where `*` matches any sequence of characters, `?` matches a single character, and `[...]` matches a character in a set, and those are the only wildcards available, then I don't think a regex can generally be converted to a glob. What glob would match the same sequences as the regex `(this|that)file`? – ajb Sep 25 '14 at 15:52
  • @ajb: `{this,that}file`, it is **not** a universal glob syntax though. – nhahtdh Sep 26 '14 at 04:50
  • @nhahtdh `{this,that}file` isn't really a glob syntax. A pattern like `abc*.txt` is a glob that searches all files and tests their names against the pattern. `{this,that}file` doesn't look for files at all. It just causes `thisfile thatfile` to be included in the command line (in a Unix shell) without checking whether they are the names of existing files. So it's not a pattern that anything is matched against. I don't think it's the same thing at all. – ajb Sep 26 '14 at 06:00
  • @BoristheSpider Could you please back up your claim that `So this is not possible in the general case`? Or better, write it up as an answer with an explanation or source? – Rob Sep 26 '14 at 07:22
  • @Rob I didn't post it as an answer as it is speculation. I do know that it is provable that lookarounds allow matching of patterns that could not be matched without them. I also reckon there is no lookaround in glob. I am fairly certain what you ask for is not possible as glob is a strict subset of regex. – Boris the Spider Sep 26 '14 at 07:31
  • @BoristheSpider Ok, but you're still stating that ` I am fairly certain what you ask for is not possible as glob is a strict subset of regex.` Which indicates that you are fairly certain. It would be nice to show me, or just point to, why you are. Anyway, you could upvote this question for better visibility; maybe somebody can provide a solid explanation of why it is or is not possible. I could not find an answer on the web after extensive searching, and I think it's an interesting question. – Rob Sep 26 '14 at 14:47
  • Wasn't my example sufficient to back up Boris' claim? Here's another one: `ab*c\.txt`. I.e. a pattern with `a` followed by zero or one `b` followed by `c.txt`. I don't think there is a glob that matches the same sequences matched by this regex, unless we have totally different understandings of what a "glob" is. – ajb Sep 26 '14 at 20:30
  • "zero or one" in my previous comment should be "zero or more" – ajb Sep 27 '14 at 01:28
  • @ajb I think extglob covers all your counter examples. – Rob Sep 30 '14 at 08:58
  • @Rob Then you should have said "extglob" or "extended glob" in your question. "Glob" is not "extglob", and not all of us are frequent enough bash/ksh users that we would automatically know what you mean. Please consider editing your question. – ajb Sep 30 '14 at 14:54

1 Answer

extglob can match a number of regex constructs (pattern-list is a list of alternatives separated by `|`):

extglob           regex
--------------    -----------------
?                 [^/]
*                 [^/]*
.                 \.
**                .*
?(pattern-list)   (pattern-list)?
*(pattern-list)   (pattern-list)*
+(pattern-list)   (pattern-list)+
@(pattern-list)   (pattern-list)
!(pattern-list)   (?!pattern-list)

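There are also some things that regex does that, as far as I know, cannot be done in extglob:

??????????????    \1 (backreferences)
??????????????    most lookarounds

(Negated character classes are not among them: the regex `[^abc]` can be written as `[!abc]` in a glob.)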

Assuming all of the constructs in the regex have extglob equivalents, it would be possible to convert it to extglob form. It would still be difficult, because regex syntax itself is described by a context-free grammar, so you need a real parser rather than simple text substitution. And you're using Java, which forces you to write the evil escaped escape \\ in string literals.
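
To give a feel for what a converter could look like, here is a minimal sketch for a deliberately tiny regex subset: literals, `.`, `.*`, escaped literals, and simple character classes. Everything in it is an assumption for illustration only. `RegexToGlob` and `convert` are hypothetical names, not part of Jakarta ORO or the JDK; the regex is assumed to be used with full-match semantics (`Matcher.matches()`); the input is assumed not to contain `/`; and the output is a plain glob, not extglob. Anything outside the subset (groups, alternation, quantifiers on literals, backreferences) is rejected rather than guessed at. It only uses APIs that exist in JDK 5.

```java
/** Hypothetical helper (not a library API): converts a tiny regex subset to a plain glob. */
final class RegexToGlob {

    private RegexToGlob() {}

    /**
     * Assumes full-match semantics and input strings without '/'.
     * Throws UnsupportedOperationException for constructs outside the subset.
     */
    static String convert(String regex) {
        StringBuilder glob = new StringBuilder();
        int n = regex.length();
        int i = 0;
        while (i < n) {
            char c = regex.charAt(i);
            switch (c) {
            case '.':
                if (i + 1 < n && regex.charAt(i + 1) == '*') {
                    glob.append('*');          // ".*" -> "*"
                    i += 2;
                } else {
                    glob.append('?');          // "."  -> "?"
                    i++;
                }
                break;
            case '\\': {
                if (i + 1 >= n) {
                    throw new IllegalArgumentException("dangling backslash");
                }
                char escaped = regex.charAt(i + 1);
                if (Character.isLetterOrDigit(escaped)) {
                    // \d, \w, \b, \1, \Q ... are not plain escaped literals
                    throw unsupported(regex, i);
                }
                // Keep the backslash only where the character is special in a glob.
                if ("*?[]\\".indexOf(escaped) >= 0) {
                    glob.append('\\');
                }
                glob.append(escaped);
                i += 2;
                break;
            }
            case '[': {
                int close = regex.indexOf(']', i + 1);
                if (close < 0) {
                    throw new IllegalArgumentException("unclosed character class");
                }
                String body = regex.substring(i + 1, close);
                boolean negated = body.startsWith("^");
                if (negated) {
                    body = body.substring(1);
                }
                // Reject escapes, nested classes, intersections and stray negation markers.
                if (body.length() == 0 || containsAny(body, "\\[&!^")) {
                    throw unsupported(regex, i);
                }
                glob.append('[');
                if (negated) {
                    glob.append('!');          // "[^abc]" -> "[!abc]"
                }
                glob.append(body).append(']');
                i = close + 1;
                break;
            }
            case '(': case ')': case '|': case '+': case '?': case '*':
            case '{': case '}': case '^': case '$': case ']':
                // Groups, alternation, anchors and quantifiers on anything but "." are rejected.
                throw unsupported(regex, i);
            default:
                glob.append(c);                // ordinary literal character
                i++;
            }
        }
        return glob.toString();
    }

    private static boolean containsAny(String s, String chars) {
        for (int k = 0; k < s.length(); k++) {
            if (chars.indexOf(s.charAt(k)) >= 0) {
                return true;
            }
        }
        return false;
    }

    private static UnsupportedOperationException unsupported(String regex, int pos) {
        return new UnsupportedOperationException(
                "no plain-glob equivalent at index " + pos + " of: " + regex);
    }
}
```

With this sketch, `RegexToGlob.convert("ab.*\\.txt")` returns `ab*.txt`, while the counterexamples from the comments, `(this|that)file` and `ab*c\.txt`, are rejected with an `UnsupportedOperationException`. Covering the extglob rows of the table would mean replacing the single loop with a proper recursive parser.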

Why not just use a different bash utility that supports regexes? Like this.

Community
Laurel