Combine case sensitive regex and case insensitive regex into one

Question

I have multiple filters for files (I'm using Python). Some of them are glob filters, some of them are regular expressions. I have both case-sensitive and case-insensitive globs and regexes. I can transform a glob into a regular expression with fnmatch.translate.

I can combine the case sensitive regular expressions into one big regular expression. Let's call it R_sensitive.

I can combine the case insensitive regular expressions into one big regular expression (case insensitive). Let's call it R_insensitive.

Is there a way to combine R_insensitive and R_sensitive into one regular expression? The combined expression would (of course) be case sensitive.

Thanks,

Iulian

NOTE: The way I combine expressions is the following:

Given regexes R1, R2, R3, I build R = (R1)|(R2)|(R3).

EXAMPLE:

I'm searching for "*.txt" (case-insensitive glob). But I have another glob like this: "*abc*" (case sensitive). How can I programmatically combine the two regexes produced by "fnmatch.translate" when one is case insensitive and the other is case sensitive?

Would it not be redundant? Or do the expressions have different function? Perhaps an actual example would clarify your question... — Floris, Oct 31 '13 at 22:02
I think what you want is to use inline modifiers. In PCRE you could do `(?i)` to set the `i` modifier and then use `(?-i)` to remove it. See this demo [`(?i)lol(?-i)aaa`](http://regex101.com/r/iB6fU9). Unfortunately, that's *not* supported by python. Edit: to set the modifier that's supported `(?i)` but to remove it, you can't. Also as a reminder, the `i` modifier is to match case insensitive. — HamZa, Oct 31 '13 at 22:03
@HamZa: That question is about Perl, PHP, and .NET, which doesn't help for Python (especially since the answer is different: IIRC, Perl applies flags to the rest of the expression, Python to the entire expression, and .NET to the closest enclosing group). — abarnert, Oct 31 '13 at 22:22

score 2 · Answer 1 · answered Oct 31 '13 at 22:46

The regex feature you describe is either ordinal modifiers or a modifier span. Python's re module does not support ordinal modifiers, and it supports modifier spans only from Python 3.6 onward. Here is what they look like:

Ordinal Modifiers: (?i)case_insensitive_match(?-i)case_sensitive_match

Modifier Spans: (?i:case_insensitive_match)(?-i:case_sensitive_match)

On Python versions before 3.6, both fail to parse in re. There, the closest thing you could do (for simple or small matches) would be letter groups:

[Cc][Aa][Ss][Ee]_[Ii][Nn][Ss][Ee][Nn][Ss][Ii][Tt][Ii][Vv][Ee]_[Mm][Aa][Tt][Cc][Hh]case_sensitive_match

Obviously, this approach would be best for something where the insensitive portion is very brief, so I'm afraid it wouldn't be the best choice for you.
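
As a rough sketch of the modifier-span route: assuming Python 3.6 or later, where re accepts (?i:...) and fnmatch.translate returns a self-contained pattern, the two globs from the question could probably be combined as below. The variable names are only illustrative and not part of any library.

```python
import fnmatch
import re

# Translate each glob on its own; no flags are passed to re.compile later.
insensitive = fnmatch.translate('*.txt')   # should ignore case
sensitive = fnmatch.translate('*abc*')     # should respect case

# Wrap only the case-insensitive branch in a scoped (?i:...) group
# (assumes Python 3.6+), then join the branches with alternation.
combined = re.compile('(?i:%s)|(?:%s)' % (insensitive, sensitive))

for name in ('README.TXT', 'xabcx.dat', 'xABCx.dat'):
    print(name, bool(combined.match(name)))
```

Only the first branch ignores case here, so 'README.TXT' and 'xabcx.dat' match while 'xABCx.dat' does not.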

abarnert · Answer 2 · 2013-10-31T22:38:28.873

What you need is a way to convert a case-insensitive-flagged regexp into a regexp that works equivalently without the flag.

To do this fully generally is going to be a nightmare.

To do this just for fnmatch results is a whole lot easier.

If you need to handle full Unicode case rules, it will still be very hard.

If you only need to handle making sure each character c also matches c.upper() and c.lower(), it's very easy.

I'm only going to explain the easy case, because it's probably what you want, given your examples, and it's easy. :)

Some modules in the Python standard library are meant to serve as sample code as well as working implementations; these modules' docs start with a link directly to their source code. And fnmatch has such a link.

If you understand regexp syntax, and glob syntax, and look at the source to the translate function, it should be pretty easy to write your own translatenocase function.

Basically: In the inner else clause for building character classes, iterate over the characters, and for each character, if c.upper() != c.lower(), append both instead of c. Then, in the outer else clause for non-special characters, if c.upper() != c.lower(), append a two-character character class consisting of those two characters.
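
A minimal sketch of that idea follows. It is written from scratch rather than copied from the fnmatch source, and it only handles the easy case (single-character upper()/lower() pairs). The helper names _variants, _esc and _class_body are hypothetical. Two details are assumptions beyond the description above: ranges such as a-c get a lowered and an uppered copy (doubling each character naively would corrupt the range), and the result is wrapped in (?s:...)\Z in the style of recent fnmatch.translate output, which needs Python 3.6+.

```python
import re


def _variants(c):
    """Return c plus its single-character lower/upper forms, no duplicates."""
    out = [c]
    for v in (c.lower(), c.upper()):
        if len(v) == 1 and v not in out:
            out.append(v)
    return out


def _esc(c):
    """Escape a character that is special inside a regex character class."""
    return '\\' + c if c in '\\]^-[' else c


def _class_body(stuff):
    """Expand a glob character-class body so it matches both cases."""
    out = []
    k = 0
    while k < len(stuff):
        if k + 2 < len(stuff) and stuff[k + 1] == '-':
            # A range like a-c: keep it and add lowered/uppered copies.
            lo, hi = stuff[k], stuff[k + 2]
            ranges = [(lo, hi)]
            for fold in (str.lower, str.upper):
                a, b = fold(lo), fold(hi)
                if len(a) == len(b) == 1 and a <= b and (a, b) not in ranges:
                    ranges.append((a, b))
            out.extend(_esc(a) + '-' + _esc(b) for a, b in ranges)
            k += 3
        else:
            out.extend(_esc(v) for v in _variants(stuff[k]))
            k += 1
    return ''.join(out)


def translatenocase(pat):
    """Translate a glob to a regex that ignores case without re.IGNORECASE."""
    i, n = 0, len(pat)
    res = []
    while i < n:
        c = pat[i]
        i += 1
        if c == '*':
            res.append('.*')
        elif c == '?':
            res.append('.')
        elif c == '[':
            # Find the closing bracket, as glob syntax defines it.
            j = i
            if j < n and pat[j] == '!':
                j += 1
            if j < n and pat[j] == ']':
                j += 1
            while j < n and pat[j] != ']':
                j += 1
            if j >= n:
                res.append('\\[')  # unterminated class: treat '[' literally
            else:
                stuff = pat[i:j]
                i = j + 1
                negate = stuff.startswith('!')
                if negate:
                    stuff = stuff[1:]
                res.append('[' + ('^' if negate else '') + _class_body(stuff) + ']')
        else:
            # Ordinary character: emit a two-character class if it has case.
            v = _variants(c)
            res.append(re.escape(c) if len(v) == 1 else '[' + ''.join(v) + ']')
    return '(?s:' + ''.join(res) + r')\Z'


print(translatenocase('*.txt'))  # (?s:.*\.[tT][xX][tT])\Z
```

Characters whose upper() or lower() form is longer than one character (for example 'ß') are matched only as themselves in this sketch, which is part of why full Unicode case rules are much harder.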

So, translatenocase('*.txt') will return something like r'.*\.[tT][xX][tT]' instead of something like r'.*\.txt'. But normal translate('*abc*') will of course return the usual r'.*abc.*'. And you can combine these just by using an alternation, as you apparently already know how to do.
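
To illustrate the final combination step, here is a small sketch. The case-insensitive pattern is written out by hand as a stand-in for what a translatenocase('*.txt') might return (the wrapper and \Z anchor are an assumption modelled on recent fnmatch.translate output), while the case-sensitive one comes from the real fnmatch.translate.

```python
import fnmatch
import re

# Hand-written stand-in for a hypothetical translatenocase('*.txt') result.
insensitive = r'(?s:.*\.[tT][xX][tT])\Z'

# Ordinary, case-sensitive translation of the second glob.
sensitive = fnmatch.translate('*abc*')

# Plain alternation; the combined regex is compiled without re.IGNORECASE.
combined = re.compile('(?:%s)|(?:%s)' % (insensitive, sensitive))

for name in ('NOTES.TXT', 'xabcx.dat', 'xABCx.dat'):
    print(name, bool(combined.match(name)))
```

Because the case handling is baked into the first pattern itself, no flag is needed, and the 'abc' branch stays case sensitive.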
