Remove any lower case character except specific combinations

Question

I have this string: azjf8ee7Ldoge \n Hmeqze= AZ12D Fs \nsdfz14eZe148r.
I want to match all lower case characters except when it is an e followed by a digit (e\d) or when it is a backslash followed by n (\\n).
Based on the answers I found here:
How to negate specific word in regex?
Match everything except for specified strings
I managed to find a solution: (?!(e\d|\\n))[a-z] which works well, except that it matches the n that comes after a backslash.
Link for a demo
How to exclude matching an n preceded by a backslash?
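
A minimal Python reproduction of the problem. It assumes that the \n in the sample string is a literal backslash followed by n (not a newline) and that the matched letters are to be removed with re.sub; the variable names are illustrative only.

```python
import re

# Sample string from the question; a raw literal keeps \n as backslash + n.
text = r"azjf8ee7Ldoge \n Hmeqze= AZ12D Fs \nsdfz14eZe148r."

# The attempted pattern: the lookahead only inspects the text starting at the
# current position, so it rejects an e before a digit but cannot see a
# backslash to the left of an n.
attempt = re.compile(r"(?!(e\d|\\n))[a-z]")

stripped = attempt.sub("", text)
print(stripped)  # the n of each \n is gone, leaving a lone backslash
```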

`re.findall(r'e\d|\\n|([a-z])', text)`? Or are you replacing? Like `re.sub(r'(e\d|\\n)|[a-z]', r'\1', text)` ([demo](https://regex101.com/r/cU68PW/1))? — Wiktor Stribiżew, Nov 18 '19 at 15:13
@WiktorStribiżew, thank you, that works. Please post it as an answer :-) — singrium, Nov 18 '19 at 15:17
Sorry, I added another lookaround based solution following my logic to my answer. — Wiktor Stribiżew, Nov 18 '19 at 15:27

Wiktor Stribiżew · Accepted Answer · 2019-11-18T15:25:12.687

To keep every e that is followed by a digit and every two-character \n sequence, while removing lowercase ASCII letters everywhere else, you may use

re.sub(r'(e\d|\\n)|[a-z]', r'\1', text)

See the regex demo

Details

(e\d|\\n) - matches and captures into Group 1 (referred to with \1 placeholder from the replacement pattern) an e and a single digit or a \ and an n char
| - or
[a-z] - a lowercase ASCII letter.

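The \1 in the replacement restores the captured text in the result. When the [a-z] branch matched instead, Group 1 is empty, so the letter is simply removed (since Python 3.5, unmatched groups are replaced with an empty string).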
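
As a runnable sketch of the substitution above, applied to the sample string from the question (assuming the \n in it is a literal backslash followed by n; the variable names are illustrative):

```python
import re

# Sample string from the question; a raw literal keeps \n as backslash + n.
text = r"azjf8ee7Ldoge \n Hmeqze= AZ12D Fs \nsdfz14eZe148r."

# Group 1 captures the sequences to keep; the bare [a-z] branch captures nothing.
pattern = re.compile(r"(e\d|\\n)|[a-z]")

# \1 puts back whatever Group 1 captured and is empty when [a-z] matched.
cleaned = pattern.sub(r"\1", text)
print(cleaned)  # 8e7L \n H= AZ12D F \n14Ze148.
```

Because the alternation is tried left to right, e\d and \\n get the first chance at every position, so a protected sequence is consumed whole before [a-z] can touch any part of it.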

If you want to play with lookarounds you may use

[a-z](?<!e(?=\d))(?<!\\n)
re.sub(r'[a-z](?<!e(?=\d))(?<!\\n)', '', text)

See another regex demo

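The [a-z](?<!e(?=\d))(?<!\\n) pattern matches any ASCII lowercase letter ([a-z]) that is not an e followed by a digit ((?<!e(?=\d))) and is not an n preceded by a backslash ((?<!\\n)). Both lookbehinds are checked after the letter has been consumed, so they inspect the letter that was just matched.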
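
A short sketch of the lookaround variant on the same sample string, again assuming a literal backslash followed by n in the input. Since this pattern has no capturing group, re.findall returns the matched letters directly, which makes it easy to inspect what will be removed.

```python
import re

# Sample string from the question; a raw literal keeps \n as backslash + n.
text = r"azjf8ee7Ldoge \n Hmeqze= AZ12D Fs \nsdfz14eZe148r."

lookaround = re.compile(r"[a-z](?<!e(?=\d))(?<!\\n)")

# Letters the pattern selects for removal (no groups, so these are full matches).
removed = lookaround.findall(text)

# Replacing every match with an empty string performs the removal.
cleaned = lookaround.sub("", text)
print(cleaned)  # 8e7L \n H= AZ12D F \n14Ze148.
```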

anubhava · Answer 2 · 2019-11-18T17:15:56.020

If you want to avoid matching the n of \n, you can add a negative lookbehind assertion to your regex:

(?!e\d|\\n)[a-z](?<!\\n)

Updated RegEx Demo

(?<!\\n) is a negative lookbehind assertion. It is checked right after [a-z] has matched and fails the match if the two characters ending at that position are \n, that is, if the letter just matched is an n preceded by a backslash.
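
A minimal check of this pattern on the sample string from the question, assuming removal with re.sub. The shorter pattern in the sketch is not from the answer; it is included only to illustrate that the \\n alternative inside the lookahead cannot fire at a position where [a-z] matches, so the lookbehind is what protects the n.

```python
import re

# Sample string from the question; a raw literal keeps \n as backslash + n.
text = r"azjf8ee7Ldoge \n Hmeqze= AZ12D Fs \nsdfz14eZe148r."

# Pattern from the answer: lookahead guards e + digit, lookbehind guards \n.
amended = re.compile(r"(?!e\d|\\n)[a-z](?<!\\n)")
cleaned = amended.sub("", text)
print(cleaned)  # 8e7L \n H= AZ12D F \n14Ze148.

# Illustrative variant (an assumption, not part of the answer): the same
# pattern without the \\n alternative in the lookahead.
shorter = re.compile(r"(?!e\d)[a-z](?<!\\n)")
same_result = shorter.sub("", text) == cleaned
```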

answered Nov 18 '19 at 15:16 · edited Nov 18 '19 at 17:15

I think the right answer should be: `(?!(e\d|\\n))[a-z](?<!\\n)` because the answer you proposed still matches **e followed by a digit** – singrium Nov 18 '19 at 15:46
Oh yes, `(?!e\d|\\n)[a-z](?<!\\n)` would be the right one – anubhava Nov 18 '19 at 17:15

The fourth bird · Answer 3 · 2019-11-18T16:51:09.680

You could match char a-z and make use of lookarounds:

(?!e\d)[a-z](?<!\\[a-z])

In parts

(?!e\d) Negative lookahead, assert what is on the right is not e followed by a digit
[a-z] Match a char a-z
(?<!\\[a-z]) Negative lookbehind, assert what is on the left is not \ followed by a char a-z

Regex demo
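
A sketch of how this pattern behaves, assuming removal with re.sub. On the sample string it gives the same result as a lookbehind on \\n alone, but because the lookbehind accepts any letter after a backslash, other escape-like pairs survive as well. The second input and the narrower pattern are hypothetical and serve only to show that difference.

```python
import re

# Sample string from the question; a raw literal keeps \n as backslash + n.
text = r"azjf8ee7Ldoge \n Hmeqze= AZ12D Fs \nsdfz14eZe148r."

# Pattern from the answer: keeps e + digit and ANY lowercase letter after a backslash.
broad = re.compile(r"(?!e\d)[a-z](?<!\\[a-z])")
cleaned = broad.sub("", text)
print(cleaned)  # 8e7L \n H= AZ12D F \n14Ze148.

# Hypothetical input with a different backslash + letter pair.
other = r"a\tb"

# Narrower variant for comparison (illustrative): only \n is protected.
narrow = re.compile(r"(?!e\d)[a-z](?<!\\n)")

kept_broad = broad.sub("", other)    # the t after the backslash survives
kept_narrow = narrow.sub("", other)  # only the backslash is left
```

Which behaviour is preferable depends on whether sequences such as \t should be protected too; the question only asks about \n.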

answered Nov 18 '19 at 15:18 · edited Nov 18 '19 at 16:51
