RegEx find string between lookbehind and lookahead

Question

I have this example string taken from an HTML email:

Abholstellenname (Firmenname, Details): Musterfirma GmbH<br>

I'm using the following expression to find the company name, in this case Musterfirma GmbH:

(?<=Abholstellenname \(Firmenname, Details\): ).*

But I need to exclude the <br> tag following the company name. How can I achieve this?

I would not ask here if I hadn't read through the tutorials and still failed to understand it.

score 1 · Accepted Answer · answered Sep 14 '22 at 06:54

You can use

(?<=Abholstellenname \(Firmenname, Details\): ).*?(?=<br>|$)

The main idea is to turn the .* part into a .*?(?=<br>|$) pattern, which matches zero or more chars other than line break chars, as few as possible, followed by either <br> or the end of the string.

See the regex demo.

If the spaces can be any whitespace chars, replace the literal spaces in the pattern with \s.
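
As a minimal sketch of how the pattern above behaves, here it is applied with Python's `re` module. The thread does not say which regex flavor the asker uses, so Python is an assumption, and the helper name `extract_company` is hypothetical. Note that in Python, without `re.MULTILINE`, `$` only matches at the very end of the input (or just before a final newline).

```python
import re

# Pattern from the answer: fixed-width lookbehind for the label,
# lazy .*? for the value, lookahead for a literal <br> or end of string.
PATTERN = re.compile(
    r"(?<=Abholstellenname \(Firmenname, Details\): ).*?(?=<br>|$)"
)


def extract_company(text):
    """Return the text after the label, up to <br> or end of string (hypothetical helper)."""
    match = PATTERN.search(text)
    return match.group(0) if match else None


if __name__ == "__main__":
    sample = "Abholstellenname (Firmenname, Details): Musterfirma GmbH<br>"
    print(extract_company(sample))  # Musterfirma GmbH
```

The lazy quantifier matters here: with a greedy `.*`, the `$` alternative would win and the match would run to the end of the line, `<br>` included.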

Clark Ngo · Answer 2 · 2022-09-14T07:07:16.940

-1

You would need to escape spaces with \s and escape parentheses with \( and \)

[^<br>] matches any char other than <, >, b and r. This could work for your <br>, but if you have anything after that, it will be captured again.

(?<=Abholstellenname\s\(Firmenname,\sDetails\):\s).*[^<br>]

edited Sep 14 '22 at 07:07

answered Sep 14 '22 at 06:50

Clark Ngo


These chars are already escaped, see the [original post source](https://stackoverflow.com/revisions/76cf3d1e-9a1c-4c50-89f1-4c4809d1eb51/view-source). – Wiktor Stribiżew Sep 14 '22 at 06:51
I missed that part, on the `<br>` tag. I edited my post. Thanks! – Clark Ngo Sep 14 '22 at 06:54
The `[^<br>]` does not mean "anything but `<br>`". It matches any char other than `<`, `>`, `b` and `r`. [Here](https://regex101.com/r/8o41Y9/1) is the test proving this solution does not work. – Wiktor Stribiżew Sep 14 '22 at 06:54
I removed the `+` sign at the end. https://regexr.com/6tuue why does it work here and not in your link? curious. – Clark Ngo Sep 14 '22 at 07:00
It does not work and I [explained why](https://stackoverflow.com/questions/73712533/regex-find-string-between-lookbehind-and-lookahead/73712641#comment130165611_73712600). `Abholstellenname (Firmenname, Details): Musterfirma Tr` would truncate the last `r`. In `Abholstellenname (Firmenname, Details): Musterfirma Tbr`, last `br` won't get consumed. You can't use negated character classes like this for this purpose. – Wiktor Stribiżew Sep 14 '22 at 07:02
2

No, ***"`[^<br>]` excludes `<br>` tag"*** is a wrong statement. Again, ***`[^<br>]` matches any char other than `<`, `>`, `b` and `r`.*** – Wiktor Stribiżew Sep 14 '22 at 07:04
1

The current version does not work for me, it still matches the `<br>`. – DerThronprinz Sep 14 '22 at 07:06
Clark, in regex, there is no single construct that says "any text but a sequence of chars" other than `~()` in Lucene-like regex flavors. Even a [tempered greedy token](https://stackoverflow.com/a/37343088/3832970) means something different, but it can be used to work around the limitation together with other features. In PCRE, the lack of the mentioned construct can be worked around by using the [`(*SKIP)(*F)` "trick"](https://stackoverflow.com/q/24534782/3832970). – Wiktor Stribiżew Sep 14 '22 at 07:11
Thanks! I learned something and I have bookmarked them. You are too awesome @WiktorStribiżew – Clark Ngo Sep 14 '22 at 07:22
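
The point made in the comments above can be checked with a small comparison. This is only a sketch: it uses Python's `re` module (an assumption, since the thread names no flavor) and a hypothetical `compare` helper. It runs the negated-character-class pattern from the second answer and the lookahead pattern from the accepted answer over the inputs mentioned in the comments.

```python
import re

LABEL = "Abholstellenname (Firmenname, Details): "
PREFIX = r"(?<=Abholstellenname\s\(Firmenname,\sDetails\):\s)"

# Second answer: [^<br>] is a character class, i.e. one char that is
# not '<', 'b', 'r' or '>'. It does not mean "not the sequence <br>".
char_class = re.compile(PREFIX + r".*[^<br>]")

# Accepted answer: lazy match up to a literal <br> or end of string.
lookahead = re.compile(PREFIX + r".*?(?=<br>|$)")


def compare(value):
    """Return (char_class_match, lookahead_match) for LABEL + value (hypothetical helper)."""
    text = LABEL + value
    first = char_class.search(text)
    second = lookahead.search(text)
    return (first.group(0) if first else None,
            second.group(0) if second else None)


if __name__ == "__main__":
    for value in ("Musterfirma GmbH<br>",
                  "Musterfirma Tr",
                  "Musterfirma GmbH<br>Other text"):
        print(repr(value), "->", compare(value))
```

With the character class, a name ending in `r` loses its last letter, and any text after the tag pulls the `<br>` into the match; the lookahead version returns the full name in each case.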
