I have this example string from an HTML email:

Abholstellenname (Firmenname, Details): Musterfirma GmbH<br>

I'm using the following expression to find the company name, in this case Musterfirma GmbH:

(?<=Abholstellenname \(Firmenname, Details\): ).*

But I need to exclude the <br> tag following the company name. How can I achieve this?

I would not ask here if I hadn't already read through the tutorials and still failed to understand it.

Wiktor Stribiżew

2 Answers

You can use

(?<=Abholstellenname \(Firmenname, Details\): ).*?(?=<br>|$)

The main idea is to replace the .* part with .*?(?=<br>|$): the lazy .*? matches zero or more characters other than line break characters, as few as possible, and the lookahead (?=<br>|$) requires the match to be followed by either <br> or the end of the string.

If the spaces can be any whitespace chars, replace the literal spaces in the pattern with \s.
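
As a minimal sketch of how this pattern behaves, here it is in Python's `re` module. The flavor is an assumption, since the question does not name one, and the helper name `extract_company` is made up for illustration. Python only accepts fixed-width lookbehinds, which this one is.

```python
import re

# Lookbehind anchors the match right after the label; the lazy .*? stops
# as soon as the lookahead sees "<br>" or the end of the string.
PATTERN = re.compile(
    r"(?<=Abholstellenname \(Firmenname, Details\): ).*?(?=<br>|$)"
)


def extract_company(text):
    """Return the company name after the label, or None if the label is absent."""
    match = PATTERN.search(text)
    return match.group(0) if match else None


print(extract_company(
    "Abholstellenname (Firmenname, Details): Musterfirma GmbH<br>"
))  # Musterfirma GmbH
```

Because `<br>` is only checked by the lookahead, it is never part of the match, and a string with no tag at all still matches up to its end.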

Wiktor Stribiżew

You would need to match the spaces with \s and escape the parentheses with \( and \).

[^<br>] matches any char other than <, >, b and r. This could work for your <br> but if you have anything after that, it will be captured again.

(?<=Abholstellenname\s\(Firmenname,\sDetails\):\s).*[^<br>]
Clark Ngo
  • These chars are already escaped, see the [original post source](https://stackoverflow.com/revisions/76cf3d1e-9a1c-4c50-89f1-4c4809d1eb51/view-source). – Wiktor Stribiżew Sep 14 '22 at 06:51
  • I missed that part on the `<br>` tag. I edited my post. Thanks!
    – Clark Ngo Sep 14 '22 at 06:54
  • The `[^<br>]` does not mean "anything but `<br>`". It matches any char other than `<`, `>`, `b` and `r`. [Here](https://regex101.com/r/8o41Y9/1) is the test proving this solution does not work.
    – Wiktor Stribiżew Sep 14 '22 at 06:54
  • I removed the `+` sign at the end. https://regexr.com/6tuue why does it work here and not in your link? curious. – Clark Ngo Sep 14 '22 at 07:00
  • It does not work and I [explained why](https://stackoverflow.com/questions/73712533/regex-find-string-between-lookbehind-and-lookahead/73712641#comment130165611_73712600). `Abholstellenname (Firmenname, Details): Musterfirma Tr` would truncate the last `r`. In `Abholstellenname (Firmenname, Details): Musterfirma Tbr`, last `br` won't get consumed. You can't use negated character classes like this for this purpose. – Wiktor Stribiżew Sep 14 '22 at 07:02
  • No, ***"`[^<br>]` excludes `<br>` tag"*** is a wrong statement. Again, ***`[^<br>]` matches any char other than `<`, `>`, `b` and `r`.***
    – Wiktor Stribiżew Sep 14 '22 at 07:04
  • The current version does not work for me, it still matches the `<br>`.
    – DerThronprinz Sep 14 '22 at 07:06
  • Clark, in regex, there is no single construct that says "any text but a sequence of chars" other than `~()` in Lucene-like regex flavors. Even a [tempered greedy token](https://stackoverflow.com/a/37343088/3832970) means something different, but it can be used to work around the limitation together with other features. In PCRE, the lack of the mentioned construct can be worked around by using the [`(*SKIP)(*F)` "trick"](https://stackoverflow.com/q/24534782/3832970). – Wiktor Stribiżew Sep 14 '22 at 07:11
  • Thanks! I learned something and I have bookmarked them. You are too awesome @WiktorStribiżew – Clark Ngo Sep 14 '22 at 07:22
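
The behaviour discussed in these comments can be reproduced with a short sketch, again assuming Python's `re` module. The names `NEGATED`, `TEMPERED` and `first_match` are hypothetical, and the test strings are the ones quoted in the comments. It contrasts the negated character class with a tempered greedy token, which consumes one character at a time only while `<br>` does not start at the current position.

```python
import re

PREFIX = r"(?<=Abholstellenname\s\(Firmenname,\sDetails\):\s)"

# Negated character class: the final [^<br>] only demands that the LAST
# matched character is not one of <, >, b, r.
NEGATED = re.compile(PREFIX + r".*[^<br>]")

# Tempered greedy token: match any char, as long as "<br>" does not start here.
TEMPERED = re.compile(PREFIX + r"(?:(?!<br>).)*")

LABEL = "Abholstellenname (Firmenname, Details): "


def first_match(pattern, text):
    """Return the first match of pattern in text, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


for tail in ("Musterfirma Tr", "Musterfirma Tbr", "Musterfirma GmbH<br>Rest"):
    text = LABEL + tail
    print(repr(first_match(NEGATED, text)), repr(first_match(TEMPERED, text)))
# 'Musterfirma T' 'Musterfirma Tr'
# 'Musterfirma T' 'Musterfirma Tbr'
# 'Musterfirma GmbH<br>Rest' 'Musterfirma GmbH'
```

The negated class drops trailing `r` and `b` characters that belong to the name and lets `<br>` through whenever more text follows it, while the tempered token stops exactly at the tag.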