Different behavior of regular expression in Notepad++ and regex101.com

Question

I have the following HTML code:

<div class=\"article-element article-element--paragraph\">
    <p class=\"paragraph\">
        <span>Problem dotycz&#x105;cy wycieku danych, kt&#xF3;ry rozpocz&#x105;&#x142; si&#x119; ju&#x17C; w listopadzie 2013 r., trwa&#x142; do po&#x142;owy kwietnia bie&#x17C;&#x105;cego roku i wynika&#x142; z b&#x142;&#x119;du ludzkiego - przekazano. Incydent dotyczy tak&#x17C;e klient&#xF3;w luksusowej marki Lexus, nale&#x17C;&#x105;cej do Toyoty.\n\nSprawa wysz&#x142;a na jaw w momencie, gdy najwi&#x119;kszy na &#x15B;wiecie producent samochod&#xF3;w pod wzgl&#x119;dem liczby sprzeda&#x17C;y naciska na zarz&#x105;dzanie danymi w chmurze, co jest postrzegane jako kluczowe dla wprowadzania w pojazdach nowych funkcji wspieranych przez sztuczn&#x105; inteligencj&#x119; - zwr&#xF3;ci&#x142;a uwag&#x119; agencja Reutera.\n\nPo wykryciu problemu podj&#x119;to kroki w celu zablokowania zewn&#x119;trznego dost&#x119;pu do danych klient&#xF3;w i przeprowadzono dochodzenie we wszystkich chmurach zarz&#x105;dzanych przez japo&#x144;ski koncern - poinformowa&#x142;a Toyota.</span>
    </p>
</div>

My intention is to remove all the characters between and including angle brackets and leave only the text between the span tags.

I'm using this regular expression:

\<.*?\>

When I test it in the regex101.com debugger, it matches exactly what I want.

However, when I use the same expression in Notepad++ and replace with nothing, it behaves differently. It removes the text and leaves behind the angle brackets and various characters in between.

What's causing the difference in the regex behavior between Notepad++ and regex101?

Do not escape `<` and `>`, `\<` and `\>` are word boundaries. — Wiktor Stribiżew, May 12 '23 at 20:00
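
To illustrate the comment above, here is a minimal Python sketch that contrasts the two readings of the pattern. Python's `re` module treats `\<` as a literal `<`, so the word-boundary reading used by Notepad++ is only approximated here with `\b(?=\w)` (start of word) and `\b(?<=\w)` (end of word); that emulation, the shortened sample string, and the variable names are assumptions made for illustration, not Notepad++ itself.

```python
import re

# Shortened stand-in for the HTML in the question (illustrative sample only).
html = '<p class="paragraph"><span>Problem z danymi</span></p>'

# Unescaped angle brackets: lazily match each tag from "<" to the next ">".
tag_pattern = re.compile(r"<.*?>")
text_only = tag_pattern.sub("", html)

# Assumed emulation of the word-boundary reading of \<.*?\> :
# start of a word, as few characters as possible, end of a word.
# With a lazy quantifier this matches every single word.
word_pattern = re.compile(r"\b(?=\w).*?\b(?<=\w)")
tags_left = word_pattern.sub("", html)

print(text_only)
print(tags_left)
```

With the unescaped pattern only the text inside the span survives, while the emulated word-boundary pattern deletes every word and keeps the brackets and punctuation, which resembles the result described in the question. In Notepad++ itself, the fix is simply to search for `<.*?>` without the backslashes.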
