1

I was doing some RegEx exercises in a website that gives you some text with highlighted sections for you to match with a regular expression.

The following is just a snippet of the given text:

Durazzo</a>, <a href="/wiki/Ladislao_I_di_Napoli" title="Ladislao I di Napoli">Ladislao I di 
Napoli</a> e <a href="/wiki/Giovanna_II_di_Napoli" title="Giovanna II di Napoli">Giovanna II di
 Napoli</a>. L'ultima grande impresa degli angioini napoletani fu la spedizione militare di <a 
href="/wiki/Ladislao_I_di_Napoli" title="Ladislao I di Napoli">Ladislao I di Napoli</a>, il primo 
tentativo di riunificazione politica d'<a href="/wiki/Italia" title="Italia">Italia</a>, agli inizi 
del <a href="/wiki/XV_secolo" title="XV secolo">XV secolo</a>.</p>

        <h1><a href="http://www.repubblica.it/cronaca/2013/12/20/news
/serial_killer_in_fuga_cancellieri_in_parlamento-74089064/" target="_self" title="">Catturati i due 
killer evasi</a></h1>
<h3><span class="editsection">[<a href="/w/index.php?title=Roma&amp;action=edit&amp;section=62" 
title="Modifica la sezione Mobilità urbana">modifica</a>]</span> <span class="mw-headline" 
id="Mobilit.C3.A0_urbana">Mobilità urbana</span></h3>

#l_footer a:hover, #l_footer_extended div.libero a:hover {
<h1><a href="http://temi.repubblica.it/guide-universita-2013-2014/">Università 2013-2014</a></h1>
</a>
  al duca mio, e li occhi a lui drizzai.<br/>
</p>

And the goal was to match all the text between the header tags, including the tags themselves (i.e. `<h?>...</h?>`).

I know how to achieve the goal. However, when I accidentally tried the regex `<h[^]+`, it seemed to select exactly the text that I needed, and I do not understand how or why.

Any insights?

PS. For reference, this is the website and this particular example is 8/12.

Raccoon
    Depending on the regex engine, the `[^]` part is either invalid or means "any character, including newline". You can test it on https://regex101.com/ – Seblor Nov 07 '19 at 14:25

2 Answers

2

That's a negated character class (the "exclude these characters" group). Since no characters are listed inside it, I'm assuming the language interprets it as "exclude none", i.e. match anything.

For instance, `[^1]+` matches a run of characters that contains no `1`.
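A small sketch of both cases in JavaScript, assuming a Node.js or browser engine (the sample strings and variable names are made up for illustration):

```javascript
// A negated class with content: [^1]+ matches runs of characters that are not "1".
const runs = "abc1def11gh".match(/[^1]+/g);
console.log(runs); // [ 'abc', 'def', 'gh' ]

// An empty negated class excludes nothing, so in JavaScript it matches
// any character at all, line breaks included.
const whole = "line one\nline two".match(/[^]+/)[0];
console.log(whole === "line one\nline two"); // true
```

Other engines may reject `[^]` outright, so this only describes the JavaScript behaviour.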

That's bizarre functionality you've highlighted there, but it's pretty cool.

KyleFairns
0

In the JavaScript implementation of regular expressions, `[^]` matches ANY character, which makes it equivalent to the dot `.`.

If the regex options are defined to stop matching at the end of the line, that would make it look like the match is correct.
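Here is a rough sketch of how that could play out in JavaScript. The sample text is made up, and the per-line loop is only an assumption about how the exercise site might apply the pattern; I have not checked how it actually works.

```javascript
// Hypothetical sample: only the first line is a header.
const text = '<h1><a href="#">Università 2013-2014</a></h1>\n</a>\n<p>after</p>';

// By default "." does not match line terminators, so this stops at the end of the line.
const withDot = text.match(/<h.+/)[0];

// [^] and [\s\S] match any character, so the greedy "+" runs to the end of the input.
const withEmptyNegated = text.match(/<h[^]+/)[0];
const withSpaceClass = text.match(/<h[\s\S]+/)[0];

// Assumption: the pattern is tested against one line at a time.
// In that case [^]+ can never run past the end of a line.
const perLine = text
  .split("\n")
  .map((line) => line.match(/<h[^]+/))
  .filter((m) => m !== null)
  .map((m) => m[0]);

console.log(withDot);                   // <h1><a href="#">Università 2013-2014</a></h1>
console.log(withEmptyNegated === text); // true
console.log(withSpaceClass === text);   // true
console.log(perLine.length);            // 1
```

On a whole multi-line string, `<h[^]+` swallows everything after the first `<h`, so the match only looks right when something limits it to a single line.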

Mr47
  • 8
    _"which makes it equivalent to the dot"_ Not exactly. It's equivalent to the dot only when using the `/s` (single line) flag. It is equivalent to `[\S\s]` regardless though. – 41686d6564 stands w. Palestine Nov 07 '19 at 14:26
  • @AhmedAbdelhameed Well, the more you know :) Thx for the added information. – Mr47 Nov 07 '19 at 14:28
  • 1
    _If the regex options are defined to stop matching at the end of the line, that would make it look like the match is correct._ That answers both the how and why :) – Raccoon Nov 07 '19 at 14:35
  • *"If the regex options are defined to stop matching at the end of the line"* This is the default scenario. For `.` to match newline characters, you'll have to activate single line mode by providing the fairly new and not universally supported `s` flag. See the [documentation](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#Advanced_searching_with_flags_2) and [browser compatibility](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/dotAll#Browser_compatibility) – 3limin4t0r Nov 07 '19 at 15:03
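
To illustrate the flag discussed in the comments, here is a minimal sketch, assuming an engine that supports the `s` (dotAll) flag; the sample string and variable names are invented for the example.

```javascript
// Hypothetical header that is split over two lines.
const sample = "<h1>Title\nsplit over lines</h1>";

const defaultDot = /<h.+/;    // "." stops at line terminators
const dotAllDot = /<h.+/s;    // with "s", "." matches line terminators too
const portable = /<h[\s\S]+/; // same effect without relying on the "s" flag

console.log(defaultDot.dotAll);                     // false
console.log(dotAllDot.dotAll);                      // true
console.log(sample.match(defaultDot)[0]);           // <h1>Title
console.log(sample.match(dotAllDot)[0] === sample); // true
console.log(sample.match(portable)[0] === sample);  // true
```

On engines without `s` support the second literal is a syntax error, which is one reason `[\s\S]` (or `[^]` in JavaScript) is still commonly used.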