What does this --> [^]+ <-- RegEx match?

Question

I was doing some RegEx exercises in a website that gives you some text with highlighted sections for you to match with a regular expression.

The following is just a snippet of the given text:

Durazzo</a>, <a href="/wiki/Ladislao_I_di_Napoli" title="Ladislao I di Napoli">Ladislao I di 
Napoli</a> e <a href="/wiki/Giovanna_II_di_Napoli" title="Giovanna II di Napoli">Giovanna II di
 Napoli</a>. L'ultima grande impresa degli angioini napoletani fu la spedizione militare di <a 
href="/wiki/Ladislao_I_di_Napoli" title="Ladislao I di Napoli">Ladislao I di Napoli</a>, il primo 
tentativo di riunificazione politica d'<a href="/wiki/Italia" title="Italia">Italia</a>, agli inizi 
del <a href="/wiki/XV_secolo" title="XV secolo">XV secolo</a>.</p>

        <h1><a href="http://www.repubblica.it/cronaca/2013/12/20/news
/serial_killer_in_fuga_cancellieri_in_parlamento-74089064/" target="_self" title="">Catturati i due 
killer evasi</a></h1>
<h3><span class="editsection">[<a href="/w/index.php?title=Roma&amp;action=edit&amp;section=62" 
title="Modifica la sezione Mobilità urbana">modifica</a>]</span> <span class="mw-headline" 
id="Mobilit.C3.A0_urbana">Mobilità urbana</span></h3>

#l_footer a:hover, #l_footer_extended div.libero a:hover {
<h1><a href="http://temi.repubblica.it/guide-universita-2013-2014/">UniversitÃ  2013-2014</a></h1>
</a>
  al duca mio, e li occhi a lui drizzai.<br/>
</p>

The goal was to match all the text between the header tags, including the tags themselves (i.e. <h?>...</h?>).

I know how to achieve the goal. However, when I accidentally tried the regex <h[^]+ it seemed to select exactly the text that I needed, and I do not understand how or why.

Any insights?

PS. For reference, this is the website and this particular example is 8/12.

Depending on the regex engine, the `[^]` part is either invalid, or means "any character, including newline". You can test it on https://regex101.com/ – Seblor, Nov 07 '19 at 14:25

KyleFairns · Answer 1 · 2019-11-07T14:34:07.397

2

That is a negated character class: it matches any character except the ones listed inside the brackets. Since nothing is listed here, an engine that accepts the pattern interprets it as "exclude none", i.e. match anything.

[^1]+, for instance, would match a run of characters up to, but not including, the first 1.

Bizarre functionality that you've highlighted there, but it's pretty cool.
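A quick way to see the difference in JavaScript (the engine named elsewhere in this thread) is to run both patterns against a short string. This is only a sketch; the sample strings are made up for illustration:

```javascript
// Negated class with one excluded character: the match stops before the first "1".
const noOnes = "abc1def".match(/[^1]+/)[0];

// Empty negated class: nothing is excluded, so every character matches,
// including the newline.
const everything = "abc1\ndef".match(/[^]+/)[0];

console.log(JSON.stringify(noOnes));     // "abc"
console.log(JSON.stringify(everything)); // "abc1\ndef"
```

Other engines may reject `[^]` outright, so this behaviour should not be assumed outside JavaScript.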

edited Nov 07 '19 at 14:34

answered Nov 07 '19 at 14:27

KyleFairns


score 0 · Accepted Answer · answered Nov 07 '19 at 14:24

0

In the JavaScript implementation of regular expressions, [^] matches ANY character, which makes it equivalent to the dot (.).

If the regex options are defined to stop matching at the end of the line, that would make it look like the match is correct.
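The sketch below compares `.` and `[^]` in JavaScript on a made-up three-line sample modelled on the question's snippet. The per-line split at the end is only an assumption about how an exercise site might apply the pattern; the question does not say how the site evaluates it.

```javascript
// Made-up sample: one header line followed by two non-header lines.
const lines = [
  '<h1><a href="/x">Titolo</a></h1>',
  "</a>",
  "  al duca mio, e li occhi a lui drizzai.<br/>",
];
const text = lines.join("\n");

// Without the `s` flag, `.` does not match a newline,
// so the match ends at the end of the first line.
const withDot = text.match(/<h.+/)[0];

// `[^]` also matches newlines, so on the whole string
// the match runs to the end of the input.
const withEmptyClass = text.match(/<h[^]+/)[0];

// Assumption: the pattern is applied to each line separately.
// Then `<h[^]+` can only reach the end of the current line.
const perLine = lines.map((line) => {
  const m = line.match(/<h[^]+/);
  return m ? m[0] : null;
});

console.log(withDot === lines[0]);    // true
console.log(withEmptyClass === text); // true
console.log(perLine);                 // first entry is the header line, the rest are null
```

If the text is matched as one string instead, `<h[^]+` would swallow everything after the first header, which is why the per-line assumption matters here.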

answered Nov 07 '19 at 14:24

Mr47



_"which makes it equivalent to the dot"_ Not exactly. It's equivalent to the dot only when using the `/s` (single line) flag. It is equivalent to `[\S\s]` regardless though. – 41686d6564 stands w. Palestine Nov 07 '19 at 14:26
@AhmedAbdelhameed Well, the more you know :) Thx for the added information. – Mr47 Nov 07 '19 at 14:28

_If the regex options are defined to stop matching at the end of the line, that would make it look like the match is correct._ That answers both the how and why :) – Raccoon Nov 07 '19 at 14:35
*"If the regex options are defined to stop matching at the end of the line"* This is the default scenario. For `.` to match newline characters you have to activate single-line mode by providing the fairly new and not universally supported `s` flag. See the [documentation](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Guide/Regular_Expressions#Advanced_searching_with_flags_2) and [browser compatibility](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/dotAll#Browser_compatibility) – 3limin4t0r Nov 07 '19 at 15:03
