Imagine I have the following text: <tag>This is a text</tag>

If I only want to get "This is a text", then the regular expression will be:

(?<=\<tag\>).*(?=\<\/tag\>)

Imagine now that the text is something like:

<tag> <Arial>This is a text<Arial></tag>

What regular expression would now get "This is a text" and ignore everything else? The word Arial is not fixed and may vary.

Thanks in advance.

1 Answer

It depends on the regex flavour. If your flavour supports backreferences, you can use:

\<(\w+)\>([^<>]*)\<\/\1\>

and \1 will refer to the content of (\w+).

In this case, the text "This is a text" will be matched by group 2 (see demo).
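As a rough illustration, here is how the backreference pattern could be tried in Python's `re` module. This is only a sketch: the sample string assumes the inner closing tag is written as `</Arial>`, since `\1` must find the same name in a closing tag, and the variable names are made up for the example.

```python
import re

# Assumed sample: the inner closing tag is </Arial>, so \1 can match it.
sample = "<tag> <Arial>This is a text</Arial></tag>"

# Group 1 captures the tag name, group 2 the text without any angle brackets,
# and \1 requires the closing tag to repeat the captured name.
pattern = re.compile(r"\<(\w+)\>([^<>]*)\<\/\1\>")

match = pattern.search(sample)
inner_text = match.group(2) if match else None
print(inner_text)  # This is a text
```

The outer `<tag>` is skipped because its content contains another `<`, which `[^<>]*` refuses to cross, so the first successful match is the innermost element.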

Note that in most flavours you cannot use + inside a lookbehind.

Otherwise, you have to be broader and use something like:

(?<=\>)[^<>]*(?=\<)
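To show what "broader" means in practice, here is a small Python sketch of the lookaround pattern. The sample string (with an assumed `</Arial>` closing tag) and the filtering step are illustrative additions, not part of the pattern itself.

```python
import re

# Assumed sample, with </Arial> as the inner closing tag.
sample = "<tag> <Arial>This is a text</Arial></tag>"

# Match any run of non-bracket characters sitting between a '>' and a '<'.
pattern = re.compile(r"(?<=\>)[^<>]*(?=\<)")

found = pattern.findall(sample)
print(found)  # [' ', 'This is a text', '']

# The pattern also matches whitespace-only and empty gaps between tags,
# so a filtering step is needed to keep only the meaningful text.
texts = [s for s in found if s.strip()]
print(texts)  # ['This is a text']
```

Because this pattern does not check tag names at all, it returns every gap between a closing and an opening bracket, which is why the blank matches have to be discarded afterwards.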

In any case, probably the best solution is to use a DOM parser.
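For comparison, a minimal sketch of the parser route using Python's standard `xml.dom.minidom`. It assumes the input is well-formed XML (including a proper `</Arial>` closing tag); real-world HTML would likely need an HTML-aware parser instead, and `collect_text` is a hypothetical helper written for this example.

```python
from xml.dom import minidom

# Assumed sample: well-formed XML with matching closing tags.
sample = "<tag> <Arial>This is a text</Arial></tag>"
document = minidom.parseString(sample)


def collect_text(node):
    """Concatenate the data of all text nodes under `node`, depth-first."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(collect_text(child) for child in node.childNodes)


# Works whatever the inner tag is called, since no tag name is hard-coded.
text = collect_text(document.documentElement).strip()
print(text)  # This is a text
```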

logi-kal