
I need a regular expression that selects the elements that do not contain an href attribute (@href):

<test  abc="xyz_CHCFGRc/abc_CHmnop" href="sdddzus.xml">text</test>
<test  abc="abc_abc>text23</test>
<test  abc="123_ABCc/abc_CHmnoph">text42</test>

The regular expression I wrote:

<test\s+abc.[^href]*>.*

The problem with this expression is that h, r, e and f are treated as separate letters, so if the abc value contains any of these letters, the element doesn't get selected.

The result should be:

<test  abc="abc_abc>text23</test>
<test  abc="123_ABCc/abc_CHmnoph">text42</test>

but in my case the result is

<test  abc="abc_abc>text23</test> 

Thanks in advance

Michał Turczyn
Shil
  • You should probably be using an XML parser, rather than regex. – VLAZ Oct 09 '19 at 10:44
  • Not probably - definitely. And it would be trivial in XPath/XSLT to select `test[not(@href)]`. Not sure why this question is tagged as `xslt` if regex is the expected solution. – michael.hor257k Oct 09 '19 at 11:02
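As a rough illustration of the parser route suggested in the comments, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It assumes the input is well-formed, so the missing closing quote in the second sample element is repaired here, and it wraps the elements in a hypothetical `<root>` element that is not part of the question. The filter is a plain attribute check that does the same job as `test[not(@href)]`.

```python
import xml.etree.ElementTree as ET

# Assumption: well-formed input. The <root> wrapper and the repaired
# closing quote on the second element are not in the original sample.
xml_text = """<root>
<test abc="xyz_CHCFGRc/abc_CHmnop" href="sdddzus.xml">text</test>
<test abc="abc_abc">text23</test>
<test abc="123_ABCc/abc_CHmnoph">text42</test>
</root>"""

root = ET.fromstring(xml_text)

# Keep only <test> elements that have no href attribute.
without_href = [el for el in root.iter("test") if "href" not in el.attrib]
texts = [el.text for el in without_href]

for el in without_href:
    print(ET.tostring(el, encoding="unicode").strip())
```

Because the check looks at the attribute itself, an "href" substring inside another attribute value or inside the element text would not cause a false rejection.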

2 Answers


First of all, you should use an XML parser for this; see "Why is it such a bad idea to parse XML with regex?". But if you have to use regular expressions, here's a solution:

Try <test(?!.+href).+

Explanation:

<test - match <test literally

(?!.+href) - negative lookahead: assert that what follows is not .+ (one or more of any character) followed by href, i.e. assert that the rest of the line does not contain the word href

.+ - match one or more of any character

Demo

Your idea about the negated character class is wrong: [^href] matches any single character other than h, r, e or f. It does not exclude href as a word.
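To see the difference between the two patterns, here is a small Python sketch that runs both against the three sample lines from the question. The variable names are only illustrative, and Python's re module is an assumption, since the question does not say which regex engine is in use.

```python
import re

# The three sample lines from the question, copied as-is
# (including the missing closing quote in the second one).
lines = [
    '<test  abc="xyz_CHCFGRc/abc_CHmnop" href="sdddzus.xml">text</test>',
    '<test  abc="abc_abc>text23</test>',
    '<test  abc="123_ABCc/abc_CHmnoph">text42</test>',
]

# Original attempt: [^href] rejects each of the letters h, r, e, f.
char_class = re.compile(r'<test\s+abc.[^href]*>.*')

# Lookahead version: rejects only when "href" appears later on the line.
lookahead = re.compile(r'<test(?!.+href).+')

char_class_hits = [s for s in lines if char_class.search(s)]
lookahead_hits = [s for s in lines if lookahead.search(s)]

print(char_class_hits)  # only the second line
print(lookahead_hits)   # the second and third lines
```

The third line is lost by the character-class pattern because its abc value ends in "h". Note that the lookahead checks the rest of the line, so it would also reject an element whose text or another attribute value happens to contain "href".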

Michał Turczyn

Here is the regex to match a full line not containing the word "href":

^((?!href).)*$
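The pattern checks, before consuming each character, that "href" does not start at that position. A minimal Python sketch of how it might be applied to multi-line input (assuming Python's re module and the sample lines from the question; re.MULTILINE makes ^ and $ anchor at line boundaries):

```python
import re

# Sample lines from the question, joined into one multi-line string.
lines = [
    '<test  abc="xyz_CHCFGRc/abc_CHmnop" href="sdddzus.xml">text</test>',
    '<test  abc="abc_abc>text23</test>',
    '<test  abc="123_ABCc/abc_CHmnoph">text42</test>',
]
text = "\n".join(lines)

# Each "." is consumed only if "href" does not begin at that position,
# so a line matches in full only when it contains no "href" at all.
no_href_line = re.compile(r'^((?!href).)*$', re.MULTILINE)

kept = [m.group(0) for m in no_href_line.finditer(text)]

for line in kept:
    print(line)
```

Unlike the `<test`-anchored pattern, this one matches any line without "href", including an empty line, regardless of which element it holds.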
MiK