I want to match strings with the following template:

<el key1="val1" key2="val2" />

I can match them with a regex such as:

^(<el\s+)(?=.*key1=".*".*)(?=.*key2=".*".*)(.*\/>$)

The problem is that

<el key1="val1" key2="val2" aaa />
<el key1="val1" aa key2="val2" />
<el aaa key1="val1" key2="val2" />

also match. I want ^<el\s+ exactly at the beginning, (\/>)$ at the end, and only the two attributes of the form \s+keyN=".*"\s+ in between, with nothing else around them.

EDIT (based on comments and replies): The keys can be title, uri, or text. The issue with the answers so far is that the keys can appear in any order, so:

<el key1="val1" key2="val2" />
<el key2="val2" key1="val1" />

are both valid.

cahenepudi

2 Answers

Here is a regex that could work. Note that \S+ accepts any attribute name, not only key1 and key2:

^(<el)\s+(\S+=".*")\s+(\S+=".*")\s+(\/>$)
Kannappan Sirchabesan

You should really use your language's XML parser: the strings you don't want to match are all not well-formed XML, so a parser rules them out easily. You then only need to check that the tag name is el and that it has exactly two attributes, key1 and key2.
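
As a rough illustration of the parser route, here is a minimal sketch in Python using the standard xml.etree.ElementTree module. Python is an assumption, since the question does not name a language, and is_wanted is a hypothetical helper name. A bare token such as aaa makes the tag not well-formed, so the parser rejects it, and attribute order does not matter.

```python
import xml.etree.ElementTree as ET


def is_wanted(line, tag="el", keys=("key1", "key2")):
    """Return True if `line` parses as one `tag` element with exactly `keys` as attributes.

    `is_wanted` and its parameters are hypothetical names used for illustration.
    """
    try:
        elem = ET.fromstring(line)
    except ET.ParseError:
        # Stray tokens such as `aaa` are not well-formed XML, so parsing fails.
        return False
    # elem.attrib is a dict, so comparing key sets ignores attribute order.
    return elem.tag == tag and set(elem.attrib) == set(keys)


for sample in [
    '<el key1="val1" key2="val2" />',
    '<el key2="val2" key1="val1" />',
    '<el key1="val1" aa key2="val2" />',
]:
    print(is_wanted(sample), sample)
```

Unlike a regex, a parser also accepts equivalent spellings such as single-quoted values, which may or may not be what you want.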

If you insist on regex, here is one that should work:

^<el\s+key1="[^"]*"\s+key2="[^"]*"\s+\/>$

The main difference between this and your attempt is that your attempt uses lookaheads it does not need. Normal matching is enough here. You look ahead for key1=".*" and then match .* greedily anyway, which is redundant and lets any stray text through. Therefore I removed the lookaheads and the .*s. I also wrote each value as [^"]* rather than .*, so that a value cannot run past its closing quote and swallow an extra attribute. The result matches only the form you want.
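
A quick way to check a pattern like this is to run it against the examples from the question. The sketch below assumes Python's re module (the question names no language). Note that each value is written as [^"]* here, so that a value cannot extend past its own closing quote.

```python
import re

# Fixed-order pattern: key1 first, then key2, nothing else allowed.
# Values are written as [^"]* so a value cannot run past its closing quote.
FIXED_ORDER = re.compile(r'^<el\s+key1="[^"]*"\s+key2="[^"]*"\s+/>$')

samples = [
    '<el key1="val1" key2="val2" />',      # the wanted form
    '<el key1="val1" key2="val2" aaa />',  # stray token at the end
    '<el key1="val1" aa key2="val2" />',   # stray token in the middle
    '<el aaa key1="val1" key2="val2" />',  # stray token at the start
    '<el key2="val2" key1="val1" />',      # keys in reversed order
]

results = [bool(FIXED_ORDER.match(s)) for s in samples]
for s, ok in zip(samples, results):
    print(ok, s)
```

Only the first sample matches. The last one is rejected because this pattern fixes the order of the keys.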

If you also want to match strings where key2 appears before key1, the regex becomes much longer:

^<el\s+(?:key1="[^"]*"\s+key2="[^"]*"|key2="[^"]*"\s+key1="[^"]*")\s+\/>$

This is why I said you should use an XML parser.
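
To see how quickly the any-order regex grows, here is a hedged sketch that builds the alternation from every ordering of the keys. It assumes Python, any_order_pattern is a hypothetical helper, and values are written as [^"]* so they stop at the closing quote. Two keys give 2 alternatives, and the three keys from the question's edit (title, uri, text) already give 6.

```python
import re
from itertools import permutations


def any_order_pattern(keys):
    """Build a regex accepting exactly `keys`, each once, in any order.

    `any_order_pattern` is a hypothetical helper used for illustration.
    """
    alternatives = []
    for order in permutations(keys):
        # One alternative per ordering: key="value" pairs separated by whitespace.
        attrs = r'\s+'.join(re.escape(k) + r'="[^"]*"' for k in order)
        alternatives.append(attrs)
    return re.compile(r'^<el\s+(?:' + '|'.join(alternatives) + r')\s+/>$')


two = any_order_pattern(['key1', 'key2'])
three = any_order_pattern(['title', 'uri', 'text'])

print(bool(two.match('<el key2="val2" key1="val1" />')))
print(bool(two.match('<el key1="val1" aa key2="val2" />')))
print(len(three.pattern))
```

The number of alternatives is the factorial of the number of keys, so this approach stops being practical after a handful of attributes.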

Sweeper
  • Is it not feasible with lookarounds in the way I was doing? Your second answer looks like what I was looking for, but if it becomes too long I will take your suggestion for the XML parser – cahenepudi Jan 19 '19 at 14:31
  • It is feasible, but it'll be even longer. You'd have to look ahead for something, then match that something. This is quite redundant, because why not just say "match something" directly? That is what my regex does. @cahenepudi – Sweeper Jan 19 '19 at 14:34
  • @cahenepudi If you think an answer solves your question, please consider accepting it by clicking on the checkmark! – Sweeper Jan 19 '19 at 14:35
  • Accepting after reading why not to use regexps for parsing XML, although this solution in my case would work. – cahenepudi Jan 19 '19 at 14:42