I'm trying to use a .NET regex to identify strings in XML data that don't contain a full stop before the last closing tag. I don't have much experience with regex, and I'm not sure what I need to change, or why, to get the result I'm looking for.
Each line of the data ends with a carriage return and a line break.
A schema is used for the XML.
Example of good XML data:
<randlist prefix="unorder">
<item>abc</item>
<item>abc</item>
<item>abc.</item>
</randlist>
Example of bad XML data (the regex should match it, because there is no full stop preceding the last </item>):
<randlist prefix="unorder">
<item>abc</item>
<item>abc</item>
<item>abc</item>
</randlist>
Regex pattern I tried, which didn't work on the bad XML data (not tested on the good XML data):
^<randlist \w*=[\S\s]*\.*[^.]<\/item>[\n]*<\/randlist>$
Results using http://regexstorm.net/tester:
0 matches
Results using https://regex101.com/:
0 matches
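For anyone who wants to reproduce the test outside the online testers, here is a minimal sketch. It uses Python's re module as a stand-in for the .NET engine, so behaviour may not be identical, and the names PATTERN, BAD_LINES and count_matches are made up for illustration. It assumes the data really uses carriage return plus line feed, as described above, and compares that with the same data joined by line feeds only.

```python
import re

# Pattern from the question, unchanged.
PATTERN = r'^<randlist \w*=[\S\s]*\.*[^.]<\/item>[\n]*<\/randlist>$'

# The "bad" XML sample from the question, one entry per line.
BAD_LINES = [
    '<randlist prefix="unorder">',
    '<item>abc</item>',
    '<item>abc</item>',
    '<item>abc</item>',
    '</randlist>',
]


def count_matches(text):
    """Count matches of PATTERN in text with the multiline flag set."""
    return len(re.findall(PATTERN, text, flags=re.MULTILINE))


bad_crlf = "\r\n".join(BAD_LINES)  # assumption: CRLF line endings
bad_lf = "\n".join(BAD_LINES)      # same data with LF-only line endings

print(count_matches(bad_crlf))  # 0
print(count_matches(bad_lf))    # 1
```

In this sketch the only difference between the two inputs is the carriage return, which [\n]* cannot consume. That suggests the line endings may be one factor, though the testers above may handle pasted line endings differently.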
I believe this question differs from the following one because of the full stop and start-of-string criteria:
Regex for string not ending with given suffix
Explanation from regex101:
/^<randlist \w*=[\S\s]*\.*[^.]<\/item>[\n]*<\/randlist>$/gm
^ asserts position at start of a line
<randlist matches the characters <randlist literally (case sensitive)
\w* matches any word character (equal to [a-zA-Z0-9_])
  * Quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
= matches the character = literally (case sensitive)
Match a single character present in the list below [\S\s]*
  * Quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
  \S matches any non-whitespace character (equal to [^\r\n\t\f\v ])
  \s matches any whitespace character (equal to [\r\n\t\f\v ])
\.* matches the character . literally (case sensitive)
  * Quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
Match a single character not present in the list below [^.]
  . matches the character . literally (case sensitive)
< matches the character < literally (case sensitive)
\/ matches the character / literally (case sensitive)
item> matches the characters item> literally (case sensitive)
Match a single character present in the list below [\n]*
< matches the character < literally (case sensitive)
\/ matches the character / literally (case sensitive)
randlist> matches the characters randlist> literally (case sensitive)
$ asserts position at the end of a line
Global pattern flags
g modifier: global. All matches (don't return after first match)
m modifier: multi line. Causes ^ and $ to match the begin/end of each line (not only begin/end of string)
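Going by the breakdown above, [\n]* only accepts line feeds and $ sits directly after </randlist>, so neither allows for a carriage return. Below is a hedged sketch of a variant that tolerates \r. It is written with Python's re module as a stand-in for .NET; VARIANT, make_list and has_match are hypothetical names, and the only changes to the original pattern are [\n]* widened to [\r\n]* and an optional \r before $.

```python
import re

# Hypothetical variant of the question's pattern: [\n]* widened to [\r\n]*,
# and an optional \r allowed before the end-of-line anchor.
VARIANT = r'^<randlist \w*=[\S\s]*\.*[^.]<\/item>[\r\n]*<\/randlist>\r?$'


def make_list(last_item, newline="\r\n"):
    """Build a sample randlist whose last item has the given text."""
    lines = [
        '<randlist prefix="unorder">',
        '<item>abc</item>',
        '<item>abc</item>',
        f'<item>{last_item}</item>',
        '</randlist>',
    ]
    return newline.join(lines) + newline


def has_match(text):
    """Return True if VARIANT matches anywhere in text (multiline mode)."""
    return re.search(VARIANT, text, flags=re.MULTILINE) is not None


print(has_match(make_list("abc")))   # True: no full stop before the last </item>
print(has_match(make_list("abc.")))  # False: full stop present
```

This only covers a single list per input. Because [\S\s]* is greedy, a document containing several randlist elements may produce one match spanning more than one list, so the variant is a starting point rather than a finished pattern.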