
This question follows on from my earlier question, ".net regex - strings that don't contain full stop on last list item".

The problem is now as described below. Note that the examples have been amended and more have been added; all of them need to be satisfied. Good examples should return no matches, and bad examples should return matches.

I'm trying to use a .NET regex to identify strings in XML data that don't contain a full stop immediately before the last closing </item> tag. I don't have much experience with regex, so I'm not sure what I need to change, or why, to get the result I'm looking for.

There are carriage returns and line breaks at the end of each line in the data.

A schema is used for the XML. We have no access to the .NET code; we are just users of a custom-built application.

Example 1 of bad XML Data - should give 1 match:

<randlist prefix="unorder">
    <item>abc</item>
    <item>abc</item>
    <item>abc</item>
</randlist>

Example 2 of bad XML Data - should give 1 match:

<randlist prefix="unorder">
    <item>abc. abc</item>
    <item>abc. abc</item>
    <item>abc. abc</item>
</randlist>

Example 1 of good XML Data - regexp should give no matches - full stop preceding last </item>:

<randlist prefix="unorder">
    <item>abc</item>
    <item>abc</item>
    <item>abc.</item>
</randlist>

Example 2 of good XML Data - regexp should give no matches - full stop preceding last </item>:

<randlist prefix="unorder">
    <item>abc. abc</item>
    <item>abc. abc</item>
    <item>abc. abc.</item>
</randlist>

Regex patterns I tried against the bad XML data above that didn't work (they gave either false positives or no matches on https://regex101.com/; I have not tested them on the good XML data):

^<randlist \w*=[\S\s]*\.*[^.]*<\/item>[\n]*<\/randlist>$
^\s+<item>[^<]*?(?<=\.)<\/item>$
unseen_rider
  • So the dot specifically needs to be right before the last `</item>` in `<randlist>`? In other words, `<randlist><item>abc. abc</item><item>abc. abc.</item><item>abc. abc</item></randlist>` would be considered a failure because it only has the dot at the end of the second item? What about `<randlist><item>abc. abc</item><item>abc. abc.</item><item>abc. abc.</item></randlist>`, which has a dot at the end of the second and last items? – Zaelin Goodman Jan 22 '20 at 18:16

2 Answers


Seeing how you are using .NET, you could:

  1. Load the XML file into an XmlDocument.
  2. Use the GetElementsByTagName method to get all the item tags within the randlist element.
  3. Get the last element returned by step 2.
  4. Check whether its text ends with the period character.

The above should be more readable, and if the structure of the XML changes, you won't have to rewrite half your script.
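The steps above can be sketched as follows. This is only an illustration, written in Python with the standard-library `xml.etree.ElementTree` as a stand-in for the .NET XmlDocument API; the function name `last_item_ends_with_full_stop` is hypothetical, and the choice to ignore trailing whitespace inside the last item is an assumption rather than something the question specifies.

```python
import xml.etree.ElementTree as ET


def last_item_ends_with_full_stop(xml_text: str) -> bool:
    """Return True if the last <item> inside <randlist> ends with '.'."""
    root = ET.fromstring(xml_text)
    # Accept either a bare <randlist> root or one nested deeper in the document.
    randlist = root if root.tag == "randlist" else root.find(".//randlist")
    if randlist is None:
        raise ValueError("no <randlist> element found")
    items = randlist.findall("item")
    if not items:
        return False
    # itertext() also collects text from any nested inline markup.
    # Assumption: trailing whitespace inside the item is ignored.
    text = "".join(items[-1].itertext()).rstrip()
    return text.endswith(".")


BAD = """<randlist prefix="unorder">
    <item>abc. abc</item>
    <item>abc. abc</item>
    <item>abc. abc</item>
</randlist>"""

GOOD = """<randlist prefix="unorder">
    <item>abc. abc</item>
    <item>abc. abc</item>
    <item>abc. abc.</item>
</randlist>"""

print(last_item_ends_with_full_stop(BAD))   # False
print(last_item_ends_with_full_stop(GOOD))  # True
```

Because the check works on the parsed tree, it does not depend on line endings or indentation between the tags.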

npinti
    Hi, we are using .NET but have no access to the code of the application, which is custom made. We are just users, which is why we need a regex solution. Your solution may work, but we can't try it in our environment. – unseen_rider Jan 22 '20 at 12:01

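The regex pattern below works for us (tested in Notepad++). `[^.]` matches any character other than a full stop immediately before the last `</item>`, and `\s{1,2}` allows for the line break (LF or CRLF) before `</randlist>`.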

[^.]<\/item>\s{1,2}<\/randlist>
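As a quick check, the pattern can be run against the four examples from the question. The sketch below uses Python's `re` module as a stand-in for the .NET regex engine, on the assumption that the constructs involved (a negated character class, `\s` and a bounded quantifier) behave the same way in both; the helper `make_list` is hypothetical and only rebuilds the sample data with a chosen line ending.

```python
import re

# The pattern from this answer, unchanged.
PATTERN = re.compile(r"[^.]<\/item>\s{1,2}<\/randlist>")


def make_list(items, newline="\r\n"):
    """Build a <randlist> block like the examples, joined with `newline`."""
    lines = ['<randlist prefix="unorder">']
    lines += [f"    <item>{text}</item>" for text in items]
    lines.append("</randlist>")
    return newline.join(lines)


def count_matches(items, newline="\r\n"):
    """Number of times the pattern matches the generated block."""
    return len(PATTERN.findall(make_list(items, newline)))


examples = {
    "bad 1": ["abc", "abc", "abc"],
    "bad 2": ["abc. abc", "abc. abc", "abc. abc"],
    "good 1": ["abc", "abc", "abc."],
    "good 2": ["abc. abc", "abc. abc", "abc. abc."],
}

for name, items in examples.items():
    # Check both CRLF and LF line endings.
    print(name, count_matches(items, "\r\n"), count_matches(items, "\n"))
```

Each bad example gives one match and each good example gives none, for both line endings. One limitation to be aware of: `\s{1,2}` allows at most two whitespace characters, so the pattern would not match if `</randlist>` were indented or preceded by a blank line.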
unseen_rider