
I need to write an XSD schema with a restriction on a field, to ensure that the value of the field does not contain the substring FILENAME at any location.

For example, all of the following must be invalid:

FILENAME
ORIGINFILENAME
FILENAMETEST
123FILENAME456

None of these values should be valid.

In a regular expression language that supports negative lookahead, I could do this by writing /^((?!FILENAME).)*$/, but the XSD pattern language does not support negative lookahead.

How can I implement an XSD pattern restriction with the same effect as /^((?!FILENAME).)*$/?

I need to use pattern, because I don't have access to XSD 1.1 assertions, which are the other obvious possibility.

The question XSD restriction that negates a matching string covers a similar case, but in that case the forbidden string is forbidden only as a prefix, which makes checking the constraint easier. How can the solution there be extended to cover the case where we have to check all locations within the input string, and not just the beginning?

Jpff
  • Possible duplicate of [XSD restriction that negates a matching string](http://stackoverflow.com/questions/9889206/xsd-restriction-that-negates-a-matching-string) – CSchulz Jun 01 '16 at 08:34
  • Not a duplicate, because in my case the string FILENAME can occur anywhere in the string, not only at the start – Jpff Jun 01 '16 at 08:50
  • Yes, it's a duplicate. Every method described in the answers to that question is applicable to this question: use an assertion to assert that the value does not have FILENAME as a substring, or write a regular expression that matches strings with no F, optionally followed by strings beginning F and continuing with not-I, or beginning FI and continuing with not-L, or ... – C. M. Sperberg-McQueen Jun 01 '16 at 16:45
  • My version of XSD doesn't support assertions. With the regular expression ([^F].*) |.{1}([^I].*)? |.{2}([^L].*)? |.{3}([^E].*)? the string "as" does not match and the string "FILE" matches. I need the opposite – Jpff Jun 01 '16 at 21:33
  • When I use the regular expression you show in a schema, it rejects *both* "as" and "FILE". When I delete the blanks before the or-bars (whitespace is not ignored in XSD regular expressions), the schema accepts "as" and rejects "FILE". – C. M. Sperberg-McQueen Jun 04 '16 at 00:34
  • But you've persuaded me that the question isn't really a duplicate. Retracted my close vote and provided an answer. – C. M. Sperberg-McQueen Jun 04 '16 at 01:51
  • See also https://stackoverflow.com/questions/59336944/not-allowing-a-specific-string-in-an-xsd-regular-expression/59337276 –  Dec 15 '19 at 19:31

1 Answer


OK, the OP has persuaded me that while the other question mentioned has an overlapping topic, the fact that the forbidden string is forbidden at all locations, not just as a prefix, complicates things enough to require a separate answer, at least for the XSD 1.0 case. (I started to add this answer as an addendum to my answer to the other question, and it grew too large.)

There are two approaches one can use here.

First, in XSD 1.1, a simple assertion of the form

not(matches($value, 'FILENAME'))

ought to do the job.

Second, if one is forced to work with an XSD 1.0 processor, one needs a pattern that will match all and only strings that don't contain the forbidden substring (here 'FILENAME').

One way to do this is to ensure that the character 'F' never occurs in the input. That's too drastic, but it does do the job: strings not containing the first character of the forbidden string do not contain the forbidden string.

But what of strings that do contain an occurrence of 'F'? They are fine, as long as no 'F' is followed by the string 'ILENAME'.

Putting that last point more abstractly, we can say that any acceptable string (any string that doesn't contain the string 'FILENAME') can be divided into two parts:

  1. a prefix which contains no occurrences of the character 'F'
  2. zero or more occurrences of 'F', each followed by a string that doesn't begin with 'ILENAME' and doesn't contain any 'F'.

The prefix is easy to match: [^F]*.

The strings that start with F but don't match 'FILENAME' are a bit more complicated; just as we don't want to outlaw all occurrences of 'F', we also don't want to outlaw 'FI', 'FIL', etc. -- but each occurrence of such a dangerous string must be followed either by the end of the string, or by a letter that doesn't match the next letter of the forbidden string, or by another 'F' which begins another region we need to test. So for each proper prefix of the forbidden string, we create a regular expression of the form

$prefix || '([^F' || next-character-in-forbidden-string || ']'
    || '[^F]*)?'

Then we join all of those regular expressions with or-bars.
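The construction just described can be sketched in Python. This is only an illustration, not part of any XSD toolchain: the function name build_pattern is hypothetical, and the sketch assumes the forbidden string consists of ASCII letters and digits (so nothing needs escaping) and that its first character does not occur again later in the string. The second assumption matters: the argument above relies on every 'F' starting a fresh region to test, which fails for a string such as 'ABAC'.

```python
def build_pattern(forbidden: str) -> str:
    """Build a pattern matching strings that do not contain `forbidden`.

    Assumptions (hypothetical helper, not a library function):
    - `forbidden` has at least two characters, all ASCII letters or digits;
    - its first character does not occur again later in `forbidden`.
    """
    if len(forbidden) < 2 or not (forbidden.isascii() and forbidden.isalnum()):
        raise ValueError("expected at least two ASCII letters or digits")
    first = forbidden[0]
    if first in forbidden[1:]:
        raise ValueError("first character must not recur in the forbidden string")

    alternatives = []
    # One alternative per proper prefix: the prefix, optionally followed by
    # a character that is neither `first` nor the next forbidden character,
    # and then any run of characters other than `first`.
    for i in range(1, len(forbidden)):
        prefix, next_char = forbidden[:i], forbidden[i]
        alternatives.append(f"({prefix}([^{first}{next_char}][^{first}]*)?)")

    # Leading run without `first`, then zero or more of the alternatives,
    # joined with or-bars.
    return f"[^{first}]*(" + "|".join(alternatives) + ")*"


pattern = build_pattern("FILENAME")
```

For a forbidden string whose first character recurs, a different construction would be needed; the sketch refuses such input rather than produce a pattern that is too permissive.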

The end result in this case is something like the following (I have inserted newlines here and there, to make it easier to read; before use, they will need to be taken back out):

[^F]*
((F([^FI][^F]*)?)
|(FI([^FL][^F]*)?)
|(FIL([^FE][^F]*)?)
|(FILE([^FN][^F]*)?)
|(FILEN([^FA][^F]*)?)
|(FILENA([^FM][^F]*)?)
|(FILENAM([^FE][^F]*)?))*
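To take the newlines back out and drop the result into a schema, something like the following Python sketch could be used. The type name NoFilenameString and the helper simple_type are hypothetical, chosen only for illustration; xs:simpleType, xs:restriction and xs:pattern are the usual XSD 1.0 elements.

```python
from xml.sax.saxutils import quoteattr

# The pattern as displayed above, one piece per line for readability.
PATTERN_LINES = """
[^F]*
((F([^FI][^F]*)?)
|(FI([^FL][^F]*)?)
|(FIL([^FE][^F]*)?)
|(FILE([^FN][^F]*)?)
|(FILEN([^FA][^F]*)?)
|(FILENA([^FM][^F]*)?)
|(FILENAM([^FE][^F]*)?))*
""".split()

# Whitespace is significant in XSD regular expressions, so join the
# pieces with nothing between them.
PATTERN = "".join(PATTERN_LINES)


def simple_type(name: str, pattern: str) -> str:
    """Return an xs:simpleType fragment restricting xs:string by `pattern`."""
    return (
        f"<xs:simpleType name={quoteattr(name)} "
        'xmlns:xs="http://www.w3.org/2001/XMLSchema">\n'
        '  <xs:restriction base="xs:string">\n'
        f"    <xs:pattern value={quoteattr(pattern)}/>\n"
        "  </xs:restriction>\n"
        "</xs:simpleType>"
    )


# "NoFilenameString" is a made-up type name for this example.
FRAGMENT = simple_type("NoFilenameString", PATTERN)
print(FRAGMENT)
```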

Two points to bear in mind:

  • XSD regular expressions are implicitly anchored; testing this with a non-anchored regular expression evaluator will not produce the correct results.
  • It may not be obvious at first why the alternatives in the choice all end with [^F]* instead of .*. Thinking about the string 'FEEFIFILENAME' may help. We have to check every occurrence of 'F' to make sure it's not followed by 'ILENAME'.
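Both points can be checked outside a schema processor with a short Python sketch. It assumes that Python's re.fullmatch is an acceptable stand-in for XSD's implicit anchoring (the pattern uses only constructs the two dialects share); the helper names accepts, disagreements and random_samples are hypothetical.

```python
import random
import re

FORBIDDEN = "FILENAME"
PATTERN = (
    "[^F]*"
    "((F([^FI][^F]*)?)"
    "|(FI([^FL][^F]*)?)"
    "|(FIL([^FE][^F]*)?)"
    "|(FILE([^FN][^F]*)?)"
    "|(FILEN([^FA][^F]*)?)"
    "|(FILENA([^FM][^F]*)?)"
    "|(FILENAM([^FE][^F]*)?))*"
)


def accepts(value: str) -> bool:
    """Anchored match, imitating how an XSD processor applies a pattern."""
    return re.fullmatch(PATTERN, value) is not None


def disagreements(samples):
    """Samples where the pattern and a plain substring test disagree."""
    return [s for s in samples if accepts(s) == (FORBIDDEN in s)]


def random_samples(count: int, seed: int = 0):
    """Strings glued together from risky fragments, some containing FILENAME."""
    rng = random.Random(seed)
    pieces = ["F", "FI", "FILE", "FILENAM", "FILENAME", "E", "X", "as"]
    return [
        "".join(rng.choice(pieces) for _ in range(rng.randint(0, 6)))
        for _ in range(count)
    ]


# The four values from the question, plus the tricky case mentioned above.
for value in ["FILENAME", "ORIGINFILENAME", "FILENAMETEST",
              "123FILENAME456", "FEEFIFILENAME"]:
    assert not accepts(value)

# Near misses remain valid.
for value in ["", "as", "FILE", "FILENAM", "FEEFIFILENAM"]:
    assert accepts(value)

# An unanchored search finds a match even in a forbidden value, which is
# why a non-anchored evaluator gives misleading results.
assert re.search(PATTERN, "FILENAME") is not None
```

Calling disagreements(random_samples(2000)) compares the pattern against a plain substring test on a larger, randomly generated set of strings.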
C. M. Sperberg-McQueen