
In my XML schema I have an element of type string that I don't want to be empty (if it contains only whitespace etc., I also consider it empty).

I applied a restriction I found at http://blogs.msdn.com/b/neerajag/archive/2005/08/12/450723.aspx

<xs:restriction base="xs:string">
  <xs:minLength value="1" />
  <xs:pattern value=".*[^\s].*" />
</xs:restriction>

What exactly does this pattern do, and will it do what I expect?

jlp

4 Answers


Doesn't this do exactly what you want?

 <xs:restriction base="xs:token">
  <xs:minLength value="1"/>
 </xs:restriction>

If the string contains only whitespace (line feeds, carriage returns, tabs, leading and trailing spaces), the processor will strip it, so validation will fail; if there's anything else, validation will succeed. (Note, though: internal sequences of two or more whitespace characters will be collapsed to a single space; make sure you're OK with that.)
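For reference, a minimal self-contained schema built around that restriction might look like the following sketch; the type name `nonEmptyToken` and the element name `title` are hypothetical, not part of the answer.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical type name. xs:token has whiteSpace="collapse", so tabs and
       newlines become spaces, runs of spaces become one space, and leading and
       trailing spaces are dropped before minLength is checked. -->
  <xs:simpleType name="nonEmptyToken">
    <xs:restriction base="xs:token">
      <xs:minLength value="1"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Hypothetical element using the type -->
  <xs:element name="title" type="nonEmptyToken"/>

</xs:schema>
```

With this schema, `<title>   </title>` should fail validation, while `<title>  a   b  </title>` should pass with the collapsed value `a b`.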

Frederik

The pattern:

  • `.*` matches zero or more characters (`.` matches any character).
  • `[^\s]` matches one character that is not in the listed set. `\s` is whitespace, so `[^\s]` means "match something that isn't whitespace"; the initial `^` inside the brackets negates the normal "match any one of these characters".
  • `.*` matches zero or more characters again.
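To illustrate, here is how some sample values would fare against the question's restriction, assuming a hypothetical element `comment` declared with that type. XSD patterns are implicitly anchored, so the pattern has to match the entire value, not just part of it:

```xml
<!-- Assumes <comment> is declared with the restricted type from the question. -->
<comment>hello</comment>          <!-- valid: "h" matches [^\s] -->
<comment>  hi  </comment>         <!-- valid: the surrounding spaces are matched by .* -->
<comment></comment>               <!-- invalid: fails minLength and the pattern -->
<comment>   </comment>            <!-- invalid: no character matches [^\s] -->
<comment>line one
line two</comment>                <!-- invalid: . does not match the newline -->
```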
Richard
  • It rejects text containing a carriage return character. Why? – jlp May 26 '11 at 10:25
  • @jlp: "`.` matches any character" is the quick version. The full version includes a note that, depending on the regex engine, it might not match newlines *by default*. Try replacing `.*` with `[.\s]*`, as `\s` includes both newline and carriage return. – Richard May 27 '11 at 07:58
  • Could you please write a pattern that meets the requirements from the question? – jlp May 27 '11 at 15:25
  • @jlp: What requirement? The question is "what does this pattern mean?". – Richard May 27 '11 at 15:29
  • A string that is not empty (a string containing only whitespace and newlines counts as empty too). – jlp Oct 13 '11 at 08:51
  • @jlp According to the XSD recommendation's definition of [`\s`](http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/#dt-regex) it is equivalent to `[#x20\t\n\r]` – which includes both newline and carriage return. Therefore a string containing a single newline shouldn't match (since the character class has to match something). If such simple test cases are not correct, then it's time to submit a short test case to the XSD engine's vendor. – Richard Oct 13 '11 at 10:06
  • According to the W3C documentation at [http://www.w3.org/TR/xmlschema11-2/#dt-regex](http://www.w3.org/TR/xmlschema11-2/#dt-regex), section "G.4.2.5 Multi-character escapes" defines the `.` character sequence as the `[^\n\r]` character class. So `.` matches any character except newline and carriage return. I plan on using `[.\s]*\S[.\s]*`, now that I've read the suggestion from @Richard. (Note that the documentation says that `\S` is defined as `[^\s]`.) – ALEXintlsos Oct 17 '13 at 21:17
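One caveat on the `[.\s]*` suggestion: inside square brackets `.` is an ordinary character, so `[.\s]` matches only a literal period or whitespace, and `[.\s]*\S[.\s]*` would reject a value such as `ab`. A sketch that accepts any character, including newlines, around the required non-whitespace character uses `[\s\S]` instead (the type name is hypothetical, and the `xs` prefix is assumed to be bound to the XML Schema namespace):

```xml
<!-- Hypothetical type name. The pattern is implicitly anchored:
     \s*      any leading whitespace, including newlines
     \S       at least one non-whitespace character
     [\s\S]*  anything after it, including newlines -->
<xs:simpleType name="hasNonWhitespace">
  <xs:restriction base="xs:string">
    <xs:pattern value="\s*\S[\s\S]*"/>
  </xs:restriction>
</xs:simpleType>
```

No `minLength` is needed here, since the pattern itself requires at least one character.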

I don't know if this is still useful, but I found a better pattern than the one first posted. Here it is:

<xs:simpleType name="nonEmptyString">
    <xs:restriction base="xs:string">
        <xs:pattern value="(\s*[^\s]\s*)+"></xs:pattern>
    </xs:restriction>
</xs:simpleType>

Tested in Eclipse; it seems to work fine.
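A sketch of how the type might be referenced from an element declaration, assuming a schema with no target namespace; the element name `description` is hypothetical. Unlike the question's pattern, every character here is matched by `\s` or `[^\s]` rather than `.`, which is why values spanning several lines are accepted:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Type from the answer above: each character is matched by either \s or
       [^\s], so newlines are allowed, but the + requires at least one
       iteration and therefore at least one non-whitespace character. -->
  <xs:simpleType name="nonEmptyString">
    <xs:restriction base="xs:string">
      <xs:pattern value="(\s*[^\s]\s*)+"/>
    </xs:restriction>
  </xs:simpleType>

  <!-- Hypothetical element using the type -->
  <xs:element name="description" type="nonEmptyString"/>

</xs:schema>
```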

maurizeio
  • I just wrote 5 unit tests testing a) empty string b) single space c) d) multiple spaces containing a new line e) multiple spaces containing multiple new lines. It works for all those cases, so I would suggest that this is a good solution to anyone needing it! :) Well done. – Tod Thomson Sep 12 '13 at 23:28
  • This is the only solution out of all the answers that works for my use case +1 – Chris L Oct 01 '14 at 14:17
  • This one works great; the difference compared to the original poster's example is that this uses \s* instead of .*, and . does not match a newline! – Daniel Edholm Ignat Apr 08 '15 at 13:17

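Looking at the subject of the post, "pattern for not allowing empty strings", which is still unanswered: you can do that using the <xsd:whiteSpace value="collapse" /> facet, which normalizes whitespace so that a whitespace-only value becomes empty (a minLength of 1 is still needed to actually reject it).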

With the whiteSpace facet set to "collapse", the processor:

  1. removes all white space characters including line feeds, tabs, spaces, carriage returns
  2. leading and trailing spaces are removed
  3. multiple spaces are reduced to a single space

Reference: W3C whiteSpace (http://www.w3.org/TR/xmlschema-2/#rf-whiteSpace)
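Per the W3C whiteSpace definition, `collapse` first replaces each tab, line feed and carriage return with a space (rather than removing it), then collapses runs of spaces to a single space and trims leading and trailing spaces; on its own it does not reject anything. A minimal sketch combining it with `minLength` so that whitespace-only values fail (the type name is hypothetical, and the `xsd` prefix is assumed to be bound to the XML Schema namespace):

```xml
<!-- Hypothetical type name; assumes xmlns:xsd="http://www.w3.org/2001/XMLSchema" -->
<xsd:simpleType name="collapsedNonEmpty">
  <xsd:restriction base="xsd:string">
    <!-- Normalize whitespace: tabs and newlines become spaces, runs of spaces
         become one space, leading and trailing spaces are dropped -->
    <xsd:whiteSpace value="collapse"/>
    <!-- Checked against the collapsed value, so "   " (now "") fails -->
    <xsd:minLength value="1"/>
  </xsd:restriction>
</xsd:simpleType>
```

This behaves essentially like the `xs:token` plus `minLength` restriction from the first answer, since `xs:token` is `xs:string` with `whiteSpace="collapse"`.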

M. Atif Riaz
  • Your description of the 'collapse' value seems incorrect; especially number 1. The only part of number 1 that seems correct is the part that is duplicated in number 2. But considering your source (w3schools), I can see why you might be confused; W3Schools is a pretty low-quality source for many things. (see http://w3fools.com) – Andrew Barber Dec 03 '12 at 12:59
  • @Andrew Thanks for pointing that out. I have also checked the W3C reference for whiteSpace: http://www.w3.org/TR/xmlschema-2/#rf-whiteSpace. Points 2 and 3 appear to be correct. Point 1 isn't mentioned there, but it also works in practice. I've updated the reference. – M. Atif Riaz Dec 04 '12 at 07:35
  • Excellent update on the reference. I'm still a little unsure about #1 (#2 and #3 are definitely correct, as you note). On #1, doesn't it convert those types of whitespace to spaces, rather than simply removing them? (And then, subsequently, collapse all spaces to a single space...) – Andrew Barber Dec 04 '12 at 14:33