I have a schema that validates my XML file, but I want to add further validation: numeric ranges, birthday format (e.g. DD/MM/YY, not MM/DD/YY), student names that allow special characters such as _, and so on. At the moment, when I run my code I get this error:
[Error]: Data at the root level is invalid. Line 1, position 1. at System.Xml.XmlTextReaderImpl.Throw(Exception e)

A sample of my XML:

 <?xml version="1.0" encoding="us-ascii" standalone="yes"?>
  <studentTable xmlns="namespace">

    <student>
      <ID>0</ID>
      <student_name>John</student_name>
      <birthday>25/09/1997</birthday>
    </student>

I have tried the following code, but I receive the error "Data at the root level is invalid. Line 1, position 1. at System.Xml.XmlTextReaderImpl.Throw(Exception e)":

        Dim xdoc As XmlDocument
        Dim nodelist As XmlNodeList
        Dim node As XmlNode
        Dim ID, birthday, student_name As String

        xdoc = New XmlDocument

        xdoc.LoadXml("student2.xml")

        nodelist = xdoc.SelectNodes("/studentTable/student")

        For Each node In nodelist

            ID = node.ChildNodes.Item(0).Attributes.GetNamedItem("ID").Value
            birthday = node.ChildNodes.Item(1).Attributes.GetNamedItem("birthday").Value
            student_name = node.ChildNodes.Item(2).Attributes.GetNamedItem("student_name").Value

        Dim rgx As New Regex("^[0-9]*$")
            If rgx.IsMatch(ID) = False Then
                lstErrs.Add("Invalid ID number")
            End If

        Dim reg As New Regex("^(((0[1-9]|[12]\d|3[01])\/(0[13578]|1[02])\/((19|[2-9]\d)\d{2}))|((0[1-9]|[12]\d|30)\/(0[13456789]|1[012])\/((19|[2-9]\d)\d{2}))|((0[1-9]|1\d|2[0-8])\/02\/((19|[2-9]\d)\d{2}))|(29\/02\/((1[6-9]|[2-9]\d)(0[48]|[2468][048]|[13579][26])|((16|[2468][048]|[3579][26])00))))$")
            If reg.IsMatch(birthday) = False Then
                lstErrs.Add("Invalid birthday")
            End If

        Dim regx As New Regex("^[a-zA-Z]+(([',. -][a-zA-Z ])?[a-zA-Z]*)*$")
            If regx.IsMatch(student_name) = False Then
                lstErrs.Add("Invalid Name")
            End If

        Next

This is how I report the XML errors:

        If lstErrs.Count > 0 Then

                '-- Output list of errors
                MsgBox("Complete but with errors! Check error file.") '& vbCrLf & vbCrLf & Strings.Join(lstErrs.ToArray, vbCrLf))
                fileWriter.WriteLine("Filename:   " & strFilNme)
                fileWriter.WriteLine(vbCrLf)
                fileWriter.WriteLine("Errors:")
                For i As Integer = 0 To lstErrs.Count - 1
                    fileWriter.WriteLine(lstErrs(i))
                Next
            Else
                MsgBox("Complete!")
            Exit Sub
        End If

        fileWriter.Close()


    Catch ex As XmlSchemaValidationException
            MsgBox("Complete but with errors! Check error file.")
            fileWriter.WriteLine("[Error]: XmlSchemaValidationException -error!!!!!!")
            fileWriter.WriteLine("LineNumber = {0}", ex.LineNumber)
            fileWriter.WriteLine("LinePosition = {0}", ex.LinePosition)
            fileWriter.WriteLine("Message = {0}", ex.Message)
            fileWriter.WriteLine("Source = {0}", ex.Source)

        Catch exOther As Exception
            MsgBox("Complete but with errors! Check error file.")
            fileWriter.WriteLine("[Error]: " & exOther.Message & exOther.StackTrace)

        Finally

            If Not IsNothing(reader) Then
            reader.Close()
        End If

        If Not IsNothing(fileWriter) Then
            fileWriter.Close()
        End If

    End Try

End Sub

Private Sub ValidationEventHandler(ByVal sender As Object, ByVal e As ValidationEventArgs)



'MsgBox("Display Errors")
    Select Case e.Severity
        Case XmlSeverityType.Error
            lstErrs.Add("Error: {0} " & e.Message)
        Case XmlSeverityType.Warning
            lstErrs.Add("Warning {0} " & e.Message)
        Case Else
            lstErrs.Add(e.Message)

    End Select
End Sub

I have tried changing LoadXml to Load. The code then runs without an error, but my regexes don't validate the XML values. Any help would be great, thanks.

  • Unfortunately I can't help you with your XML code, hopefully someone else can. Maybe you could also specify in your question which line of code the error occurs on. – Callum Watkins Feb 25 '19 at 16:19
  • I have the exact error I receive in my question: "Data at the root level is invalid. Line 1, position 1. at System.Xml.XmlTextReaderImpl.Throw(Exception e)" – user11096438 Feb 25 '19 at 16:23
  • That sounds to me like a part of the error message, and not the line of *your* code where it is thrown. – Callum Watkins Feb 25 '19 at 17:03
  • I have added how I coded my error reporting, as I still cannot figure out the cause of my error. – user11096438 Feb 26 '19 at 09:04
  • I changed my xdoc.LoadXml to xdoc.Load and it got rid of the error, but now my validations are not working and it is not identifying any errors. – user11096438 Feb 26 '19 at 10:04
  • Fixed the error by using the GetElementsByTagName function and changing LoadXml to Load. – user11096438 Feb 27 '19 at 11:12

2 Answers

Since you know that there is only one <ID>, <birthday>, etc. in each <student> node, you can use SelectSingleNode.

Although it looks as if .Value would get you the element's text, it is more fiddly than that: for an element node, .Value is Nothing, and the text is available through .InnerText (see the question "XmlNode Value vs InnerText").
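As a minimal sketch of that difference (the module name and the one-element document are made up for illustration), an element's Value is Nothing, while InnerText and the Value of its text child both give the content. Note that LoadXml is given the XML itself here, not a file name:

```vb
Imports System.Xml

Module ValueVsInnerTextSketch

    Sub Main()
        Dim doc As New XmlDocument()
        ' LoadXml parses a string that contains XML; Load takes a path or stream.
        doc.LoadXml("<student><ID>0</ID></student>")

        Dim idNode = doc.SelectSingleNode("/student/ID")

        Console.WriteLine(idNode.Value Is Nothing)   ' True: element nodes have no Value
        Console.WriteLine(idNode.InnerText)          ' 0
        Console.WriteLine(idNode.FirstChild.Value)   ' 0: the text node carries the value
    End Sub

End Module
```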

I assume you put in the xmlns="namespace" from seeing similar things in other XML. In this case, unless you are actually using it, it will only complicate matters.
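If the namespace does have to stay, a rough sketch of how the XPath queries would need to change is below. The prefix "s" and the module name are arbitrary choices for illustration; only the namespace URI ("namespace") comes from the question's XML. An unprefixed XPath name only matches elements in no namespace, which would explain why the code ran with Load but validated nothing:

```vb
Imports System.Xml

Module NamespaceSketch

    Sub Main()
        Dim xdoc As New XmlDocument()
        xdoc.LoadXml("<studentTable xmlns=""namespace""><student><ID>0</ID></student></studentTable>")

        ' Unprefixed names look for elements in no namespace, so nothing matches.
        Console.WriteLine(xdoc.SelectNodes("/studentTable/student").Count)   ' 0

        ' Map a prefix (any name will do) to the document's default namespace.
        Dim ns As New XmlNamespaceManager(xdoc.NameTable)
        ns.AddNamespace("s", "namespace")

        Console.WriteLine(xdoc.SelectNodes("/s:studentTable/s:student", ns).Count)   ' 1
    End Sub

End Module
```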

To validate a date, use DateTime.TryParseExact and give it a format string - there is no need for a complicated regex which would need to be different if you changed to a better date format like yyyy-MM-dd.
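A small sketch of how that addresses the DD/MM versus MM/DD requirement, assuming a strict "dd/MM/yyyy" format and the invariant culture (both are my choices, not something the question specifies). A value whose second part cannot be a month is rejected, but an ambiguous value such as 05/06/1997 cannot be told apart by any format check:

```vb
Imports System.Globalization

Module DateFormatSketch

    ' Hypothetical helper: True when the text is a real date written as dd/MM/yyyy.
    Function IsValidBirthday(text As String) As Boolean
        Dim parsed As DateTime
        Return DateTime.TryParseExact(text, "dd/MM/yyyy",
                                      CultureInfo.InvariantCulture,
                                      DateTimeStyles.None, parsed)
    End Function

    Sub Main()
        Console.WriteLine(IsValidBirthday("25/09/1997"))   ' True
        Console.WriteLine(IsValidBirthday("09/25/1997"))   ' False: there is no month 25
        Console.WriteLine(IsValidBirthday("05/06/1997"))   ' True: read as 5 June
    End Sub

End Module
```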

You might as well declare the regexes outside the loop to keep the code inside the loop a bit tidier.

It is always frustrating to get a message that says something like "Date error" when it doesn't tell you where or what the erroneous data is.

So, with this XML file located in my "C:\Temp" directory (I don't know why you would use "us-ascii" rather than "utf-8"):

<?xml version="1.0" encoding="us-ascii" standalone="yes"?>
<studentTable>
    <student>
        <ID>0q</ID>
        <student_name>John*</student_name>
        <birthday>25/109/1997</birthday>
    </student>
</studentTable>

and this console application:

Imports System.Text.RegularExpressions
Imports System.Xml

Module Module1

    Sub Main()
        Dim lstErrs As New List(Of String)

        Dim idRegex = New Regex("^[0-9]*$")
        Dim nameRegex = New Regex("^[a-zA-Z]+(([',. -][a-zA-Z ])?[a-zA-Z]*)*$")
        Dim dateFormat = "d/M/yyyy"

        Dim xdoc As New XmlDocument()
        xdoc.Load("C:\Temp\students.xml")

        Dim nodelist = xdoc.SelectNodes("//studentTable/student")

        For Each node As XmlNode In nodelist

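            ' Use paths relative to the current <student>; "//ID" would search the
            ' whole document and return the first student's ID every time.
            Dim id = node.SelectSingleNode("ID").InnerText
            Dim dob = node.SelectSingleNode("birthday").InnerText
            Dim name = node.SelectSingleNode("student_name").InnerText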

            If Not idRegex.IsMatch(id) Then
                lstErrs.Add("Invalid ID number " & id)
            End If

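            ' InvariantCulture keeps "/" literal regardless of the machine's regional settings.
            If Not DateTime.TryParseExact(dob, dateFormat, System.Globalization.CultureInfo.InvariantCulture, System.Globalization.DateTimeStyles.None, New DateTime) Then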
                lstErrs.Add("Invalid birthday " & dob)
            End If

            If Not nameRegex.IsMatch(name) Then
                lstErrs.Add("Invalid Name " & name)
            End If

            Console.WriteLine($"{id} {dob} {name}") '' for checking

        Next

        Console.WriteLine(String.Join(vbCrLf, lstErrs)) '' show the errors

        Console.ReadLine()

    End Sub

End Module

I got this output:

0q 25/109/1997 John*
Invalid ID number 0q
Invalid birthday 25/109/1997
Invalid Name John*
Andrew Morton

Do not under any circumstances try to parse XML with a regex unless you wish to invoke rite 666 Ph'nglui mglw'nafh Cthulhu R'lyeh wgah'nagl fhtagn.

Use an XML parsing library; see this page for some ways to do it using C#. That should be convertible to VB.NET.

Edit: Further to the comments, try using an XML schema to validate the data.
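As a rough illustration of that suggestion, the sketch below moves the value checks into an XSD using xs:nonNegativeInteger and xs:pattern restrictions. The schema, the patterns, and the names SchemaSketch and Validate are assumptions made up for this example, not the asker's actual schema; it also assumes the XML has no namespace. A pattern can restrict the day and month ranges, but it cannot check things like leap years:

```vb
Imports System.Collections.Generic
Imports System.IO
Imports System.Xml
Imports System.Xml.Schema

Module SchemaSketch

    ' Illustrative schema only: digits for ID, letters/underscore for the name,
    ' and a DD/MM/YYYY shape for the birthday.
    Private ReadOnly Xsd As String =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>" &
        "<xs:element name='studentTable'><xs:complexType><xs:sequence>" &
        "<xs:element name='student' maxOccurs='unbounded'><xs:complexType><xs:sequence>" &
        "<xs:element name='ID' type='xs:nonNegativeInteger'/>" &
        "<xs:element name='student_name'><xs:simpleType><xs:restriction base='xs:string'>" &
        "<xs:pattern value='[A-Za-z_]+'/></xs:restriction></xs:simpleType></xs:element>" &
        "<xs:element name='birthday'><xs:simpleType><xs:restriction base='xs:string'>" &
        "<xs:pattern value='(0[1-9]|[12][0-9]|3[01])/(0[1-9]|1[0-2])/[0-9]{4}'/>" &
        "</xs:restriction></xs:simpleType></xs:element>" &
        "</xs:sequence></xs:complexType></xs:element>" &
        "</xs:sequence></xs:complexType></xs:element></xs:schema>"

    ' Returns the validation messages raised while reading the XML text.
    Function Validate(xml As String) As List(Of String)
        Dim errors As New List(Of String)

        Dim settings As New XmlReaderSettings()
        settings.ValidationType = ValidationType.Schema
        settings.Schemas.Add(Nothing, XmlReader.Create(New StringReader(Xsd)))
        AddHandler settings.ValidationEventHandler, Sub(sender, e) errors.Add(e.Message)

        Using reader = XmlReader.Create(New StringReader(xml), settings)
            ' Validation happens as the reader walks the document.
            While reader.Read()
            End While
        End Using

        Return errors
    End Function

    Sub Main()
        Dim bad = "<studentTable><student><ID>0q</ID><student_name>John*</student_name>" &
                  "<birthday>25/109/1997</birthday></student></studentTable>"

        For Each message In Validate(bad)
            Console.WriteLine(message)
        Next
    End Sub

End Module
```

With the sample values above, the handler should be called for each of the three bad elements. The patterns still use regular-expression syntax, but they describe individual values rather than parsing the XML itself.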

JGNI
  • I parsed the XML, but once I have done that, how do I validate that the ID tag contains only digits and that the student name allows special characters such as _, without using a regex? – user11096438 Feb 25 '19 at 15:19