1

I am trying to understand the the structure of the following XML so that I can parse it in to Excel columns.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE raml SYSTEM 'raml20.dtd'>
<raml version="2.0" xmlns="raml20.xsd">
  <cmData type="actual">
    <header>
      <log dateTime="2019-02-08T15:05:41.000Z" action="created" appInfo="ActualExporter">InternalValues are used</log>
    </header>
    <managedObject class="AMLEPR" version="FL18_1711_06_1711_05" distName="PLMN-PLMN/MRBTS-502393/LNBTS-502393/LNCEL-10/AMLEPR-0" id="9075019">
      <defaults name="BLANK"/>
      <p name="cacHeadroom">0</p>
      <p name="deltaCac">2</p>
      <p name="maxCacThreshold">100</p>
      <p name="targetCarrierFreq">1400</p>
    </managedObject>
    <managedObject class="AMLEPR" version="FL18_1711_06_1711_05" distName="PLMN-PLMN/MRBTS-502393/LNBTS-502393/LNCEL-110/AMLEPR-0" id="9075022">
      <defaults name="BLANK"/>
      <p name="cacHeadroom">0</p>
      <p name="deltaCac">2</p>
      <p name="maxCacThreshold">100</p>
      <p name="targetCarrierFreq">2100</p>
    </managedObject>
    <managedObject class="AMLEPR" version="FL18_1711_06_1711_05" distName="PLMN-PLMN/MRBTS-502393/LNBTS-502393/LNCEL-114/AMLEPR-0" id="10755757">
      <defaults name="BLANK"/>
      <p name="cacHeadroom">0</p>
      <p name="deltaCac">2</p>
      <p name="maxCacThreshold">100</p>
      <p name="targetCarrierFreq">2300</p>
    </managedObject>
    <managedObject class="AMLEPR" version="FL18_1711_06_1711_05" distName="PLMN-PLMN/MRBTS-502393/LNBTS-502393/LNCEL-120/AMLEPR-0" id="9075025">
      <defaults name="BLANK"/>
      <p name="cacHeadroom">0</p>
      <p name="deltaCac">2</p>
      <p name="maxCacThreshold">100</p>
      <p name="targetCarrierFreq">2500</p>
    </managedObject>
      </cmData>
</raml>

I want to parse it in such a way that it looks like this

enter image description here

I want to understand how the child elements are defined in this XML. I see there are multiple elements with the same name "p" so whenever I try to import it in MS Access or even in VB.net Dataset then I get a column name with "p".

The code I am using to update Access is below:

Dim cs As String
        cs = "Provider=Microsoft.ACE.OLEDB.12.0; Data Source=" & Form1.db_txtbox.Text & "; Persist Security Info=False;"
        Dim con As New OleDb.OleDbConnection(cs)
        Using con
            Dim DelQuery As String
            Dim SqlQuery As String
            con.Open()
            DelQuery = "Delete * from LTE_AMLEPR"
            SqlQuery = "INSERT INTO LTE_AMLEPR (distName, MRBTS, LNBTS, LNCEL, AMLEPR, version, ID, cacHeadroom, deltaCac, maxCacThreshold, targetCarrierFreq) VALUES (@distName , @MRBTS, @LNBTS, @LNCEL, @AMLEPR, @version, @id, @cacHeadroom, @deltaCac, @maxCacThreshold, @targetCarrierFreq)"
            Dim da As New OleDbDataAdapter()
            Dim olecmd As OleDbCommand = New OleDbCommand(DelQuery, con)
            olecmd.ExecuteNonQuery()
            olecmd.Dispose()
            da.InsertCommand = New OleDbCommand(SqlQuery, con)
            da.InsertCommand.Parameters.Add("distName", OleDbType.VarChar, 100, "distName")
            da.InsertCommand.Parameters.Add("MRBTS", OleDbType.VarChar, 100, "MRBTS")
            da.InsertCommand.Parameters.Add("LNBTS", OleDbType.VarChar, 100, "LNBTS")
            da.InsertCommand.Parameters.Add("LNCEL", OleDbType.VarChar, 100, "LNCEL")
            da.InsertCommand.Parameters.Add("AMLEPR", OleDbType.VarChar, 100, "AMLEPR")
            da.InsertCommand.Parameters.Add("version", OleDbType.VarChar, 100, "version")
            da.InsertCommand.Parameters.Add("ID", OleDbType.VarChar, 100, "id")
            da.InsertCommand.Parameters.Add("cacHeadroom", OleDbType.VarChar, 20, "cacHeadroom")
            da.InsertCommand.Parameters.Add("deltaCac", OleDbType.VarChar, 20, "deltaCac")
            da.InsertCommand.Parameters.Add("maxCacThreshold", OleDbType.VarChar, 20, "maxCacThreshold")
            da.InsertCommand.Parameters.Add("maxCacThreshold", OleDbType.VarChar, 20, "targetCarrierFreq")
            da.Update(Dt)
        End Using

Can someone please advise and explain this structure of XML and how I can create the XSD for this type of XML.

Thanks

2 Answers2

0
Sub parseXML()

    Dim strPath As String
    Dim strRow As String
    strPath = Application.GetOpenFilename

    Dim XDoc As Object
    Set XDoc = CreateObject("MSXML2.DOMDocument")
    XDoc.async = False
    XDoc.validateOnParse = False
    XDoc.Load (strPath)
    Set xObjDetails = XDoc.ChildNodes(0)
    Set xObject = xObjDetails.FirstChild
    For Each xObject In xObjDetails.ChildNodes
       strRow = xObject.Attributes(2).NodeValue & " | " & xObject.Attributes(3).NodeValue
        For Each xChild In xObject.ChildNodes
            strRow = IIf(xChild.Text <> "", strRow & " | " & xChild.Text, strRow)
        Next xChild
        Debug.Print strRow
    Next xObject

End Sub

enter image description here

Pᴇʜ
  • 56,719
  • 10
  • 49
  • 73
Santosh
  • 12,175
  • 4
  • 41
  • 72
  • Thanks, I reckon this is done in Excel. I really appreciate for your kind help. I would like to understand some bits in the XML file if you can please explain. Is this any different type of XML? Is it possible to create XSD for this type of xml. If so then how it can be done. Any idea of how it can be done in VB.net ? Can you please also explain the code too for my understanding. Again many thanks for your help – Muhammad Adeel Ahmed Feb 11 '19 at 00:20
0

try xml linq :

Imports System.Xml
Imports System.Xml.Linq
Imports System.Data
Module Module1

    Const FILENAME As String = "c:\temp\test.xml"
    Sub Main()
        Dim dt As New DataTable
        dt.Columns.Add("class", GetType(String))
        dt.Columns.Add("distName", GetType(String))
        dt.Columns.Add("id", GetType(String))
        dt.Columns.Add("cacHeadroom", GetType(String))
        dt.Columns.Add("deltaCac", GetType(String))
        dt.Columns.Add("maxCacThreshold", GetType(String))
        dt.Columns.Add("targetCarrierFreq", GetType(String))

        Dim settings As New XmlReaderSettings
        settings.DtdProcessing = DtdProcessing.Ignore
        Dim reader As XmlReader = XmlReader.Create(FILENAME, settings)
        Dim ns As XNamespace = XNamespace.Get("raml20.xsd")

        While (Not reader.EOF)
            If reader.Name <> "managedObject" Then
                reader.ReadToFollowing("managedObject")
            End If
            If Not reader.EOF Then
                Dim newRow As DataRow = dt.Rows.Add()
                Dim managedObject As XElement = XElement.ReadFrom(reader)

                newRow("class") = CType(managedObject.Attribute("class"), String)
                newRow("distName") = CType(managedObject.Attribute("distName"), String)
                newRow("id") = CType(managedObject.Attribute("id"), String)
                For Each p In managedObject.Elements(ns + "p")
                    Dim attrib As String = CType(p.Attribute("name"), String)
                    Dim value As String = CType(p, String)
                    newRow(attrib) = value

                Next p

            End If
        End While
    End Sub

End Module
jdweng
  • 33,250
  • 2
  • 15
  • 20
  • Thanks jdewng. Is linq better option than just the normal loop or if i use Dataset inside .net. I tried to use ReadXML Function inside .net which brings the data in to Dataset but the datatables are created differently which is why i wanted to understand the XML structure. Can you please provide some bit of explanation to your code too. Thanks for your help – Muhammad Adeel Ahmed Feb 11 '19 at 00:25
  • The q in Linq stands for QUERY. So linq is really meant for queries. It is not designed to write to a new row in a datatable. – jdweng Feb 11 '19 at 00:37
  • Descendants gets an array of xml elements like "managedObject" . the I use Attribute to get the three attibutes : class, distName, and id. Next I get all the 'p' elements in "managedObject". Again each p item has an attibute and a value. I use CType to convert the XElement to a string. – jdweng Feb 11 '19 at 00:41
  • Thanks jdweng. I tried to follow your instructions but for loops doesn't trigger. There is no error also when i try to debug. managedObject element remains blank. I will be populating the data in to access table therefore using the Oledb Connection – Muhammad Adeel Ahmed Feb 11 '19 at 01:09
  • The XML also has header – Muhammad Adeel Ahmed Feb 11 '19 at 01:21
  • Just figured it out that my xml file has namespace (raml20.xsd). I defined Dim ns as XNamespace = "raml20.xsd" and then added ns variable to the Descendants of managedObject & p. It populates the datatable perfectly. Now my next target is if ManagedObject gets changed then how to populate it in multiple datatables. I will try it myself and if there is any question then i will let you know. Thanks – Muhammad Adeel Ahmed Feb 11 '19 at 03:12
  • The xml is used did not have the namespace. The posting was updated to include the namespace after I posted my answer. – jdweng Feb 12 '19 at 01:44
  • Thats right jdweng, i apologise, i figured it myself that namespace was causing the issue so i have updated the post. I just worked out your solution of XML to LINQ but i was wondering if it can parse large XML file. The data which i have given as an example is very small. The files usually get more than 2Gb. I don't know if loading 2GB file in memory is a good option. I read online about XMLreader option instead of LINQ (or possibly hybrid solution) but i am still struggling with the namespace. For me speed is very important – Muhammad Adeel Ahmed Feb 12 '19 at 02:33
  • Many thanks jdweng, very self explanatory code. Works brilliantly :) cheers – Muhammad Adeel Ahmed Feb 13 '19 at 01:05
  • Code is self explanatory because of the variable names and the fact I avoid using "VAR" – jdweng Feb 13 '19 at 01:56
  • it works great to be honest. I will have more names of classes, so i have added the if else statement to select the particular class (if newRow("class")="AMLEPR"). I will be now importing 2 or 3 classes XML (all in one) to see if it works in same fashion. Thanks for help jdweng – Muhammad Adeel Ahmed Feb 13 '19 at 10:26
  • Use a switch instead of "IF ELSE", much cleaner. – jdweng Feb 13 '19 at 11:44
  • Thanks jdweng, i have managed to do multiple managedObject types. Now i have multiple datatables. I tried to load them in to Access database using Insert SQLQuery but it takes lot of time to load the data in access. Is there any faster way to do it. I have added the code to update the access. Any better suggestions? – Muhammad Adeel Ahmed Feb 16 '19 at 04:28
  • You can use dbContext which is faster. It will generate class in c# and map between the classes and the Access database. See : https://stackoverflow.com/questions/11432488/how-to-use-entity-framework-for-ms-access-database – jdweng Feb 16 '19 at 08:57
  • Thanks jdweng, In the link you have forwarded, people have suggested that EF is not recommended for access. Unfortunately this is what I have to use. Any other better suggestion for faster updating Access? – Muhammad Adeel Ahmed Feb 20 '19 at 00:30
  • Don't believe everything you read. Access is not intended to be used as a shared database just like excel. For single user storage, Access is not the best but will work. EF will probably be the fastest way of storing. – jdweng Feb 20 '19 at 02:31
  • I am now using DAO and it is much faster, takes less than 2 seconds (to update approx 30000 rows, compared to 11s for ADO. – Muhammad Adeel Ahmed Feb 20 '19 at 02:49