1

I have am XML document which looks like this:

<p id="name">Rahul</P>
<p id="job">ABC Corp</P>
<p id="empid">12345</p>
<p id="age">30</p>

Now I want to extract the below tag value:

<p id="empid">12345</p>

i.e., 12345.

I tried using InStr but it takes lot of time to read a XML document and find all the values.

cxw
  • 16,685
  • 2
  • 45
  • 81
user3928562
  • 55
  • 2
  • 10
  • You could probably use something similar to what's posted here, since that looks like it could pass for HTML as well. http://stackoverflow.com/questions/25488687/parse-html-content-in-vba – brianyates Feb 02 '17 at 13:45
  • 1
    @Jinx88909 Thanks for your suggestion. I will take care – user3928562 Feb 02 '17 at 13:46
  • @BYates let me be specific, actually we receive these types of data(neither purely XML or HTML) from vendors which gets stored inside a column in our temp table, so our task is to filter out the empid value from the column using vba. And those files which dont contain the required empid tag will be inserted to rejected table. – user3928562 Feb 02 '17 at 13:51
  • @user3928562 so we'd be filtering these records in a database or other file? What is the data stored in? If it's a database, you could filter these via SQL statements or traversing a recordset. If it is Excel, cxw has a potential solution below. – Jimmy Smith Feb 02 '17 at 14:34
  • By the way, welcome to the site! Check out the [tour](https://stackoverflow.com/tour) for more about what to expect, and the [how-to-ask](https://stackoverflow.com/help/how-to-ask/) page for more about writing questions that will attract quality answers (such as that below ;) :) ). – cxw Feb 02 '17 at 14:55

2 Answers2

1

If I understand your comment correctly, you have a bunch of XML as a string. That is, something like this in one cell of a worksheet:

<p id="name">Rahul</P> <p id="job">ABC Corp</P> <p id="empid">12345</p> <p id="age">30</p>

Now, you should never use regular expressions to parse HTML or XML. That said, let's use regular expressions! Under Tools | References, add a reference to Microsoft VBScript Regular Expressions 5.5.

Option Explicit
Option Base 0

Public Sub EmpID()
    Dim re As VBScript_RegExp_55.RegExp
    Set re = New VBScript_RegExp_55.RegExp
    re.Pattern = "empid""\s*>\s*([0-9]+)"
        ' A pattern that matches the end of the empid tag
        ' and the following numerical empid value.

    Dim matches As VBScript_RegExp_55.MatchCollection
    Dim match As VBScript_RegExp_55.match
    Set matches = re.Execute(ActiveSheet.Range("A1"))
        ' Or whatever text you want ^^^^^^^^^^^^^^^^

    For Each match In matches
        Debug.Print match.SubMatches(0)    ' The number
        ' Or whatever you want to do with it
    Next match
End Sub

On my Excel 2013, this outputs 12345 to the debug console, using the input above.

Community
  • 1
  • 1
cxw
  • 16,685
  • 2
  • 45
  • 81
1

You need a reference set to Microsoft XML v6.0

Assuming your XML looks like this (I've added a root node. This is required to get the path of the node you're searching for)

<Root>
     <p id="name">Rahul</p>
     <p id="job">ABC Corp</p>
     <p id="empid">12345</p>
     <p id="age">30</p>
</Root>

The below code will print "12345" although you can change the string you're searching for.

Sub Test()
Dim xDoc As New MSXML2.DOMDocument60
Dim xNode As MSXML2.IXMLDOMElement
Dim strPath As String
Dim strSearch As String

strSearch = "empid"
strPath = "Path to XML file here"
'Try To Load The Document
With xDoc
    If Not .Load(strPath) Then
        'Not Loaded. Display Error And Exit
        MsgBox "Unable to load"
        Exit Sub
    End If
    'Get The Node Where The id Attribute = Your Search String
    Set xNode = .SelectSingleNode("//Root/p[@id='" & strSearch & "']")
    'Only Proceed If A Node Is Found
    If Not xNode Is Nothing Then
        Debug.Print xNode.Text
    End If
End With

End Sub
Jiminy Cricket
  • 1,377
  • 2
  • 15
  • 24