What is wrong with my use of XPath in C#?

Question

I'm trying to do a bit of scraping in a C# application.

I am trying to access 4 pieces of information on the following page: https://smstestbed.nist.gov/vds/current

CreationTime
Availability
Linear X and Y coords

The following function polls a live data feed from a remote machining tool. I have been able to print 'CreationTime' to a terminal, but my XPath use is horrifically clunky. As far as this link suggests, I should be able to do what I want in the two lines after my comment:

"//This should be a far better way of accessing the data but for some reason the second line fails"

Unfortunately, AvailabilityNode comes back null.

public static void PollNIST()
    {
        string NISTSourceURL = "https://smstestbed.nist.gov/vds/current";  // Gives us a human friendly reference to the HTM
        //-------------------------------- Current (mostly) Working Version---------------------------------------------------------------------------------
        // Retrieve raw HTML
        var NISTTargetURL = NISTSourceURL;
        var NISTHttpClient = new HttpClient();
        var NISTXMLRaw = NISTHttpClient.GetStringAsync(NISTTargetURL);  // We now have all of the HTML / XML Data as a raw string
                                                                        //Console.WriteLine(MazXMLRaw.Result);                   // Prints the resulting HTML to a terminal as a debug tool    (Works)   
        XmlDocument CurNISTXML = new XmlDocument();               // Generate Blank XML Doc
        CurNISTXML.LoadXml(NISTXMLRaw.Result);                     // This (".result") passes the actual string?, should then be loaded into new XML file

        var elementHeader = CurNISTXML.GetElementsByTagName("Header");
        var curNISTHeader = elementHeader.Item(0);
        var creationTime = curNISTHeader.Attributes[0];  // We actually have the creationTime            
        string CurNISTTime = creationTime.InnerText; //      //*[@id="mtconnect content"]/ul/li[1]

        //This should be a far better way of accessing the data but for some reason the second line fails
        XmlNode AvailabilityNode = CurNISTXML.SelectSingleNode("/table[1]/tbody/tr[1]");  //*[@id="mtconnect content"]/table[1]/tbody/tr[1]/td[7] // Xpath Availability
        var CurNISTStatus = AvailabilityNode.InnerText; //      //*[@id="mtconnect content"]/ul/li[1]


        string CurNistX = ""; //      //*[@id="mtconnect content"]/table[5]/tbody/tr/td[7]
        string CurNistY = ""; //      //*[@id="mtconnect content"]/table[6]/tbody/tr/td[7]

        Console.WriteLine("-------BEGIN NIST DATA PACKET-------");
        Console.WriteLine("NIST Time  : " + creationTime.InnerText);
        Console.WriteLine("NIST Status: " + CurNISTStatus);    
        Console.WriteLine("NIST X Pos.: " + CurNistX);
        Console.WriteLine("NIST Y Pos.: " + CurNistY);
        Console.WriteLine("--------END NIST DATA PACKET--------");

        //var currentNIST = new NISTDataSet()// Create new instance ofNISTdata object
    }

Any ideas?

You are trying to parse an HTML webpage as XML. You are using the wrong URL. The data is available as XML, but you need to use a different URL. See: https://www.nist.gov/programs-projects/materials-data-curation-system — jdweng, Nov 06 '18 at 10:51
Are you sure? If I print the XML doc to console it's all there, and creationtime works just fine. — GigaJoules, Nov 06 '18 at 10:56
This is my first time writing C#, so I'm getting stuck on things that are probably quite simple — GigaJoules, Nov 06 '18 at 11:07
The timestamp is gained only using the link given in the first line of the method — GigaJoules, Nov 06 '18 at 11:27
When I view source it appears the link ending "/vds/current" is the path to the XML? — GigaJoules, Nov 06 '18 at 11:31
The smstestbed has a schema location at the top of the xml file. Get the schema from location. Then use the msdn xsd.exe tool to convert xml to classes (option /cl /l:cs). Then use xml serialization to parse data. — jdweng, Nov 06 '18 at 11:31
I think I've mixed up 'view source' and 'inspect'. When I hit view source I only see XML. — GigaJoules, Nov 13 '18 at 10:44
When I do a Console.WriteLine(CurNistXML.InnerXML); I get something that starts with — GigaJoules, Nov 13 '18 at 10:48
Does the xml contain tag "Header" [CurNISTXML.GetElementsByTagName("Header");]. I think the xml is embedded in Html. The Header tag is part of the HTML and doesn't exist in the Xml. — jdweng, Nov 13 '18 at 11:53

score 1 · Answer 1 · answered Nov 06 '18 at 12:25


The XPath expression

/table[1]/tbody/tr[1]

will succeed only if the outermost element of the document is a table element, which seems unlikely. I haven't tried to understand the logic of the page or of your code, but this definitely looks wrong. "/" at the start of a path expression selects from the root of the tree.
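To make the point about the leading "/" concrete, here is a minimal sketch. The XML string is a made-up stand-in (it is not the real NIST feed, and it deliberately has no namespace), and the class name XPathRootDemo is hypothetical. It only illustrates how a path starting with "/" differs from one starting with "//".

```csharp
using System;
using System.Xml;

public static class XPathRootDemo
{
    // Hypothetical sample document, used only to illustrate the path rules.
    private const string SampleXml =
        "<Streams>" +
        "<Header creationTime=\"2018-11-06T10:00:00Z\" />" +
        "<Events><Availability>AVAILABLE</Availability></Events>" +
        "</Streams>";

    public static XmlDocument Load()
    {
        var doc = new XmlDocument();
        doc.LoadXml(SampleXml);
        return doc;
    }

    public static void Main()
    {
        XmlDocument doc = Load();

        // "/Availability" matches only if Availability is the outermost
        // element. Here the outermost element is Streams, so this is null.
        XmlNode fromRoot = doc.SelectSingleNode("/Availability");
        Console.WriteLine(fromRoot == null);   // True

        // "//Availability" searches the whole tree for the element.
        XmlNode anywhere = doc.SelectSingleNode("//Availability");
        Console.WriteLine(anywhere.InnerText); // AVAILABLE

        // A full absolute path must spell out every step from the root.
        // Attributes are addressed with "@".
        XmlNode time = doc.SelectSingleNode("/Streams/Header/@creationTime");
        Console.WriteLine(time.Value);         // 2018-11-06T10:00:00Z
    }
}
```

The same rule explains the null in the question: "/table[1]/tbody/tr[1]" asks for a table element as the document's outermost element.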


Michael Kay


Yeah, I thought that. I've tried several different things there, which is why I think that single slash is there – GigaJoules Nov 06 '18 at 12:39
@GigaJoules Does '//table[1]/tbody/tr[1]' select what you wanted? It is unclear to me which element you are trying to select. – Mate Mrše Nov 06 '18 at 14:11
@GigaJoules We see a lot of questions where people have scattered random punctuation around their XPath expressions in the hope that it will act as magic fairy dust. It's rarely an effective strategy. Save yourself time, read the manual. – Michael Kay Nov 06 '18 at 15:57
I'm looking to pull the word 'available' from the top right cell of the first table, and the 'value' number of tables 'linear x' and 'linear y' – GigaJoules Nov 07 '18 at 11:33
Going for the attribute ID was a far better option in the end, as each element has a unique identifier and only occurs once. – GigaJoules Feb 14 '19 at 10:20

score 0 · Accepted Answer · answered Nov 13 '18 at 12:42

So it turns out there was nothing wrong with how I was extracting the XML, only with my Paths.

public static void PollNIST()
        {
            string NISTSourceURL = "https://smstestbed.nist.gov/vds/current";  // Gives us a human friendly reference to the HTMl
            // string NistXmlUrl = // Someone on stackexchange is claiming that there is another url for the XML but viewsource says otherwise 
            //-------------------------------- Current (mostly) Working Version---------------------------------------------------------------------------------
            var NISTHttpClient = new HttpClient();
            var NISTXMLRaw = NISTHttpClient.GetStringAsync(NISTSourceURL);  // We now have all of the HTML / XML Data as a raw string
                                                                            //Console.WriteLine(MazXMLRaw.Result);                   // Prints the resulting HTML to a terminal as a debug tool    (Works)   
            XmlDocument CurNISTXML = new XmlDocument();               // Generate Blank XML Doc
            CurNISTXML.LoadXml(NISTXMLRaw.Result);                     // This (".result") passes the actual string?, should then be loaded into new XML file

            // Get CreationTime (WORKING!)
            XmlNodeList elementHeader = CurNISTXML.GetElementsByTagName("Header");
            XmlNode curNISTHeader = elementHeader.Item(0);
            XmlAttribute creationTime = curNISTHeader.Attributes[0];  // We now have the creationTime element          
            string CurNISTTime = creationTime.InnerText;  //      //*[@id="mtconnect content"]/ul/li[1]

            // Get availability (WORKING!)
            XmlNodeList nodeAvailability = CurNISTXML.GetElementsByTagName("Availability");
            XmlNode availability = nodeAvailability.Item(0); // I think this is maybe a bit of a hackish / improper way to do this?
            string curNISTStatus = availability.InnerText;

            //Get linear tool X Coord.
            XmlNodeList deviceStream = CurNISTXML.GetElementsByTagName("ComponentStream");
            XmlNode linearCompXStream = deviceStream.Item(4);
            string curNISTX = linearCompXStream.InnerText; //  We do not need to break down the nodes any further as the value is the only text within

            //Get Linear tool y Coord.            
            XmlNode linearCompYStream = deviceStream.Item(5);
            string curNISTY = linearCompYStream.InnerText; //  We do not need to break down the nodes any further as the value is the only text within


            Console.WriteLine("-------BEGIN NIST DATA PACKET-------");
            Console.WriteLine("NIST Time  : " + creationTime.InnerText);
            Console.WriteLine("NIST Status: " + curNISTStatus);    
            Console.WriteLine("NIST X Pos.: " + curNISTX);
            Console.WriteLine("NIST Y Pos.: " + curNISTY);
            Console.WriteLine("--------END NIST DATA PACKET--------");

            //var currentNIST = new NISTDataSet()// Create new instance ofNISTdata object
        }

works nicely.
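For anyone who would still rather use XPath than index into GetElementsByTagName results, here is a hedged sketch of selecting by an identifying attribute. Everything about the sample data is an assumption: the namespace URI, the attribute name dataItemId, and the id values avail, x_pos and y_pos are placeholders, so check the real feed for the actual names. The class NistXPathSketch and its helpers are hypothetical. One thing worth knowing is that when the root element declares a default namespace, XPath in .NET needs a prefix registered through XmlNamespaceManager, whereas GetElementsByTagName matches on the tag name as written.

```csharp
using System;
using System.Xml;

public static class NistXPathSketch
{
    // Assumption: illustrative stand-in for the feed. The namespace URI,
    // the dataItemId attribute and its values are placeholders.
    private const string SampleXml = @"
<MTConnectStreams xmlns=""urn:mtconnect.org:MTConnectStreams:1.3"">
  <Header creationTime=""2018-11-13T12:00:00Z"" />
  <Streams>
    <Availability dataItemId=""avail"">AVAILABLE</Availability>
    <Position dataItemId=""x_pos"">12.5</Position>
    <Position dataItemId=""y_pos"">-3.25</Position>
  </Streams>
</MTConnectStreams>";

    public static XmlDocument Load()
    {
        var doc = new XmlDocument();
        doc.LoadXml(SampleXml.Trim());
        return doc;
    }

    // Bind the prefix "m" to whatever namespace the root element uses,
    // so the URI does not have to be hard-coded.
    public static XmlNamespaceManager Namespaces(XmlDocument doc)
    {
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("m", doc.DocumentElement.NamespaceURI);
        return ns;
    }

    // Returns the text of the first match, or null if nothing matched.
    public static string ValueOf(XmlDocument doc, XmlNamespaceManager ns, string xpath)
    {
        XmlNode node = doc.SelectSingleNode(xpath, ns);
        return node?.InnerText;
    }

    public static void Main()
    {
        XmlDocument doc = Load();
        XmlNamespaceManager ns = Namespaces(doc);

        // Without a prefix the element is not found, because it lives in
        // the document's default namespace.
        Console.WriteLine(doc.SelectSingleNode("//Availability") == null);

        Console.WriteLine(ValueOf(doc, ns, "/m:MTConnectStreams/m:Header/@creationTime"));
        Console.WriteLine(ValueOf(doc, ns, "//m:Availability"));
        Console.WriteLine(ValueOf(doc, ns, "//m:*[@dataItemId='x_pos']"));
        Console.WriteLine(ValueOf(doc, ns, "//m:*[@dataItemId='y_pos']"));
    }
}
```

Selecting on a unique attribute avoids depending on element order, which is the weak spot of Item(4) and Item(5) if the feed ever adds or reorders component streams.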
