Using HTML Agility Pack to extract values with XPath in a C# console app

Question

I have the following HTML, and I used Google Chrome to get the XPath expressions for it.

<DIV id=TasheelPaymentCtrl1_dvPayment>
<TABLE border=1 cellSpacing=0 borderColor=black cellPadding=7 width=625 align=center>
<TBODY>
<TR>
<TD class=ReceiptHeadArbCenterHead1 width=320>المسمى </TD>
<TD class=ReceiptHeadArbCenterHead1 width=75>دفع إلى</TD>
<TD class=ReceiptHeadArbCenterHead1 width=75>القيمة</TD>
<TD class=ReceiptHeadArbCenterHead1 width=75>الكمية</TD>
<TD class=ReceiptHeadArbCenterHead1 width=75>المجموع</TD></TR>
<TR>
<TD class=ReceiptHeadArbCenterHead>رسوم وزارة العمل</TD>
<TD class=ReceiptValueArbCenter>MOFI</TD>
<TD class=ReceiptValueArbCenter>3</TD>
<TD class=ReceiptValueArbCenter>1</TD>
<TD class=ReceiptValueArbCenter>3</TD>
<TR>
<TD class=ReceiptHeadArbCenterHead>رسوم الدرهم الإلكتروني</TD>
<TD class=ReceiptValueArbCenter>MOFI</TD>
<TD class=ReceiptValueArbCenter>3</TD>
<TD class=ReceiptValueArbCenter>1</TD>
<TD class=ReceiptValueArbCenter>3</TD>
<TR>
<TD class=ReceiptHeadArbCenterHead>رسوم مراكز الخدمة </TD>
<TD class=ReceiptValueArbCenter>MOFI</TD>
<TD class=ReceiptValueArbCenter>47</TD>
<TD class=ReceiptValueArbCenter>1</TD>
<TD class=ReceiptValueArbCenter>47</TD>
<TR>
<TD class=ReceiptHeadArbCenterHead1 colSpan=4>المجموع</TD>
<TD class=ReceiptValueArbCenter>53</TD></TR></TBODY></TABLE></DIV>

I want to extract the values 3, 3, 47 and 53 from the last column.

I tried these XPath expressions:

    var gf = doc.DocumentNode.SelectNodes("//div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[2]/td[5]");
    foreach (var node in gf)
    {
        Console.WriteLine(node.InnerText); // output: "3"
    }

    var sf = doc.DocumentNode.SelectNodes("//div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[3]/td[5]");
    foreach (var node in sf)
    {
        Console.WriteLine(node.InnerText); // output: "3"
    }

    var tf = doc.DocumentNode.SelectNodes("//div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[4]/td[5]");
    foreach (var node in tf)
    {
        Console.WriteLine(node.InnerText); // output: "47"
    }

    var Allf = doc.DocumentNode.SelectNodes("//div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[5]/td[2]");
    foreach (var node in Allf)
    {
        Console.WriteLine(node.InnerText); // output: "53"
    }

But I am getting a NullReferenceException. I used the Google Chrome developer tools to copy the XPath expressions. Why am I getting this exception? Is there a mistake in the XPath, and how can I extract the values? Please help.

Possible duplicate of [What is a NullReferenceException and how do I fix it?](http://stackoverflow.com/questions/4660142/what-is-a-nullreferenceexception-and-how-do-i-fix-it) — Andrey Korneyev, Mar 03 '16 at 07:20
@DanielHilgarth, I am getting the null reference exception in the foreach loop over "gf". — mbdAli, Mar 03 '16 at 07:24
So, it looks like the XPath is not working. Are you sure that the indexes are one-based? I would think that this XPath looks more correct with zero based indexes: `//div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[1]/td[4]` — Daniel Hilgarth, Mar 03 '16 at 07:27
@DanielHilgarth Please explain what one-based and zero-based mean. I am not sure about that. — mbdAli, Mar 03 '16 at 07:34
Just try the new XPath. *zero based* means that the first item in an array has the index 0. *one based* means that the first item in an array has the index 1. — Daniel Hilgarth, Mar 03 '16 at 07:35
I found the problem: the <tr> is not properly closed. Is there any way to work around this without closing the <tr>? — mbdAli, Mar 03 '16 at 07:59
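
On the indexing question raised in the comments: XPath positional predicates are one-based, so `td[1]` is the first cell and `td[0]` never matches anything. Here is a minimal sketch to check this, using the built-in System.Xml classes rather than Html Agility Pack. The small well-formed fragment is a made-up example, not the markup from the question.

```csharp
using System;
using System.Xml;

class Program
{
    static void Main()
    {
        // Hypothetical three-cell row, well-formed so that XmlDocument can load it.
        var doc = new XmlDocument();
        doc.LoadXml("<tr><td>a</td><td>b</td><td>c</td></tr>");

        // Position 1 is the first <td>.
        Console.WriteLine(doc.SelectSingleNode("/tr/td[1]").InnerText); // prints "a"

        // No node has position 0, so nothing is selected and null comes back.
        Console.WriteLine(doc.SelectSingleNode("/tr/td[0]") == null);   // prints "True"
    }
}
```

So the `tr[2]/td[5]` style of index in the question is already the right convention, and shifting every index down by one would select different cells.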

score 1 · Accepted Answer · answered Mar 03 '16 at 08:59

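As you have discovered, some of your XPath expressions don't work because the <tr> tags are not all closed. When an expression matches nothing, SelectNodes returns null, and the foreach over that null result throws the NullReferenceException.

Html Agility Pack treats each unclosed <tr> as a child of the previous one, so the later rows end up nested instead of being siblings. You will need to cater for this in your XPath expressions: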

//div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[2]/td[5] - no change
//div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[3]/td[5] - should be //div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[2]/tr/td[5]
//div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[4]/td[5] - should be //div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[2]/tr/tr/td[5]
//div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[5]/td[2] - should be //div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[2]/tr/tr/tr/td[2]
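
An alternative to counting nesting levels is to match rows at any depth and to guard against a null result. The following is only a sketch, under these assumptions: the HtmlAgilityPack NuGet package is referenced, every value cell keeps the class `ReceiptValueArbCenter`, and each `<td>` stays a direct child of its `<tr>` after parsing. The HTML is a shortened copy of the table from the question with translated labels.

```csharp
using System;
using HtmlAgilityPack; // NuGet package "HtmlAgilityPack"

class Program
{
    static void Main()
    {
        // Shortened copy of the table from the question (labels translated).
        // The three fee rows have no closing </TR>, as in the original.
        const string html = @"<DIV id=TasheelPaymentCtrl1_dvPayment>
<TABLE><TBODY>
<TR><TD class=ReceiptHeadArbCenterHead1>Name</TD><TD class=ReceiptHeadArbCenterHead1>Total</TD></TR>
<TR><TD class=ReceiptHeadArbCenterHead>Ministry of Labour fees</TD><TD class=ReceiptValueArbCenter>MOFI</TD><TD class=ReceiptValueArbCenter>3</TD>
<TR><TD class=ReceiptHeadArbCenterHead>e-Dirham fees</TD><TD class=ReceiptValueArbCenter>MOFI</TD><TD class=ReceiptValueArbCenter>3</TD>
<TR><TD class=ReceiptHeadArbCenterHead>Service centre fees</TD><TD class=ReceiptValueArbCenter>MOFI</TD><TD class=ReceiptValueArbCenter>47</TD>
<TR><TD class=ReceiptHeadArbCenterHead1 colSpan=2>Total</TD><TD class=ReceiptValueArbCenter>53</TD></TR>
</TBODY></TABLE></DIV>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // "//tr" matches rows at any depth, so it does not matter whether the
        // parser nests the unclosed rows or keeps them as siblings.
        // The two predicates pick, per row, the last cell that has the value class.
        var cells = doc.DocumentNode.SelectNodes(
            "//div[@id='TasheelPaymentCtrl1_dvPayment']//tr/td[@class='ReceiptValueArbCenter'][last()]");

        // SelectNodes returns null (not an empty collection) when nothing matches.
        if (cells == null)
        {
            Console.WriteLine("No matching cells found.");
            return;
        }

        foreach (var cell in cells)
        {
            Console.WriteLine(cell.InnerText.Trim());
        }
    }
}
```

With the markup from the question, this expression is meant to select the cells holding 3, 3, 47 and 53. It has not been checked against every Html Agility Pack version, so treat the null check as the part to rely on and verify the XPath against your own input.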

Good, but I came up with this option: cleaning up the HTML document before loading it. HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument(); text = Regex.Replace(text, "]*>(?:(?!?tr>||).)*?(?=]*>||)", "$&", RegexOptions.Singleline | RegexOptions.IgnoreCase); doc.LoadHtml(text); doc.OptionAutoCloseOnEnd = true; — mbdAli, Mar 03 '16 at 11:46
