I have the following HTML, and I used Google Chrome to get the XPath expressions.

<DIV id=TasheelPaymentCtrl1_dvPayment>
<TABLE border=1 cellSpacing=0 borderColor=black cellPadding=7 width=625 align=center>
<TBODY>
<TR>
<TD class=ReceiptHeadArbCenterHead1 width=320>المسمى </TD>
<TD class=ReceiptHeadArbCenterHead1 width=75>دفع إلى</TD>
<TD class=ReceiptHeadArbCenterHead1 width=75>القيمة</TD>
<TD class=ReceiptHeadArbCenterHead1 width=75>الكمية</TD>
<TD class=ReceiptHeadArbCenterHead1 width=75>المجموع</TD></TR>
<TR>
<TD class=ReceiptHeadArbCenterHead>رسوم وزارة العمل</TD>
<TD class=ReceiptValueArbCenter>MOFI</TD>
<TD class=ReceiptValueArbCenter>3</TD>
<TD class=ReceiptValueArbCenter>1</TD>
<TD class=ReceiptValueArbCenter>3</TD>
<TR>
<TD class=ReceiptHeadArbCenterHead>رسوم الدرهم الإلكتروني</TD>
<TD class=ReceiptValueArbCenter>MOFI</TD>
<TD class=ReceiptValueArbCenter>3</TD>
<TD class=ReceiptValueArbCenter>1</TD>
<TD class=ReceiptValueArbCenter>3</TD>
<TR>
<TD class=ReceiptHeadArbCenterHead>رسوم مراكز الخدمة </TD>
<TD class=ReceiptValueArbCenter>MOFI</TD>
<TD class=ReceiptValueArbCenter>47</TD>
<TD class=ReceiptValueArbCenter>1</TD>
<TD class=ReceiptValueArbCenter>47</TD>
<TR>
<TD class=ReceiptHeadArbCenterHead1 colSpan=4>المجموع</TD>
<TD class=ReceiptValueArbCenter>53</TD></TR></TBODY></TABLE></DIV>

I want to extract the values 3, 3, 47 and 53.

I tried using these XPath expressions:

    var gf = doc.DocumentNode.SelectNodes("//div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[2]/td[5]");

    foreach (var node in gf)
    {
        Console.WriteLine(node.InnerText); // output: "3"
    }

    var sf = doc.DocumentNode.SelectNodes("//div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[3]/td[5]");

    foreach (var node in sf)
    {
        Console.WriteLine(node.InnerText); // output: "3"
    }

    var tf = doc.DocumentNode.SelectNodes("//div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[4]/td[5]");

    foreach (var node in tf)
    {
        Console.WriteLine(node.InnerText); // output: "47"
    }

    var Allf = doc.DocumentNode.SelectNodes("//div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[5]/td[2]");

    foreach (var node in Allf)
    {
        Console.WriteLine(node.InnerText); // output: "53"
    }

But I am getting a NullReferenceException. I copied the XPath expressions from Google Chrome's developer tools. Why am I getting this exception? Is there a mistake in the XPath, and how can I extract the values? Please help.

mbdAli
  • *Where* are you getting the NullReferenceException? – Daniel Hilgarth Mar 03 '16 at 07:20
  • Possible duplicate of [What is a NullReferenceException and how do I fix it?](http://stackoverflow.com/questions/4660142/what-is-a-nullreferenceexception-and-how-do-i-fix-it) – Andrey Korneyev Mar 03 '16 at 07:20
  • @DanielHilgarth, I am getting the NullReferenceException in the foreach loop over "gf". – mbdAli Mar 03 '16 at 07:24
  • So, it looks like the XPath is not working. Are you sure that the indexes are one-based? I would think that this XPath looks more correct with zero based indexes: `//div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[1]/td[4]` – Daniel Hilgarth Mar 03 '16 at 07:27
  • @DanielHilgarth Please explain what one-based and zero-based mean; I am not sure about that. – mbdAli Mar 03 '16 at 07:34
  • Just try the new XPath. *zero based* means that the first item in an array has the index 0. *one based* means that the first item in an array has the index 1. – Daniel Hilgarth Mar 03 '16 at 07:35
  • I found the cause: the <tr> tags are not properly closed. Is there any way to work around this without closing them? – mbdAli Mar 03 '16 at 07:59

1 Answer

As you have discovered, some of your XPath expressions don't work because the <tr> tags are not all closed.

Therefore, you will need to cater for this in your XPath expressions:

  • //div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[2]/td[5] - no change
  • //div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[3]/td[5] - should be //div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[2]/tr/td[5]
  • //div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[4]/td[5] - should be //div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[2]/tr/tr/td[5]
  • //div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[5]/td[2] - should be //div[@id='TasheelPaymentCtrl1_dvPayment']/table/tbody/tr[2]/tr/tr/tr/td[2]
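
As for the exception itself: HtmlAgilityPack's SelectNodes returns null, not an empty collection, when an expression matches nothing, so the foreach throws. The sketch below guards against that. It also uses a single expression that does not depend on how the unclosed rows end up nested: it takes the last <td> of every <tr> under the div and keeps only those with the value class. The HTML here is a trimmed-down stand-in for the markup in the question (English placeholder labels, two columns), and the descendant-axis expression is an alternative to the paths listed above, not a replacement confirmed by the question's author. It is intended to print the four values in document order.

```csharp
using System;
using HtmlAgilityPack;

class Program
{
    static void Main()
    {
        // Trimmed-down stand-in for the question's markup, keeping the unclosed <TR> tags.
        // The row labels are placeholders, not the original Arabic text.
        string html = @"<DIV id=TasheelPaymentCtrl1_dvPayment>
<TABLE><TBODY>
<TR><TD class=ReceiptHeadArbCenterHead1>Name</TD><TD class=ReceiptHeadArbCenterHead1>Total</TD></TR>
<TR><TD class=ReceiptHeadArbCenterHead>Fee A</TD><TD class=ReceiptValueArbCenter>3</TD>
<TR><TD class=ReceiptHeadArbCenterHead>Fee B</TD><TD class=ReceiptValueArbCenter>3</TD>
<TR><TD class=ReceiptHeadArbCenterHead>Fee C</TD><TD class=ReceiptValueArbCenter>47</TD>
<TR><TD class=ReceiptHeadArbCenterHead1>Total</TD><TD class=ReceiptValueArbCenter>53</TD></TR>
</TBODY></TABLE></DIV>";

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // "//tr" matches rows at any depth, so nested rows are found too.
        // "td[last()]" picks the last cell of each row; the class predicate
        // then drops the header row, whose last cell uses a different class.
        var cells = doc.DocumentNode.SelectNodes(
            "//div[@id='TasheelPaymentCtrl1_dvPayment']//tr/td[last()][@class='ReceiptValueArbCenter']");

        // SelectNodes returns null when nothing matches, so check before iterating.
        if (cells == null)
        {
            Console.WriteLine("No matching cells found.");
            return;
        }

        foreach (var cell in cells)
        {
            Console.WriteLine(cell.InnerText.Trim());
        }
    }
}
```

The same null check applies to each of the four separate SelectNodes calls in the question if you keep the fixed paths instead.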
Keith Hall
  • Good, but I came up with another option: cleaning up the HTML before loading it. HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument(); text = Regex.Replace(text, "]*>(?:(?!?tr>||).)*?(?=]*>||)", "$&", RegexOptions.Singleline | RegexOptions.IgnoreCase); doc.LoadHtml(text); doc.OptionAutoCloseOnEnd = true; – mbdAli Mar 03 '16 at 11:46