
I would like to extract some information from documents that have different formats.

I have the following document:

var getSORMARC = document.evaluate("//*[@id='marcview']/tbody/tr[contains(., '245')]/following-sibling::tr[contains(.,'_c')]/td[contains(.,'_c')]/following-sibling::td[1]", document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null);
var SORMARC = null;
if (getSORMARC.singleNodeValue !== null) {
  SORMARC = getSORMARC.singleNodeValue.innerText;
}
console.log(SORMARC);

<table id="marcview">
  <tbody>
    <tr>
      <td>
        <b>Title</b>
      </td>
      <td>245</td>
      <td>&nbsp;</td>
      <td>0</td>
      <td>_a</td>
      <td>Title of the document /</td>
    </tr>
    <tr>
      <td>_c</td>
      <td>Author no. 1</td>
    </tr>
  </tbody>
</table>

and this other document:

var getSORMARC = document.evaluate("//*[@id='marcview']/tbody/tr[contains(., '245')]/following-sibling::tr[contains(.,'_c')]/td[contains(.,'_c')]/following-sibling::td[1]", document, null, XPathResult.FIRST_ORDERED_NODE_TYPE, null);
var SORMARC = null;
if (getSORMARC.singleNodeValue !== null) {
  SORMARC = getSORMARC.singleNodeValue.innerText;
}
console.log(SORMARC);

<table id="marcview">
  <tbody>
    <tr>
      <td>
        <b>Title</b>
      </td>
      <td>245</td>
      <td>&nbsp;</td>
      <td>0</td>
      <td>_a</td>
      <td>Title of another document/</td>
    </tr>
    <tr>
      <td>
        <b>Publication</b>
      </td>
      <td>260</td>
      <td>&nbsp;</td>
      <td>&nbsp;</td>
      <td>_c</td>
      <td>1995</td>
    </tr>
  </tbody>
</table>

As you can see, I used this XPath selector for both these documents:

//*[@id='marcview']/tbody/tr[contains(., '245')]/following-sibling::tr[contains(.,'_c')]/td[contains(.,'_c')]/following-sibling::td[1]

The problem is that when the 245 field has no `_c` subfield of its own, the selector still finds the `_c` cell in a later row (here the 260 "Publication" row) and returns the text of the cell after it ("1995"), which should not happen.
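To see why the second document yields "1995", here is a rough step-by-step mimic of the XPath in plain JavaScript. The array model of the tables (one array of cell texts per `<tr>`) and the name `currentXPathResult` are illustrative only. The key step is `following-sibling::tr[contains(.,'_c')]`: it accepts any later row whose text contains `_c`, not only rows that still belong to the 245 field, so the 260 row qualifies.

```javascript
// Illustrative model of the two tables: one array of cell texts per <tr>.
// "\u00a0" stands for &nbsp;.
const firstDoc = [
  ["Title", "245", "\u00a0", "0", "_a", "Title of the document /"],
  ["_c", "Author no. 1"],
];
const secondDoc = [
  ["Title", "245", "\u00a0", "0", "_a", "Title of another document/"],
  ["Publication", "260", "\u00a0", "\u00a0", "_c", "1995"],
];

// Mimics the question's XPath and returns the first match in document order,
// like XPathResult.FIRST_ORDERED_NODE_TYPE.
function currentXPathResult(rows) {
  for (let i = 0; i < rows.length; i++) {
    if (!rows[i].join("").includes("245")) continue; // tr[contains(., '245')]
    for (let j = i + 1; j < rows.length; j++) {      // following-sibling::tr (ANY later row)
      if (!rows[j].join("").includes("_c")) continue; // [contains(., '_c')]
      const k = rows[j].findIndex((cell) => cell.includes("_c")); // td[contains(., '_c')]
      if (k !== -1 && k + 1 < rows[j].length) {
        return rows[j][k + 1]; // following-sibling::td[1]
      }
    }
  }
  return null; // no node matched
}

console.log(currentXPathResult(firstDoc));  // Author no. 1
console.log(currentXPathResult(secondDoc)); // 1995
```

Nothing in the path checks that the `_c` row still belongs to the 245 field, which is exactly the missing condition.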

When the script runs, this is what I want to get: First document: Author no. 1. Second document: nothing.

I only want to capture the text if that `_c` belongs to the 245 field, i.e. if it is in the row with `<td>245</td>` (the "Title" row) or in the continuation row directly after it.

I'm at my wits' end. I tried starting my XPath with `_c`, but I keep getting errors. Any idea how to handle this use case?

If this can be done without `document.evaluate()`, that's fine with me too.
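One possible non-XPath approach is to walk the table rows and remember which field each row belongs to. This is only a sketch under the layout shown above: it assumes a row that starts a field has its tag in the second cell, while a continuation row (like the `_c` / "Author no. 1" row) begins with a subfield code. `getSubfield` is a hypothetical helper name, and the layout assumption may not hold for every record the system generates.

```javascript
// Hypothetical helper (not from the original page): returns the text of the
// cell after the subfield code `code` (e.g. "_c"), but only while the rows
// belong to the field `tag` (e.g. "245").
// Assumed layout: a row that starts a field has its tag in the 2nd cell;
// a continuation row begins with a subfield code such as "_c".
// `rows` may be table.rows from the DOM, or any array-like of objects whose
// `cells` items expose `textContent`.
function getSubfield(rows, tag, code) {
  let currentTag = null; // tag of the field the current row belongs to
  for (const row of Array.from(rows)) {
    // trim() also strips the U+00A0 character produced by &nbsp;
    const cells = Array.from(row.cells, (cell) => cell.textContent.trim());
    if (cells.length === 0) continue;
    // A row whose first cell is not a subfield code starts a new field.
    if (!cells[0].startsWith("_")) currentTag = cells[1];
    if (currentTag !== tag) continue;
    const i = cells.indexOf(code);
    if (i !== -1 && i + 1 < cells.length) return cells[i + 1];
  }
  return null; // the field has no such subfield
}

// In the browser, for the tables above:
// getSubfield(document.getElementById("marcview").rows, "245", "_c");
```

For the first table this should return "Author no. 1"; for the second it returns `null`, because the `_c` there sits in a row that starts the 260 field.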

schnydszch
  • Is this html provided or do you generate it ? Because it clearly lacks some attributes to qualify content (classes, ids, etc ) – Apolo Jun 18 '19 at 07:39
  • btw I don't understand what you are trying to do. Maybe you could rephrase with "Objective" / "What I tried" / "Expected result" / "Actual result" kind of question ? – Apolo Jun 18 '19 at 07:42
  • Have you tried https://stackoverflow.com/questions/3103962/converting-html-string-into-dom-elements – Ajeet Kumar Jun 18 '19 at 07:42
  • @Apolo to be honest, I re-read the question about five times and I'm not sure I got it, either. But I *think*, OP wants to find the `tr` that contains `245` then the *following* `tr` that contains `_c` and the content of `td` after it. Maybe. I'm not good with XPath, so I might be wrong. It's worth clarifying because it might be possible to answer without using (or knowing) XPath. – VLAZ Jun 18 '19 at 07:45
  • @VLAZ so you mean "Author no. 1" and "1995" in the two provided examples ? – Apolo Jun 18 '19 at 07:47
  • @Apolo if my interpretation of the XPath is correct, then yes. Of course it hinges on my interpretation there. – VLAZ Jun 18 '19 at 07:48
  • @VLAZ I think that it's what they currently have, but what they want is that the second snippet returns nothing. Tried to redesign their question a bit without changing the meaning (I hope), but I must admit I'm not 100% confident on what they are after either (my XPath skillz are not that great...). – Kaiido Jun 18 '19 at 07:51
  • @Kaiido great, so we have three people in comments with basic XPath abilities and we aren't even sure we read the question correctly... – VLAZ Jun 18 '19 at 07:53
  • @VLAZ good point, I won't try to answer unless OP makes an edit to explain better what he is looking for – Apolo Jun 18 '19 at 07:56
  • Apolo, the html is auto-generated. I just deleted some attributes in the html to lessen/simplify the post. – schnydszch Jun 18 '19 at 08:38
  • VLAZ, only Author no. 1. And Kaiido, you are correct with your edits. I added a sample of what I'm trying to achieve. Thanks! – schnydszch Jun 18 '19 at 08:43
