I'm working on a crawler for an answer site. How can I get only the question text inside each td, without the text of the nested tags?

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="UTF-8" />
    <meta http-equiv="X-UA-Compatible" content="IE=edge" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
    <title>Document</title>
  </head>
  <body>
    <table
      border="0"
      width="100%"
      onclick="GiveAns(event.srcElement||event.target)"
      onmouseover="ChangeColor(event.srcElement||event.target)"
    >
      <tbody>
        <tr>
          <th class="w">Question number</th>
          <th class="w">key<br />answer</th>
          <th class="w">Choose your <br />own answer</th>
          <th class="w">Selected Topics<span id="cdes"></span></th>
          <th class="w">Error<br />Notification</th>
        </tr>
      </tbody>
      <tbody id="s1234">
        <tr id="d1">
          <th><a name="P1">1</a></th>
          <th><b>(1)</b></th>
          <th><tt> </tt></th>
          <td>
            question1
            <i>
              <a>(1)ans1</a>
            </i>
            <i>(2)ans2</i>
            <i>(3)ans3</i>
            <i>ans4</i>。<q>360 02-137</q>
          </td>
          <th class="h" onclick="E(this)"><img src="/e.gif" /></th>
        </tr>
        <tr id="d2">
          <th><a name="P2">2</a></th>
          <th><b>(4)</b></th>
          <th><tt> </tt></th>
          <td>
            question2
            <i>(1)ans1</i>
            <i>(2)ans2</i>
            <i>(3)ans3</i>
            <i>
              <a>(4)ans4</a>
            </i>
            。
            <q>1149 </q>
          </td>
          <th class="h" onclick="E(this)"><img src="/e.gif" /></th>
        </tr>
      </tbody>
    </table>
  </body>
</html>

This is the table from the site.

I tried these methods:

document.querySelectorAll('#s1234 tr > td:not(i)').forEach((e)=>{console.log(e)})
document.querySelectorAll('#s1234 tr > td')

But both of these results still include the contents of the <i> and <a> tags, so how do I get just the question text?

The result I need is like this: "question1"

Relaxing

3 Answers

0

It isn't entirely clear what you are asking. Do you just need the innerText? For example:

document.querySelectorAll('#s1234 tr > td').forEach((e) => {
  console.log(e.innerText)
})

Gives

question1 (1)ans1 (2)ans2 (3)ans3 ans4。360 02-137
question2 (1)ans1 (2)ans2 (3)ans3 (4)ans4 。 1149 

Edit:

If you just need the question part, then:

document.querySelectorAll('#s1234 tr > td').forEach((e) => {
  console.log(e.firstChild.data.trim())
})

gives...

question1
question2
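
A slightly more defensive variant of the same idea, as a minimal sketch: `firstChild.data` is `undefined` when the first child is an element rather than a text node, and `firstChild` itself is `null` for an empty cell. The helper name `leadingText` is hypothetical and not part of the page being crawled.

```javascript
// Hypothetical helper: returns the trimmed text of a cell's first child
// if that child is a text node, otherwise null.
const TEXT_NODE = 3; // same numeric value as Node.TEXT_NODE in browsers

function leadingText(cell) {
  const first = cell.firstChild;
  if (!first || first.nodeType !== TEXT_NODE) {
    return null; // empty cell, or the cell starts with an element
  }
  return first.data.trim();
}

// In a browser, applied to the table from the question.
if (typeof document !== 'undefined') {
  document.querySelectorAll('#s1234 tr > td').forEach((cell) => {
    console.log(leadingText(cell));
  });
}
```

This assumes the question text always comes before any nested tag in the cell, as it does in the markup shown in the question.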
Fraser
  • These contain the answers, but I only need the text of the questions, such as "question1". Sorry, I typed the content wrong; it's fixed now. – Relaxing Nov 05 '22 at 08:21
  • Hide the tags with CSS; `innerText` will then contain only the "question1" text. – dandavis Nov 05 '22 at 08:24
0

You can't do it with a CSS selector (see this question).

But since you're already in JS, you can get the text content in other ways. There is also a dedicated question covering many options (this one is probably the best at the moment).

Applied to the question's code:

const extractText = (node) => {
  // Assuming there's 1 text node you want.
  // Change to `filter` if you want to extract all text nodes in an element.
  const text = [...node.childNodes].find(child => child.nodeType === Node.TEXT_NODE);
  return text && text.textContent.trim();
}

const allTextNodes = [...document.querySelectorAll('#s1234 tr > td')].map(extractText);
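
The `filter` variant mentioned in the comment could look roughly like the sketch below. The name `directText` and the choice to join the pieces with a single space are assumptions made for illustration, not part of the original answer.

```javascript
// Hypothetical helper: collects every direct text node of an element,
// skips the whitespace-only ones, and joins the rest with a space.
const TEXT_NODE = 3; // same numeric value as Node.TEXT_NODE in browsers

function directText(node) {
  return Array.from(node.childNodes)
    .filter((child) => child.nodeType === TEXT_NODE)
    .map((child) => child.textContent.trim())
    .filter((text) => text.length > 0)
    .join(' ');
}

// In a browser, applied to the table from the question.
if (typeof document !== 'undefined') {
  const texts = [...document.querySelectorAll('#s1234 tr > td')].map(directText);
  console.log(texts);
}
```

For the markup in the question, the trailing "。" is also a direct text node of each td, so this variant returns it along with the question text. The `find` version above is the better fit when only the question is wanted.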
inwerpsel
0

I believe you only want to extract the question. Your description is a little confusing.

document.querySelectorAll('#s1234 tr > td').forEach((e)=>{console.log(e.firstChild.data)}) // this will give you only the question (with surrounding whitespace)
Gaurav