
I'm trying to extract text from HTML using the innerText property, like so: `console.log(document.getElementById('row').innerText)`

However, the output doesn't match what I see in the browser.

The reason for the difference is that the table elements in the first situation have a style of `display: inline-block` (see below).

How can I get the text in the same format as it appears in the browser?

Situation #1:

Input:

<html>
   <body id='test'>
      <table style="display: inline-block">
         <tr>
            <td>1</td>
         </tr>
         <tr>
            <td>2</td>
         </tr>
      </table>
      <table style="display: inline-block">
         <tr>
            <td>3</td>
         </tr>
         <tr>
            <td>4</td>
         </tr>
      </table>
   </body>
</html>

Expected Output:

1 3
2 4

Actual Output:

1
2
3
4

Situation #2:

Input:

<html>
   <body id='test'>
      <table>
         <tr>
            <td>1</td>
         </tr>
         <tr>
            <td>2</td>
         </tr>
      </table>
      <table>
         <tr>
            <td>3</td>
         </tr>
         <tr>
            <td>4</td>
         </tr>
      </table>
   </body>
</html>

Expected Output:

1 
2 
3
4

Actual Output:

1
2
3
4
  • You do not have any `#row` element, so your `console.log(document.getElementById('row').innerText)` throws an error. – CertainPerformance Feb 15 '19 at 23:02
  • Can you share your JavaScript code so we can better understand what you have tried so far? – James Garcia Feb 15 '19 at 23:07
  • @JamesGarcia He did, second line of the question (though the `#row` looks like it's probably just a typo) – CertainPerformance Feb 15 '19 at 23:13
  • Yes, row is a typo, it should be “test”. I’m basically trying to get the text of the entire html – AlwaysLearning Feb 16 '19 at 00:13
  • You are getting all the text that is inside the `id=test` element, as it appears in document order (1, 2, 3, 4) as that's what innerText does -- You can't get it as-rendered in the page unless _you_ also render it, which would require you to fully parse the HTML and styles, or at least walk the DOM tree and apply the styles. – Stephen P Feb 16 '19 at 00:39
  • @StephenP I have started down that route, but there are a lot of styles to keep track of, and it seems like a tough problem, at least for me, since I’m not very familiar with HTML and styles. – AlwaysLearning Feb 16 '19 at 01:02

1 Answer


While it seems like there should be a simpler way, the DOM doesn't understand the visible order, so you probably have to transpose the values manually, like:

    // Collects the cell text of each inline-block table, one array per table.
    // (Note: this filter is fragile; it only sees an inline style attribute.)
    const domOrder = [...document.querySelectorAll("table")]
      .filter(table => table.style.display === "inline-block")
      // table.rows lists every <tr>, even though the parser wraps them in a <tbody>
      .map(table => [...table.rows].map(tr => tr.textContent.trim()));
    // Populates visibleOrder by transposing domOrder:
    // visibleOrder[rowNum][tableNum] holds the value of domOrder[tableNum][rowNum]
    const visibleOrder = [];
    domOrder.forEach((tableRows, tableNum) => {
      tableRows.forEach((text, rowNum) => {
        // Adds a visual row the first time this row number is seen
        visibleOrder[rowNum] = visibleOrder[rowNum] || [];
        visibleOrder[rowNum][tableNum] = text;
      });
    });
    console.log(visibleOrder);
    <table style="display: inline-block">
       <tr><td>1</td></tr>
       <tr><td>2</td></tr>
    </table>
    <table style="display: inline-block">
       <tr><td>3</td></tr>
       <tr><td>4</td></tr>
    </table>
    <table style="display: inline-block">
       <tr><td>5</td></tr>
       <tr><td>6</td></tr>
    </table>

And here's a more robust example of matrix transposition.
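For reference, transposition on its own can be sketched as a small standalone function. This is only an illustration, not the linked example: the name `transpose` and the sample data are assumptions, and the input is taken to be one array of cell texts per table, in DOM order.

```javascript
// Transposes an array of arrays: result[r][c] === matrix[c][r].
// Ragged input yields undefined where a shorter inner array has no value.
function transpose(matrix) {
  const width = Math.max(0, ...matrix.map(row => row.length));
  return Array.from({ length: width }, (_, r) => matrix.map(row => row[r]));
}

// Hypothetical sample: each inner array is one table, read top to bottom.
const tablesInDomOrder = [["1", "2"], ["3", "4"], ["5", "6"]];
const visualRows = transpose(tablesInDomOrder);
console.log(visualRows.map(row => row.join(" ")).join("\n"));
// prints:
// 1 3 5
// 2 4 6
```

Keeping the transposition separate from the DOM lookup means it can be checked without a browser, and it does not fail when no tables match.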

Cat
  • The problem is that I'm trying to find a robust solution that would take into account all the different styling. Your solution works fine if the table has a styling of inline-block, but it breaks in other situations. For example, if the tables don't have "inline-block" styling but instead one table has "align=left" and another has "align=right", the solution doesn't work. Thanks for your help. I just wanted to make sure that there isn't an easy solution before I started putting in all the work to code it myself. – AlwaysLearning Feb 17 '19 at 06:06
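
One possible direction for the styling concern above, offered as an untested sketch rather than a known solution: instead of tracking each style, ask the browser where every cell ended up (for example with `getBoundingClientRect()`) and group cells that share roughly the same top coordinate. The function name `groupIntoLines`, the `tolerance` parameter, and the sample coordinates are all made up for illustration. Cells of differing heights or nested tables would likely need more care.

```javascript
// Groups rendered boxes into visual lines by their top coordinate.
// boxes: [{ text, top, left }], e.g. built from getBoundingClientRect().
// tolerance (hypothetical parameter): max pixel difference for "same line".
function groupIntoLines(boxes, tolerance = 2) {
  const sorted = [...boxes].sort((a, b) => a.top - b.top || a.left - b.left);
  const lines = [];
  for (const box of sorted) {
    const line = lines[lines.length - 1];
    if (line && Math.abs(box.top - line.top) <= tolerance) {
      line.items.push(box);
    } else {
      lines.push({ top: box.top, items: [box] });
    }
  }
  // Within each line, reads the boxes left to right.
  return lines.map(line =>
    line.items.sort((a, b) => a.left - b.left).map(b => b.text).join(" "));
}

// In a browser, the boxes could be collected like this (not run here):
// const boxes = [...document.querySelectorAll("#test td")].map(td => {
//   const { top, left } = td.getBoundingClientRect();
//   return { text: td.innerText.trim(), top, left };
// });

// Made-up coordinates standing in for two side-by-side tables.
const sampleBoxes = [
  { text: "1", top: 10, left: 10 }, { text: "2", top: 32, left: 10 },
  { text: "3", top: 10, left: 40 }, { text: "4", top: 32, left: 40 },
];
console.log(groupIntoLines(sampleBoxes).join("\n"));
// prints:
// 1 3
// 2 4
```

Because the grouping only looks at final positions, the same function would treat `inline-block`, `align`, or float-based layouts alike, as long as the collected coordinates reflect what is on screen.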