Get HttpElement text

Question

<div id="div1">
    <span>Span text 1</span>
    <span>Span text 2</span>
    Div Inner Text
</div>

How can I extract only div1's own text (Div Inner Text)?

div1.innerText also returns the spans' text.

C# and webbrowser? don't you mean Javascript? Also, what have you tried? It may help understand the how and what you're trying to do. — Yanick Rochon, Feb 07 '11 at 15:59
.NET WebBrowser component. Tried HttpElement.InnerText it returns inner tag's text. Tried to remove all child element, component has no such property or method. — Stanislav Stoyanov, Feb 07 '11 at 16:03

score 0 · Answer 1 · edited May 23 '17 at 12:33


There are similar questions regarding fetching an element's inner text.

Solution 1 : see this question

HtmlElement e1 = webBrowser1.Document.GetElementById("elementId");
string content = e1.InnerText;
MessageBox.Show(content);

Solution 2: call JavaScript through the HtmlDocument.InvokeScript method

In your HTML :

<script type="text/javascript">
    function getInnerText(id) {
       return document.getElementById(id).innerText;
    }
</script>

C#

Object[] objArray = new Object[1];
objArray[0] = (Object)"elementId";
string content = (string)webBrowser1.Document.InvokeScript("getInnerText", objArray);
MessageBox.Show(content);

answered Feb 07 '11 at 16:25 by Yanick Rochon · edited May 23 '17 at 12:33 by Community

Solution 1: This strips the HTML tags and returns "Span text 1 Span text 2 Div Inner Text". I want only "Div Inner Text". Solution 2 is fine, but I cannot alter the source HTML. – Stanislav Stoyanov, Feb 07 '11 at 17:11

score 0 · Answer 2 · edited Jun 20 '20 at 09:12

The approach I would take is to iterate over the child nodes, test whether each one is a text node, store the text nodes' values in an array, and then return the elements of the array concatenated.

  function innerText(element){
    var i, text = [], child = null;
    for(i = 0; i < element.childNodes.length; i++){
      child = element.childNodes[i];

      if (child.nodeType === 3 &&
        child.nodeValue.match(/[^\n\s\t\r]/)){
        text.push(child.nodeValue);
      }
    }
    return text.join("");
  }
  // Example call
  alert(innerText(document.getElementById("div1")));

The code above uses the nodeType property of each child node to check whether it is a text node (nodeType === 3), and the regular expression test ensures that the node contains more than whitespace. The result could be tidied by trimming leading and trailing whitespace.
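As a rough sketch of that tidying step, the variant below trims each text node before joining. The name ownText is made up for illustration, and joining with a single space is my own choice rather than something the question asks for. It trims with a regular expression because older Internet Explorer engines, which the WebBrowser control may be running, do not have String.prototype.trim.

```javascript
// Hypothetical helper: ownText is an illustrative name, not part of the answer above.
function ownText(element) {
  var i, child, parts = [];
  for (i = 0; i < element.childNodes.length; i++) {
    child = element.childNodes[i];
    // nodeType 3 marks a text node; /\S/ skips nodes that hold only whitespace.
    if (child.nodeType === 3 && /\S/.test(child.nodeValue)) {
      // Strip leading and trailing whitespace without relying on String.prototype.trim.
      parts.push(child.nodeValue.replace(/^\s+|\s+$/g, ""));
    }
  }
  // Assumption: separate text nodes are joined with one space.
  return parts.join(" ");
}
```

For the markup in the question, this should give "Div Inner Text" without the surrounding line breaks and indentation.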

Edit: C# use

Using Yanick's code as a template, as it seems straightforward, update the JavaScript function to:

  function innerText(id){

    var i, text = [], child = null, element = document.getElementById(id);
    for(i = 0; i < element.childNodes.length; i++){
      child = element.childNodes[i];

      if (child.nodeType === 3 &&
        child.nodeValue.match(/[^\n\s\t\r]/)){
        text.push(child.nodeValue);
      }
    }
    return text.join("");
  }

Then it can be called using:

string content = 
  (string)webBrowser1.Document.InvokeScript("innerText", 
                                            new string[] { "div1" });

The variable content will contain the inner text value. The function doesn't check that the id passed to it exists, so additional checks would be required for a real-world application.
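One possible form of that extra check is sketched below. The function name innerTextOrEmpty and the optional doc parameter are hypothetical; doc is only there so the function can be tried outside a browser. Returning an empty string for an unknown id is an assumption about what the caller wants.

```javascript
// Hypothetical variant: the name and the optional doc parameter are illustrative only.
function innerTextOrEmpty(id, doc) {
  // Fall back to the page's own document when no stand-in is supplied.
  doc = doc || document;
  var element = doc.getElementById(id);
  // Guard: getElementById returns null when no element has this id.
  if (!element) {
    return "";
  }
  var i, child, text = [];
  for (i = 0; i < element.childNodes.length; i++) {
    child = element.childNodes[i];
    // Keep only text nodes that contain something other than whitespace.
    if (child.nodeType === 3 && /\S/.test(child.nodeValue)) {
      text.push(child.nodeValue);
    }
  }
  return text.join("");
}
```

Called through InvokeScript with just the id, the second argument is left undefined and the function uses the page's document, so the C# side can treat an empty string as "no such element or no direct text".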

This is fine for javascript, but how to access the DOM from .NET — Stanislav Stoyanov, Feb 07 '11 at 17:10
