<div id="div1">
<span>Span text 1</span>
<span>Span text 2</span>
Div Inner Text
</div>
How can I extract only div1's own text (Div Inner Text)?
div1.innerText also returns the spans' text.
There are similar questions regarding fetching an element's inner text.
Solution 1: see this question
HtmlElement e1 = webBrowser1.Document.GetElementById("elementId");
string content = e1.InnerText;
MessageBox.Show(content);
Solution 2: use JavaScript via the HtmlDocument.InvokeScript method
In your HTML :
<script type="text/javascript">
function getInnerText(id) {
    return document.getElementById(id).innerText;
}
</script>
C#
Object[] objArray = new Object[1];
objArray[0] = (Object)"elementId";
// InvokeScript returns object, so cast the result to string
string content = (string)webBrowser1.Document.InvokeScript("getInnerText", objArray);
MessageBox.Show(content);
The approach I would take is to iterate over the child nodes, test whether each one is a text node, store the value of each text node in an array, and then return the elements of the array concatenated.
function innerText(element){
    var i, text = [], child = null;
    for(i = 0; i < element.childNodes.length; i++){
        child = element.childNodes[i];
        // nodeType 3 is a text node; keep it only if it holds non-whitespace
        if (child.nodeType === 3 &&
            child.nodeValue.match(/\S/)){
            text.push(child.nodeValue);
        }
    }
    return text.join("");
}
// Example call
alert(innerText(document.getElementById("div1")));
The code above uses the nodeType property of DOM nodes to check whether a node is a text node (nodeType === 3) and that the node contains more than whitespace. The result could be tidied by trimming leading and trailing whitespace.
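The trimming mentioned above could look like the following sketch. The name `ownText` is hypothetical and not part of the original answer. It uses a regular expression instead of `String.prototype.trim`, on the assumption that the page may be rendered in an older IE document mode where `trim` is not available.

```javascript
// Sketch: return only the text held directly by `element`,
// ignoring child elements such as <span>, with outer whitespace trimmed.
function ownText(element) {
    var TEXT_NODE = 3; // numeric value of Node.TEXT_NODE
    var i, child, text = [];
    for (i = 0; i < element.childNodes.length; i++) {
        child = element.childNodes[i];
        // Keep text nodes that contain at least one non-whitespace character.
        if (child.nodeType === TEXT_NODE && /\S/.test(child.nodeValue)) {
            text.push(child.nodeValue);
        }
    }
    // Strip leading and trailing whitespace (line breaks from the markup).
    return text.join("").replace(/^\s+|\s+$/g, "");
}

// In a browser: ownText(document.getElementById("div1"))
```

For the markup in the question, this should yield "Div Inner Text" without the surrounding line breaks.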
Using Yanick's code as a template, since it seems straightforward, update the JavaScript function to:
function innerText(id){
    var i, text = [], child = null, element = document.getElementById(id);
    for(i = 0; i < element.childNodes.length; i++){
        child = element.childNodes[i];
        // nodeType 3 is a text node; keep it only if it holds non-whitespace
        if (child.nodeType === 3 &&
            child.nodeValue.match(/\S/)){
            text.push(child.nodeValue);
        }
    }
    return text.join("");
}
Then it can be called using:
string content =
(string)webBrowser1.Document.InvokeScript("innerText",
new string[] { "div1" });
The variable content will contain the inner text value.
This doesn't check that an element with the id passed to the function exists, so additional checks would be required for a real-world application.
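One possible form of that extra check is sketched below. The function name `innerTextById` and the optional `doc` parameter are hypothetical additions, not part of the answers above. `doc` only exists so the function can be exercised outside a browser, and returning an empty string for an unknown id is one choice among several.

```javascript
// Sketch: like innerText(id), but tolerant of an id that matches no element.
// `doc` is a hypothetical optional parameter; it falls back to the page's
// global `document` when omitted.
function innerTextById(id, doc) {
    var element = (doc || document).getElementById(id);
    if (!element) {
        return ""; // no such element: return an empty string instead of throwing
    }
    var i, child, text = [];
    for (i = 0; i < element.childNodes.length; i++) {
        child = element.childNodes[i];
        // nodeType 3 is a text node; skip whitespace-only nodes.
        if (child.nodeType === 3 && /\S/.test(child.nodeValue)) {
            text.push(child.nodeValue);
        }
    }
    return text.join("");
}
```

On the C# side, the InvokeScript call would presumably stay the same apart from the function name, and the caller could treat an empty string as "element not found or no text of its own".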