Copy all text from webbrowser control

Question

Is it possible to scrape all the text from a site that was navigated to by WebBrowser control without looking at the source?

score 7 · Answer 1 · answered Aug 30 '13 at 05:03

David Walker's method is great when you don't need anything from the header or the non-main parts of the page. If you need something outside the inner text, there are only two options: one is to parse the document with "getElement"; the other is to issue commands (Document.ExecCommand) to the WebBrowser to select all and copy to the clipboard:

wb.Document.ExecCommand("SelectAll", false, null);
wb.Document.ExecCommand("Copy", false, null);

Then, finally: string content = Clipboard.GetText();

Please note that the spelling and syntax may not be exact; I'm recalling this from memory.
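
For reference, the select-all-and-copy steps above can be put together roughly as follows. This is a minimal sketch, not a tested implementation: the class and method names (BrowserClipboardText, CopyAllText) are made up for illustration, and it assumes the call happens on the UI (STA) thread after the page has finished loading. Note that it overwrites whatever the user currently has on the clipboard.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical helper; the names are illustrative and not part of any API.
static class BrowserClipboardText
{
    // Assumes: called on the UI (STA) thread, after DocumentCompleted has fired.
    public static string CopyAllText(WebBrowser wb)
    {
        if (wb.Document == null)
            return string.Empty;

        // Same two commands as in the answer: select everything, then copy it.
        wb.Document.ExecCommand("SelectAll", false, null);
        wb.Document.ExecCommand("Copy", false, null);

        // Read the copied text back; the previous clipboard contents are lost.
        return Clipboard.ContainsText() ? Clipboard.GetText() : string.Empty;
    }
}
```

Usage would be something like string content = BrowserClipboardText.CopyAllText(wb); inside a DocumentCompleted handler.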

score 5 · Answer 2 · answered Mar 09 '11 at 17:51

string browserContents = webBrowser.Document.Body.InnerText;

David Walker

Thanks for putting me onto the scent David. If you want to preserve the formatting like I do, use webBrowser.Document.Body.InnerHtml; – Skyfish Aug 10 '20 at 15:52
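
One thing to watch with Body.InnerText (or InnerHtml) is that the document has to be loaded before it is read. Below is a minimal sketch of reading it from the DocumentCompleted event; the form class, the PageText property, and the example URL are assumptions made up for illustration.

```csharp
using System;
using System.Windows.Forms;

// Hypothetical form; class name, property name and URL are illustrative only.
class TextGrabberForm : Form
{
    private readonly WebBrowser webBrowser = new WebBrowser { Dock = DockStyle.Fill };

    // Holds the visible text once the page has loaded.
    public string PageText { get; private set; } = string.Empty;

    public TextGrabberForm(string url)
    {
        Controls.Add(webBrowser);
        webBrowser.DocumentCompleted += OnDocumentCompleted;
        webBrowser.Navigate(url);
    }

    private void OnDocumentCompleted(object sender, WebBrowserDocumentCompletedEventArgs e)
    {
        // The event can fire once per frame; wait until the whole page is ready.
        if (webBrowser.ReadyState != WebBrowserReadyState.Complete)
            return;

        // Body can be null for some documents, hence the null-conditional access.
        PageText = webBrowser.Document?.Body?.InnerText ?? string.Empty;
    }

    [STAThread]
    static void Main()
    {
        Application.EnableVisualStyles();
        Application.Run(new TextGrabberForm("https://example.com"));
    }
}
```

Swapping InnerText for InnerHtml in the handler gives the markup instead of the plain text, as the comment above suggests.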

score 4 · Accepted Answer · edited Aug 03 '15 at 18:52

You can use the DocumentText property of the WebBrowser control.

This property is what holds the HTML of the site you have navigated to.

Update: (following comments)

If you want to parse the HTML and get the text parts of it, I suggest you use the HTML Agility Pack.
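
As a rough illustration of that suggestion, here is a minimal sketch that feeds an HTML string (for example, the value of DocumentText) to the HTML Agility Pack and returns only the text. It assumes the HtmlAgilityPack NuGet package is referenced; the HtmlText class and Extract method are hypothetical names, and dropping script and style elements is one possible choice for approximating "what the user sees", not something the library does on its own.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical helper; the names are illustrative and not part of HTML Agility Pack.
static class HtmlText
{
    public static string Extract(string html)
    {
        // Fully qualified to avoid a clash with System.Windows.Forms.HtmlDocument.
        var doc = new HtmlAgilityPack.HtmlDocument();
        doc.LoadHtml(html);

        // Remove script and style elements; their contents are not visible text.
        // SelectNodes returns null when nothing matches.
        var hidden = doc.DocumentNode.SelectNodes("//script|//style");
        if (hidden != null)
        {
            foreach (var node in hidden)
                node.Remove();
        }

        // InnerText concatenates the remaining text nodes; DeEntitize decodes
        // entities such as &amp; into plain characters.
        return HtmlEntity.DeEntitize(doc.DocumentNode.InnerText).Trim();
    }
}
```

Usage would be along the lines of string text = HtmlText.Extract(webBrowser.DocumentText);. The result will not match a Ctrl+A copy exactly, since whitespace and hidden elements are handled differently.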

Max von Hippel

answered Apr 14 '10 at 12:15

Oded

Oded, I do not want to look at the HTML; I only want to look at the text that the user sees – Alex Gordon Apr 14 '10 at 12:20

No, I don't want to parse the HTML; I just want the same result as if you hit Ctrl+A and copy and paste all the text – Alex Gordon Apr 14 '10 at 12:25

Um. That's what you will get by _parsing_ the HTML. – Oded Apr 14 '10 at 12:30
