
Using PuppeteerSharp, I am trying to get the text of an element.

ElementHandle elementHandle = (await page.XPathAsync("//html/body/div[1]/section/div/section/h2"))[0];

Now that I have the element handle, how do I actually get the text from it? I don't see any obvious methods. I would have expected TextAsync or something similar, but I don't see it.

Using PuppeteerSharp 5.0.

hardkoded
AngryHacker
    There is a [GetPropertyAsync(String)](https://www.puppeteersharp.com/api/PuppeteerSharp.JSHandle.html#PuppeteerSharp_JSHandle_GetPropertyAsync_System_String_) method, please see [How to read the value of an span element with Puppeteer](https://stackoverflow.com/questions/51307615/how-to-read-the-value-of-an-span-element-with-puppeteer), [Getting a Selector's value in Puppeteer](https://stackoverflow.com/questions/52899557/getting-a-selectors-value-in-puppeteer). – Botan Aug 04 '21 at 22:15
    @Botan Thanks. That did the trick. I did `var foo = await elementHandle.GetPropertyAsync("innerText"); `, and then `foo.ToString()` has what I need. – AngryHacker Aug 04 '21 at 23:02

3 Answers


You can call EvaluateFunctionAsync, passing that ElementHandle as an argument:

var content = await page.EvaluateFunctionAsync<string>("e => e.textContent", elementHandle);

If you have many scenarios like that, you can build an extension method to solve that for you ;)
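One way such an extension method might look is sketched below. The class name ElementHandleTextExtensions and the method name GetTextAsync are hypothetical, and the sketch assumes your PuppeteerSharp version exposes EvaluateFunctionAsync<T> on the handle itself, which passes the handle as the first argument to the function.

```csharp
using System.Threading.Tasks;
using PuppeteerSharp;

// Hypothetical helper class; not part of PuppeteerSharp.
public static class ElementHandleTextExtensions
{
    // Returns the element's textContent by evaluating a function in the page.
    // Assumption: the handle is passed to the function as its first argument.
    public static Task<string> GetTextAsync(this ElementHandle element)
        => element.EvaluateFunctionAsync<string>("e => e.textContent");
}
```

With that in scope, the call site becomes `var content = await elementHandle.GetTextAsync();`.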

hardkoded

@Botan, thank you! I have tried (in VB.NET) and found:

(Await elementhandle.GetPropertyAsync("innerText")).ToString

result: "JSHandle:foo", but

(Await elementhandle.GetPropertyAsync("innerText")).RemoteObject.Value.ToString

result: "foo"
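For C# callers, a similar result can probably be reached without touching RemoteObject. The sketch below assumes JsonValueAsync<T> is available on the returned JSHandle in your PuppeteerSharp version; the class and method names are hypothetical.

```csharp
using System.Threading.Tasks;
using PuppeteerSharp;

// Hypothetical example class; not part of PuppeteerSharp.
public static class InnerTextExample
{
    public static async Task<string> ReadInnerTextAsync(ElementHandle elementHandle)
    {
        // GetPropertyAsync returns a JSHandle that wraps the remote value.
        var property = await elementHandle.GetPropertyAsync("innerText");

        // JsonValueAsync<T> deserializes the remote value into a .NET type,
        // so the "JSHandle:" prefix produced by ToString() is not involved.
        return await property.JsonValueAsync<string>();
    }
}
```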


If you are after a strongly typed API for use with Puppeteer Sharp, you can use PuppeteerSharp.Dom, which is available on NuGet.org.

// Add using PuppeteerSharp.Dom to access the extension methods

ElementHandle elementHandle = (await page.XPathAsync("//html/body/div[1]/section/div/section/h2"))[0];
// Create a strongly typed HtmlHeadingElement object
var headingElement = elementHandle.ToDomHandle<HtmlHeadingElement>();
// You'll now have context-specific methods relevant to HtmlHeadingElement
// Get textContent and innerText via the async methods
var textContent = await headingElement.GetTextContentAsync();
var innerText = await headingElement.GetInnerTextAsync();

There are also a number of QuerySelector extension methods, so you can skip the ToDomHandle call if you are using a query selector.

var element = await page.QuerySelectorAsync<HtmlElement>("#myElementId");

There are more examples on the GitHub page.

amaitland