8

I'm trying to learn the Python version of Playwright. See here

I would like to learn how to locate an element, so that I can do things with it. Like printing the inner HTML, clicking on it and such.

The example below loads a page and prints the HTML

from playwright import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch(headless=False)
    page = browser.newPage()
    page.goto('http://whatsmyuseragent.org/')
    print(page.innerHTML("*"))
    browser.close()

This page contains an element

<div class="user-agent">
    <p class="intro-text">Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4238.0 Safari/537.36</p>
</div>

Using Selenium, I could locate the element and print it's content like this

elem = driver.find_element_by_class_name("user-agent")
print(elem)
print(elem.get_attribute("innerHTML"))

How can I do the same in Playwright?

#UPDATE# - Note if you want to run this in 2021+ that current versions of playwright have changed the syntax from CamelCase to snake_case.

576i
  • 7,579
  • 12
  • 55
  • 92

5 Answers5

9

The accepted answer does not work with the newer versions of Playwright. (Thanks @576i for pointing this out)

Here is the Python code that works with the newer versions (tested with version 1.5):

from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto('http://whatsmyuseragent.org/')
    ua = page.query_selector(".user-agent");
    print(ua.inner_html())
    browser.close()

To get only the text, use the inner_text() function.

print(ua.inner_text())
Upendra
  • 716
  • 9
  • 17
  • At the time of writing the question, the accepted answer was working Python code. Since then, playwright has changed a bit and now uses a more Python friendly notation. (This is the case for many other older playwright questions) – 576i Sep 27 '21 at 08:12
  • Got it. Thanks. I am editing this answer to reflect this. – Upendra Sep 27 '21 at 10:37
6

You can use the querySelector function, and then call the innerHTML function:

handle = page.querySelector(".user-agent")
print(handle.innerHTML())
hardkoded
  • 18,915
  • 3
  • 52
  • 64
  • 1
    `AttributeError: 'Page' object has no attribute 'querySelector'` – étale-cohomology Jul 23 '21 at 05:57
  • 3
    In Python, it would be `page.query_selector(".user-agent)` – Upendra Sep 27 '21 at 07:36
  • 2
    Note that playwright for python has changed the syntax from `querySelector` to `query_selector` for the newer versions... If you find other older, no-longer-working playwright answers, this is probably why. – 576i Sep 27 '21 at 08:15
2

according to Latest official python version Playwright, you should use:

-> the code:

# userAgentSelector = ".user-agent"
userAgentSelector = "div.user-agent"
elementHandle = page.query_selector(userAgentSelector)
uaHtml = elementHandle.inner_html()
print("uaHtml=%s" % uaHtml)
crifan
  • 12,947
  • 1
  • 71
  • 56
0

I think you can find the solutions in the following article. Playwright >> Find, Locate, Select Elements/Tags using Playwright

  • Playwright find all elements/tags containing specified text
  • Playwright find elements/tags containing specified child element/tag
  • Playwright loop through all elements/tags in locator() result
  • Playwright find/get first element Playwright find/get last element
  • Playwright get the parent element Playwright get the child element
  • Playwright get nth child element Playwright find elements/tags by css
  • class Playwright find elements near the specified text Playwright
  • find elements/tags by attribute Playwright find elements/tags by id
0

Existing answers are a bit outdated. Nowadays the locator API is recommended since auto-waiting is the common case:

from playwright.sync_api import sync_playwright  # 1.37.0

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://www.example.com")
    text = page.locator("h1").text_content()
    print(text)
    browser.close()

Use query_selector when you don't want to wait and instead want an immediate None if the element isn't visible.

Note that http://whatsmyuseragent.org is down, so I used a different site, but it's basically the same.

ggorlen
  • 44,755
  • 7
  • 76
  • 106