<div class=" arrange-unit__09f24__rqHTg arrange-unit-fill__09f24__CUubG  border-color--default__09f24__NPAKY">
<p class=" css-na3oda">Business website</p>
<p class=" css-1p9ibgf" data-font-weight="semibold">
<a href="/biz_redir?url=http%3A%2F%2FSouthalabamaconstruction.com&amp;cachebuster=1677298534&amp;website_link_type=website&amp;src_bizid=S8nQqc5JRUn9q-HFI0x8kA&amp;s=76c44f0c24c6e853246a79bc1ceb3260cde63054f3a223e5dd725bd2146bc5f5" class="css-1um3nx" target="_blank" rel="noopener" role="link">http://Southalabamaconstructio…</a>
</p></div>

I'm trying to use Playwright to get the href of the link in the snippet above, without success. Is there a way to find the <p> element that contains "Business website" and then go two elements below it to get the <a> element?

I think this is the best approach; I'm just not sure how to implement it.

I need the value of the href from the <a> element. The text inside the link doesn't contain the full URL from the href.

await page.locator('a[rel="noopener"]').nth(1).innerHTML()
await page.locator('div > p > a').nth(1).innerHTML()
await page
  .locator('div:has-text("Business website") > a')
  .nth(1)
  .innerHTML()
await page.getByRole('link', { name: /^(http|https):/i })
await page.getByText(/^(http|https):/i).innerHTML()

I tried a plethora of other approaches, which either threw errors, hit maximum calls, or got the wrong link.

Edited by ggorlen; asked by Vonkoff.
@Vonkoff Please ask one question per question. That said, you can use [CSS syntax to select multiple attributes](https://stackoverflow.com/questions/12340737/specify-multiple-attribute-selectors-in-css). I'll remove that bit to keep this question focused. Thanks. – ggorlen Feb 27 '23 at 02:45

2 Answers


You're pretty close. The problem is that the <a> isn't a direct child of the <div> (it's nested inside a <p>), so drop the > child combinator and use a plain space (the descendant combinator) instead:

const playwright = require("playwright"); // ^1.30.1

const html = `<!DOCTYPE html>
<div class=" arrange-unit__09f24__rqHTg arrange-unit-fill__09f24__CUubG  border-color--default__09f24__NPAKY">
<p class=" css-na3oda">Business website</p>
<p class=" css-1p9ibgf" data-font-weight="semibold">
<a href="/biz_redir?url=http%3A%2F%2FSouthalabamaconstruction.com&amp;cachebuster=1677298534&amp;website_link_type=website&amp;src_bizid=S8nQqc5JRUn9q-HFI0x8kA&amp;s=76c44f0c24c6e853246a79bc1ceb3260cde63054f3a223e5dd725bd2146bc5f5" class="css-1um3nx" target="_blank" rel="noopener" role="link">http://Southalabamaconstructio…</a>
</p></div>
`;

let browser;
(async () => {
  browser = await playwright.chromium.launch();
  const page = await browser.newPage();
  await page.setContent(html);
  const loc = page.locator('div:has-text("Business website") a');
  console.log(await loc.textContent()); // => http://Southalabamaconstructio…
  console.log(await loc.getAttribute("href")); // => /biz_redir?url=http%3A%2F%2FSouthalabamac...
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());
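The href read above is a relative redirect path rather than the business's own address. If the destination itself is wanted, one option is to parse the path with Node's built-in URL class and read its url query parameter. This is a sketch that assumes the redirect always carries its target in url, which only this one snippet confirms. extractTargetUrl is a hypothetical helper name, and the http://localhost base is an arbitrary placeholder needed only because URL won't parse a relative path on its own.

```javascript
// Hypothetical helper: pull the real destination out of a redirect href
// such as "/biz_redir?url=http%3A%2F%2F...&cachebuster=...".
// Assumption: the target lives in the `url` query parameter, as in the snippet above.
function extractTargetUrl(href) {
  // URL needs an absolute base to parse a relative path; the base is a
  // placeholder because only the query string is read.
  const parsed = new URL(href, "http://localhost");
  // searchParams.get returns the percent-decoded value, or null if absent.
  return parsed.searchParams.get("url");
}

// The href as getAttribute("href") returns it: the HTML `&amp;` entities
// are already decoded to `&` in the DOM.
const href =
  "/biz_redir?url=http%3A%2F%2FSouthalabamaconstruction.com&cachebuster=1677298534" +
  "&website_link_type=website&src_bizid=S8nQqc5JRUn9q-HFI0x8kA" +
  "&s=76c44f0c24c6e853246a79bc1ceb3260cde63054f3a223e5dd725bd2146bc5f5";

console.log(extractTargetUrl(href)); // => http://Southalabamaconstruction.com
```

Inside the Playwright script this could be applied as extractTargetUrl(await loc.getAttribute("href")). Because searchParams.get returns null when the parameter is missing, check for that before using the result.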

You may wish to strengthen your condition as follows:

'div:has(p:has-text("Business website")) a'

I'm not sure whether the CSS classes are stable (they may be generated), but if they are, they're generally better to select on than tag names.

Answered by ggorlen.
Just to clarify, that condition-strengthening piece looks the same as what you put in the bigger block, and then you mention CSS classes compared to tags. Was that as you intended? – David R Feb 27 '23 at 04:37
Nope, copy-paste error. Fixed. The reason I'm not using CSS classes for selection is because I can't tell if they're generated or not. Just noting that if they are stable, they may be a better choice than tags. – ggorlen Feb 27 '23 at 06:25
That makes sense. The whole looking-the-same aspect was what really created the confusion, so thanks for updating that! – David R Feb 27 '23 at 14:37

The link's text starts with http: or https:, so use a regular expression to match it. The last line then reads the href attribute value.

await page
  .getByRole('link', { name: /^(http|https):/i })
  .getAttribute('href')
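getByRole's name option accepts a regular expression, which Playwright tests against the link's accessible name; for this <a>, that is its visible text. A quick way to see what the pattern accepts is to run it against the two strings from the question's snippet. This is only a plain-JavaScript check of the regex, not a Playwright run, and on a real page any other link whose text starts with http: or https: would match as well. The names linkNamePattern and candidates are illustrative.

```javascript
// Plain-JS check of which strings the accessible-name regex accepts.
// The candidate strings are copied from the HTML snippet in the question.
const linkNamePattern = /^(http|https):/i;

const candidates = [
  "http://Southalabamaconstructio…", // visible text of the <a>
  "Business website",                // text of the sibling <p> label
];

for (const text of candidates) {
  // No `g` flag, so .test() keeps no lastIndex state between calls.
  console.log(JSON.stringify(text), linkNamePattern.test(text));
}
// => "http://Southalabamaconstructio…" true
// => "Business website" false
```

The pattern /^https?:/i is an equivalent, slightly shorter way to write the same check.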
Edited by Solomon Ucko; answered by Vonkoff.