
I'm trying to extract the OTP text from HTML where it sits inside a table row and there are no selectors such as a div id or class; only a div style is present. How can I copy the text from it?

I'm using https://temp-mail.org/

Here is the XPath of the OTP field:

/html/body/main/div[1]/div/div[3]/div[2]/div/div[1]/div/div[2]/div[3]/div[2]/div/table/tbody/tr/td/div[2]/table/tbody/tr/td/div/table/tbody/tr/td/table/tbody/tr[3]/td/div

Here is the CSS selector:

body > main > div.container > div > div.col-sm-12.col-md-12.col-lg-12.col-xl-8 > div.tm-content > div > div.inboxWarpMain > div > div.inbox-data-content > div.inbox-data-content-intro > div:nth-child(13) > div > table > tbody > tr > td > div:nth-child(2) > table > tbody > tr > td > div > table > tbody > tr > td > table > tbody > tr:nth-child(3) > td > div

Code Structure Image Showing in Dev Tools

1 Answer


Always make sure that you don't violate the Terms of Use of the service you are scraping. Maybe you could achieve the desired result by using their API instead? (https://rapidapi.com/Privatix/api/temp-mail)

If you are sure that you want to use browser automation and retrieve the one-time password with puppeteer, you can use the page.$eval method to retrieve the text content of any element that matches a valid selector.

Note: what you've already copied from DevTools is a valid CSS selector; a selector is not required to include a CSS class or element id. It is totally fine to use it as is (even if it is a bit redundant).

E.g.:

const selector = 'body > main > div.container > div > div.col-sm-12.col-md-12.col-lg-12.col-xl-8 > div.tm-content > div > div.inboxWarpMain > div > div.inbox-data-content > div.inbox-data-content-intro > div:nth-child(13) > div > table > tbody > tr > td > div:nth-child(2) > table > tbody > tr > td > div > table > tbody > tr > td > table > tbody > tr:nth-child(3) > td > div'

const text = await page.$eval(selector, el => el.innerText)
console.log(text)

Output:

233-552
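If the long positional selector turns out to be brittle, one possible alternative (a different technique from the selector approach above) is to read the text of a broader container and pick the code out by its pattern. This is only a minimal sketch: `extractOtp` is a hypothetical helper name, not part of puppeteer, and it assumes the OTP always has the shape shown in the output above (three digits, a hyphen, three digits).

```javascript
// Hypothetical helper: returns the first code shaped like "233-552" found in a text blob.
// Assumption: the OTP is always 3 digits, a hyphen, then 3 digits.
function extractOtp(text) {
  const match = /\b\d{3}-\d{3}\b/.exec(text)
  return match ? match[0] : null // null when no code is present
}

console.log(extractOtp('Your verification code is\n233-552\nIt expires soon.'))
```

With puppeteer, the input could be the innerText of a container that is already part of the copied selector, for example `extractOtp(await page.$eval('div.inbox-data-content', el => el.innerText))`.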

Edit

In case the selector matches more than one element, you can use the document.querySelectorAll-based approaches like $$eval or $$ and then select the element at the first index [0].

In this exact use case the page's $ is occupied by jQuery, so it conflicts with the Chrome API's $ shorthand for querySelector.

Solutions:

const selector = 'body > main > div.container > div > div.col-sm-12.col-md-12.col-lg-12.col-xl-8 > div.tm-content > div > div.inboxWarpMain > div > div.inbox-data-content > div.inbox-data-content-intro > div:nth-child(13) > div > table > tbody > tr > td > div:nth-child(2) > table > tbody > tr > td > div > table > tbody > tr > td > table > tbody > tr:nth-child(3) > td > div'

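await new Promise(resolve => setTimeout(resolve, 10000)) // fixed 10 s pause; page.waitFor (pptr < 5.3.0) and page.waitForTimeout were deprecated and removed in newer pptr versions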

try {
  await page.waitForSelector(selector)
  const [text] = await page.$$eval(selector, elements => elements.map(el => el.innerText))
  console.log(text)
} catch (e) {
  console.error(e)
}
// alternate solution with page.evaluate:
try {
  const text = await page.evaluate(el => el.innerText, (await page.$$(selector))[0])
  console.log(text)
} catch (e) {
  console.error(e)
}
theDavidBarton
  • It's showing this error https://pastebin.com/11fg5zQD – Missy Maxwell Sep 29 '20 at 15:52
  • Here is the complete code. https://pastebin.com/MksY7s5n – Missy Maxwell Sep 29 '20 at 15:59
  • Can you tell on which exact command the error is thrown? Also, could you wrap the content of the function in a `try...catch` block? I think a `page.waitForSelector` will be needed before the `$eval`, as the click you perform is followed by a navigation on the page. I tried to run your code, but it failed at line `22`: `await el[0].click();` because the link was not found. Maybe the selectors change over time; that should be solved as well to make it work with automation. – theDavidBarton Sep 29 '20 at 17:22
  • Hi, I updated code with example site and console log. The code stops at "Copying OTP" https://pastebin.com/GyqxnCDE – Missy Maxwell Sep 29 '20 at 17:58
  • See my update, Missy. The `page.$eval` didn't work because (1) it requires a `page.waitForSelector` first, and (2) there was a conflict between the page, which uses jQuery, and the Chrome API, which requires a bit of a workaround; see the updated answer. – theDavidBarton Sep 29 '20 at 19:39
  • Thanks David, Is there any way we can copy the OTP to clipboard instead of showing in console(log) ? Possible? – Missy Maxwell Sep 29 '20 at 23:05
  • The advantage of saving it as a variable (printing it with console.log is just for debugging) is that we can type it into the required input as a real user would (this can be done with puppeteer's `page.keyboard.type`). If you'd prefer a real copy-paste, there are several options, including the [clipboardy](https://www.npmjs.com/package/clipboardy) npm package. See also https://stackoverflow.com/questions/57101467/how-do-you-paste-text-using-puppeteer, https://stackoverflow.com/questions/49131516/how-to-copy-text-from-browser-clipboard-using-puppeteer-in-nodejs – theDavidBarton Sep 30 '20 at 07:05
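
The typing approach mentioned in the last comment could look roughly like this. It is a sketch under assumptions, not a definitive implementation: `typeOtp` is a hypothetical helper name, and `inputSelector` stands for whatever input the target form actually uses. The calls themselves (`page.waitForSelector`, `page.focus`, `page.keyboard.type`) are regular puppeteer methods.

```javascript
// Sketch: type a previously extracted OTP into an input, as a real user would.
// `inputSelector` is a placeholder; the real selector depends on the target form.
async function typeOtp(page, inputSelector, otp) {
  await page.waitForSelector(inputSelector) // wait until the input is in the DOM
  await page.focus(inputSelector) // move keyboard focus into the input
  await page.keyboard.type(otp, { delay: 50 }) // 50 ms pause between key presses
}
```

It would be called after the OTP has been read into a variable, e.g. `await typeOtp(page, '#otp-input', text)`, where `#otp-input` is a made-up selector used only for illustration.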