I am trying to write code that detects the login form on any website (basically recreating something like 1Password, which can detect the username/password fields). I want to work out which inputs are the right ones to fill in based on the attributes of the elements. I am using Puppeteer on sample HTML like:

<form class="">
  <input type="text" id="id1" name="some name" placeholder="some placeholder" value="">
  <input type="text" id="id2" attr1="some name" attr2="some placeholder" value="">
  <input type="text" attr3="some name" attr4="some placeholder" value="">
</form>

My Puppeteer code is:

const inputHandles = await page.$$('input')
inputHandles.forEach(async(element) => {
    // This actually works and gives me the id of each of the elements
    const jsHandle = await (await element.getProperty('id')).jsonValue();
    // This always returns {}
    console.log(await element.getProperties())
})

How can I dynamically get all the attributes of each input without querying for id/name/placeholder specifically? The same issue was posted on GitHub (https://github.com/puppeteer/puppeteer/issues/4995); however, it was closed for inactivity. Thanks!

ggorlen
  • Important: [Using async/await with a forEach loop](https://stackoverflow.com/questions/37576685/using-async-await-with-a-foreach-loop) – ggorlen Apr 30 '23 at 17:26
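To illustrate the point of that comment: `forEach` does not wait for the promises its async callback returns, so nothing can be collected reliably from it. Below is a minimal sketch, assuming `handles` is the array returned by `page.$$('input')`, that uses `for...of` together with `ElementHandle.evaluate()` to read every attribute of each element. The helper name `readAllAttributes` is hypothetical and not part of Puppeteer.

```javascript
// Hypothetical helper (not a Puppeteer API): reads all attributes of each
// ElementHandle, one handle at a time, and returns them in document order.
// Assumption: `handles` is the array returned by page.$$('input').
async function readAllAttributes(handles) {
  const results = [];
  // for...of awaits each iteration; forEach would ignore the returned promises.
  for (const handle of handles) {
    // The callback runs in the page context, where `el` is the DOM element.
    const attrs = await handle.evaluate(el =>
      Object.fromEntries(Array.from(el.attributes, a => [a.name, a.value]))
    );
    results.push(attrs);
  }
  return results;
}
```

It could be called as `console.log(await readAllAttributes(await page.$$('input')))` inside an async function. Each result is a plain object keyed by attribute name, so custom attributes such as `attr1` are included even though they are not exposed as DOM properties.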

1 Answer


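You can collect attribute names and their values inside the page context with page.$$eval(), either by combining getAttributeNames() with getAttribute() or by passing the entries of the element's attributes collection to Object.fromEntries(), as in the code below.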

const puppeteer = require("puppeteer");

const html = `
    <form class="">
        <input type="text" id="id1" name="some name" placeholder="some placeholder" value="">
        <input type="text" id="id2" attr1="some name" attr2="some placeholder" value="">
        <input type="text" attr3="some name" attr4="some placeholder" value="">
    </form>
`;

let browser;
(async () => {
  browser = await puppeteer.launch();
  const [page] = await browser.pages();
  await page.setContent(html);

  // First variant: getAttributeNames() + getAttribute().
  // Yields one plain object per input, keyed by attribute name: [{<name>: <value>, ...}, ...]
  let inputs = await page.$$eval("input", el =>
    el.map(x =>
      x.getAttributeNames().reduce((acc, name) => {
        return {...acc, [name]: x.getAttribute(name)};
      }, {})
    )
  );
  inputs = inputs.map(x =>
    Object.keys(x).map(k => ({attribute: k, value: x[k]}))
  ); // convert to [[{ attribute : <name>,  value : <value>},..],...] format (if required)
  console.log(inputs);

  // Faster variant: read x.attributes once and build the object with Object.fromEntries(),
  // which avoids re-spreading the accumulator on every step. Same result shape: [{<name>: <value>, ...}, ...]
  inputs = await page.$$eval("input", el =>
    el.map(x =>
      Object.fromEntries(
        [...x.attributes].map(attr => [attr.name, attr.value])
      )
    )
  );
  inputs = inputs.map(x =>
    Object.entries(x).map(([attribute, value]) => ({
      attribute,
      value,
    }))
  ); // convert to [[{ attribute : <name>,  value : <value>},..],...] format (if required)
  console.log(inputs);
})()
  .catch(err => console.error(err))
  .finally(() => browser?.close());

P.S. I required something similar a few days ago. Idea source: blog
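Once the attributes are available as plain objects, the question's goal of picking the username/password inputs becomes ordinary data processing in Node. The following is only a rough sketch under stated assumptions: the function name `guessFieldRole` and the keyword list are hypothetical, and real login forms will need a broader set of hints.

```javascript
// Hypothetical keyword list; an assumption for illustration, not an exhaustive rule set.
const USERNAME_HINTS = /user|login|email|e-mail/i;

// Hypothetical heuristic: guess a field's role from one attribute object,
// i.e. one element of the array returned by the $$eval calls above.
function guessFieldRole(attrs) {
  // type="password" is the strongest signal available.
  if ((attrs.type || '').toLowerCase() === 'password') return 'password';

  // Search attribute names and values together, skipping `value`,
  // which holds user-entered text rather than a description of the field.
  const haystack = Object.entries(attrs)
    .filter(([name]) => name !== 'value')
    .map(([name, value]) => `${name}=${value}`)
    .join(' ');

  return USERNAME_HINTS.test(haystack) ? 'username' : 'unknown';
}
```

For example, `inputs.map(guessFieldRole)` could be applied to the array of attribute objects before it is converted to the `{attribute, value}` format.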

idchi