
I am making a web scraping site, and I want to get the tags from a URL, but the pages are dynamic.

So Cheerio alone isn't enough, and people recommended Puppeteer. That's where my problems started.

First, I got "Module not found":

Error: Can't resolve 'https' in '/Users/Documents/myMac/Study/bookMarks/node_modules/puppeteer-core/lib/cjs/puppeteer/node'

It also couldn't find os, path, ...

So I added webpack and webpack-cli (I use yarn).

Second, I set a fallback in webpack.config.js:

    resolve: {
        fallback: {
            "fs": false,
            "os": require.resolve("os-browserify/browser"),
            "path": require.resolve("path-browserify"),
            "https": require.resolve("https-browserify"),
            "stream": false,
            "zlib": false,
            "crypto": false,
            "constants": false,
        }
    }

because the error message said:

BREAKING CHANGE: webpack < 5 used to include polyfills for node.js core modules by default.
This is no longer the case. Verify if you need this module and configure a polyfill for it.

If you want to include a polyfill, you need to:
        - add a fallback 'resolve.fallback: { "https": require.resolve("https-browserify") }'
        - install 'https-browserify'
If you don't want to include a polyfill, you can use an empty module like this:
        resolve.fallback: { "https": false }

But the error messages were still there when I ran yarn start.

Third, I thought maybe the config wasn't being applied, so I ran '$ webpack --config webpack.config.js', and that build showed no errors.

But when I ran yarn start, the problems were still there.

Fourth, I installed fs, os, http, ... (the module names from the errors) with yarn, and now I can see them in the dependencies:

"os": "^0.1.2", "path": "^0.12.7",

And I added this browser setting to package.json:

  "browser": {
    "crypto": false,
    "fs": false,
    "path": false,
    "os": false,
    "net": false,
    "stream": false,
    "tls": false
  }

But the errors are still there:

ERROR in ./node_modules/puppeteer-core/lib/cjs/puppeteer/node/FirefoxLauncher.js 43:29-42
Module not found: Error: Can't resolve 'fs' in '/Users/Documents/myMac/Study/bookMarks/node_modules/puppeteer-core/lib/cjs/puppeteer/node'

ERROR in ./node_modules/puppeteer-core/lib/cjs/puppeteer/node/ProductLauncher.js 65:13-26
Module not found: Error: Can't resolve 'fs' in '/Users/Documents/myMac/Study/bookMarks/node_modules/puppeteer-core/lib/cjs/puppeteer/node'

webpack compiled with 41 errors

So there are 41 errors in total.

Fifth, I deleted the node_modules folder and yarn.lock, then ran '$ yarn cache clean' and '$ yarn install'.

It didn't work.

I also removed puppeteer-core and added it again,

and I still have 41 errors.

Is there another way to fix this, or can I use an alternative to Puppeteer?

Finally, this is the JS module that uses Puppeteer:

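    const puppeteer = require('puppeteer-core');
    const DomParser = require('dom-parser');

    // Runs only in Node.js: Puppeteer launches a real browser process,
    // so this file cannot be bundled into the browser app by webpack.
    async function getTagList(url) {
      // `let`, not `const`: the array is reassigned below.
      let tagListText = [];
      let browser;
      try {
        // puppeteer-core does not download a browser; launch() needs an
        // executablePath (or channel) option, or use the full `puppeteer` package.
        browser = await puppeteer.launch();
        const page = await browser.newPage();
        await page.goto(url);

        const html = await page.content();
        const parser = new DomParser();
        const dom = parser.parseFromString(html);
        const tagArea = dom.getElementsByClassName('tag_area')[0];
        if (tagArea) {
          const tagList = tagArea.getElementsByTagName('a');
          tagListText = Array.from(tagList).map(tag => tag.textContent);
        }
      } catch (error) {
        console.error(error);
      } finally {
        // Close the browser even if navigation or parsing failed.
        if (browser) await browser.close();
      }

      return tagListText;
    }

    module.exports = { getTagList };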

I also asked ChatGPT. It recommended changing fallback to fallbacks in webpack.config.js, but then the terminal said fallbacks is not a valid option.

I use webpack 5.

jujujamong

1 Answer


I found out the reason.

I wasn't running a Node server. Puppeteer has to run in Node.js, not in the browser bundle that webpack builds for yarn start, so to use Puppeteer I need a Node server.
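A minimal sketch of such a Node server, using only Node's built-in http module. The /tags path, the ?url= query parameter, the { tags } JSON shape, and the createTagServer name are hypothetical choices for illustration, not something from the post. The scraper is passed in as a function, so it could be the getTagList module from the question, and the HTTP layer can be tried without launching a browser.

```javascript
// Minimal sketch (hypothetical): the /tags path, ?url= parameter, and
// { tags } response shape are illustrative assumptions.
const http = require('http');

// Every response is JSON; the CORS header lets a dev frontend on another
// port (e.g. the one started by `yarn start`) read the result.
function sendJson(res, status, body) {
  res.writeHead(status, {
    'Content-Type': 'application/json',
    'Access-Control-Allow-Origin': '*',
  });
  res.end(JSON.stringify(body));
}

// The scraper is injected, so this file never imports Puppeteer itself.
function createTagServer(getTagList) {
  return http.createServer(async (req, res) => {
    // req.url holds only the path and query, so URL needs a base to parse it.
    const { pathname, searchParams } = new URL(req.url, 'http://localhost');

    if (req.method !== 'GET' || pathname !== '/tags') {
      sendJson(res, 404, { error: 'not found' });
      return;
    }

    const target = searchParams.get('url');
    if (!target) {
      sendJson(res, 400, { error: 'missing ?url= parameter' });
      return;
    }

    try {
      // With the real getTagList, Puppeteer runs here inside Node,
      // never in the webpack bundle.
      const tags = await getTagList(target);
      sendJson(res, 200, { tags });
    } catch (error) {
      sendJson(res, 500, { error: String(error) });
    }
  });
}

module.exports = { createTagServer };
```

To run it, an entry file started with node could call createTagServer(require('./getTagList').getTagList).listen(3001), where './getTagList' and port 3001 are assumptions. The webpack app then only calls fetch('http://localhost:3001/tags?url=' + encodeURIComponent(pageUrl)), so Puppeteer and its fs/os/https imports never enter the browser bundle.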

jujujamong