I am making web scraping site, and I want get Tags in URL , but they are dynamic sources.
so I can't touch only Cheerio. people recommended Puppeteer. and my problem was starting
first. I could see Module not found:
Error: Can't resolve 'https' in '/Users/Documents/myMac/Study/bookMarks/node_modules/puppeteer-core/lib/cjs/puppeteer/node'
and also they couldn't find out os, path .....
so I add (I use yarn) webpack and cli
second. I set the webpack.config.js for fallback
resolve:{
fallback:{
"fs":false,
"os": require.resolve("os-browserify/browser"),
"path": require.resolve("path-browserify"),
"https": require.resolve("https-browserify"),
"stream": false,
"zlib": false ,
"crypto": false,
"constants": false,
}
}
because the Err-Message said
BREAKING CHANGE: webpack < 5 used to include polyfills for node.js core modules by default.
This is no longer the case. Verify if you need this module and configure a polyfill for it.
If you want to include a polyfill, you need to:
- add a fallback 'resolve.fallback: { "https": require.resolve("https-browserify") }'
- install 'https-browserify'
If you don't want to include a polyfill, you can use an empty module like this:
resolve.fallback: { "https": false }
but the err messages still there when I yarn start
Third. I thought if the config didn't set .
so I did ' $ webpack --config webpack.config.js
'
I couldn't see the err
but still when I did yarn start, problem are there
4th. I add fs, os, http..... (in the err's module name) using yarn I can see the dependencies
"os": "^0.1.2", "path": "^0.12.7",
and added
"browser": {
"crypto": false,
"fs": false,
"path": false,
"os": false,
"net": false,
"stream": false,
"tls": false
}
setting in package.json
but,, . . .
ERROR in ./node_modules/puppeteer-core/lib/cjs/puppeteer/node/FirefoxLauncher.js 43:29-42
Module not found: Error: Can't resolve 'fs' in '/Users/Documents/myMac/Study/bookMarks/node_modules/puppeteer-core/lib/cjs/puppeteer/node'
ERROR in ./node_modules/puppeteer-core/lib/cjs/puppeteer/node/ProductLauncher.js 65:13-26
Module not found: Error: Can't resolve 'fs' in '/Users/Documents/myMac/Study/bookMarks/node_modules/puppeteer-core/lib/cjs/puppeteer/node'
webpack compiled with 41 errors
I am having 41 errors
5th . I removed folder the node_modules and yarn.lock
and did
$ yarn cache clean $ yarn install
it doesn't work
also I removed puppeteer-core and re-add
and i have 41 errors still
do you have another way or can I alternate puppeteer?
at last this is js module using puppeteer
const puppeteer = require('puppeteer-core');
const DomParser = require('dom-parser');
async function getTagList(url) {
const tagListText = new Array();
try{
const browser = await puppeteer.launch();
const page = await browser.newPage();
await page.goto(url);
const html = await page.content();
const parser = new DomParser();
const dom = parser.parseFromString(html);
const tagList = dom.getElementsByClassName('tag_area')[0].getElementsByTagName('a');
tagListText = Array.from(tagList).map(tag => tag.textContent);
await browser.close();
}catch(error) {
console.error(error);
}
return tagListText;
}
module.exports = { getTagList };
and I am using chatGPT. he recommended setting in webpack.config.js Specially fallback -> fallbacks and it can't terminal said fallbacks isn't option
I use webpack5