
I'm trying to archive a webpage given its URL. The idea is to archive the website into a single HTML file with all of its assets in it. When the user provides a URL, I use fetch to get the HTML content of the page. However, I want to rewrite relative asset paths (i.e. CSS file paths, href values, file/URL paths) to absolute URLs based on the user-provided URL, so that when I open the archived HTML file the page renders properly with all the images, links, etc.

Here's what I'm trying:

const fs = require('fs');

const response = await fetch(url);
const html = await response.text();

// replace root-relative hrefs with absolute URLs
const newHTML = html.replaceAll(/href="\//g, 'href="https://example.com/');
fs.writeFile('output2.html', newHTML, (err) => {
  if (err) throw err;
  console.log('The file has been saved!');
});

I need help finding the proper regexp to make this work, or any other way to achieve it.
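As a possible alternative to hand-tuned regexes, the relative paths could be resolved with the WHATWG `URL` constructor built into Node, which handles root-relative (`/a`), document-relative (`img/b.png`), and protocol-relative (`//host/x`) paths uniformly. A minimal sketch, assuming double-quoted attribute values (a robust archiver would use a real HTML parser instead; `absolutify` and `baseUrl` are names I made up):

```javascript
// Rewrite every href/src attribute to an absolute URL resolved
// against the page's own URL. Only double-quoted attributes are
// matched; anything the URL parser rejects is left untouched.
function absolutify(html, baseUrl) {
  return html.replace(/\b(href|src)="([^"]*)"/g, (match, attr, value) => {
    try {
      // new URL(value, base) resolves "/path", "path", "../path", "//host/..."
      return `${attr}="${new URL(value, baseUrl).href}"`;
    } catch {
      return match; // e.g. "mailto:" handled fine, but malformed values kept as-is
    }
  });
}
```

For example, `absolutify('<a href="/a">', 'https://example.com/page/')` would yield `<a href="https://example.com/a">`, while a document-relative `src="img/b.png"` resolves against the page directory.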

ajay
    https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454 – Quentin Nov 14 '22 at 09:15
  • Does this answer your question? [How do I parse a HTML page with Node.js](https://stackoverflow.com/questions/7372972/how-do-i-parse-a-html-page-with-node-js) – traynor Nov 14 '22 at 10:04
  • also see examples like [How to download an entire website given a domain name](https://stackoverflow.com/questions/13031147/how-to-download-an-entire-website-given-a-domain-name) – traynor Nov 14 '22 at 10:07

0 Answers