
I am trying to find all the comments in a web page.

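import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:88.0) Gecko/20100101 Firefox/88.0'}

with requests.Session() as session:
    # verify=False skips TLS certificate verification
    response = session.get('https://www.example.com', verify=False, headers=headers)
    # response.text holds the page source; printing the response object only shows the status
    print(response.text)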

This script returns the full source code of the page. However, I am only interested in finding the commented lines. Can anyone help me with a regular expression to find them? Or is there a better way to do this?

  • 2
    [Don't use regex to parse HTML](https://stackoverflow.com/a/1732454/4046632). Look at a package like [BeautifulSoup](https://pypi.org/project/beautifulsoup4/). When you have some code and hit a problem you cannot solve, ask again. As it stands, we can do little to help without even knowing which site you are trying to scrape. A good tutorial on scraping would help. – buran Apr 20 '21 at 05:17

1 Answer


You might try BeautifulSoup4, which has a built-in way to identify comments.

Here's a Stack Overflow question that demonstrates this: How to find all comments with Beautiful Soup
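
A minimal sketch of that approach, assuming the `beautifulsoup4` package is installed. The helper name `find_comments` and the inline `sample_html` are made up for illustration; with the script from the question you would pass in the page source (the `.text` attribute of the `requests` response) instead.

```python
from bs4 import BeautifulSoup, Comment


def find_comments(html):
    """Return the text of every HTML comment found in the markup."""
    soup = BeautifulSoup(html, "html.parser")
    # Comment is a subclass of NavigableString, so filtering the
    # document's strings by type keeps only the comment nodes.
    comments = soup.find_all(string=lambda text: isinstance(text, Comment))
    return [comment.strip() for comment in comments]


# Hypothetical sample page, used here in place of a real download.
sample_html = """
<html>
  <!-- header starts here -->
  <body>
    <p>Hello</p>
    <!-- TODO: remove debug banner -->
  </body>
</html>
"""

for comment in find_comments(sample_html):
    print(comment)
```

Because the parser decides what counts as a comment, this avoids the edge cases a regular expression would have to handle by hand, such as comments that span several lines.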

astrochun