
I would like to scrape this site (stackoverflow.com). Is there an API, or some other tool that can be used from Python, to get all the posts and comments with a specific tag?

For example, how do I get all the posts and comments with the python tag from 10 January 2019 to 20 January 2019?

Vadim Kotov
HABLOH

1 Answer


Have a detailed look at the Stack Exchange API documentation: https://api.stackexchange.com/docs/

You can get all questions with a particular tag between a start date and an end date using the questions method. Pass the tag in the tagged parameter.

Here is the URL format for that:
https://api.stackexchange.com/2.2/questions?fromdate={start_date}&todate={end_date}&order=desc&sort=activity&tagged={tag}&site=stackoverflow

For example, the link below returns the first page of questions tagged python from 1 July 2019 to 5 July 2019 (results are paginated, so further pages must be requested separately):
https://api.stackexchange.com/2.2/questions?fromdate=1561939200&todate=1562284800&order=desc&sort=activity&tagged=python&site=stackoverflow
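A minimal Python sketch that builds this URL with the standard library. The helper name questions_url and its optional page/pagesize arguments are my own illustration, not part of the API; with the example values it reproduces the link above.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.stackexchange.com/2.2"

def questions_url(tag, fromdate, todate, site="stackoverflow", page=None, pagesize=None):
    """Build a /questions URL; fromdate and todate are Unix timestamps (seconds, UTC)."""
    # dict insertion order is preserved, so the query matches the format above
    params = {
        "fromdate": fromdate,
        "todate": todate,
        "order": "desc",
        "sort": "activity",
        "tagged": tag,
        "site": site,
    }
    # Optional paging parameters (the API caps pagesize at 100)
    if page is not None:
        params["page"] = page
    if pagesize is not None:
        params["pagesize"] = pagesize
    return f"{API_ROOT}/questions?{urlencode(params)}"

print(questions_url("python", 1561939200, 1562284800))
```

Building the query with urlencode rather than string formatting keeps tag values correctly escaped.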

The dates in the URL are Unix epoch timestamps (seconds since 1 January 1970, UTC); see the "dates" section of the API documentation for details.
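A small sketch of converting calendar dates to and from these timestamps in Python; to_unix and from_unix are hypothetical helper names. The dates are pinned to UTC, because a naive datetime would be interpreted in the machine's local time zone and shift the range.

```python
from datetime import datetime, timezone

def to_unix(year, month, day):
    """Return midnight UTC of the given date as a Unix timestamp in seconds."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())

def from_unix(ts):
    """Convert a timestamp from an API response (e.g. creation_date) back to a UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)

print(to_unix(2019, 7, 1))  # 1561939200
print(to_unix(2019, 7, 5))  # 1562284800
print(to_unix(2019, 1, 1))  # 1546300800
```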

Once you have the question_id values, you can use the questions/{ids}/answers method to get the answers to those questions between a start date and an end date.

Here is the URL format for that:
https://api.stackexchange.com/2.2/questions/{question_id}/answers?fromdate={start_date}&todate={end_date}&order=desc&sort=activity&site=stackoverflow

For example, the link below returns the answers from 1 January 2019 to 1 July 2019 to the question with question_id 37181281:
https://api.stackexchange.com/2.2/questions/37181281/answers?fromdate=1546300800&todate=1561939200&order=desc&sort=activity&site=stackoverflow
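Because {ids} accepts several ids separated by semicolons (up to 100 per request), the question ids collected earlier can be batched rather than fetched one by one. A minimal sketch; answers_urls is a hypothetical helper, and with a single id it reproduces the example link above.

```python
from urllib.parse import urlencode

API_ROOT = "https://api.stackexchange.com/2.2"

def answers_urls(question_ids, fromdate, todate, site="stackoverflow", batch_size=100):
    """Build /questions/{ids}/answers URLs, packing up to batch_size ids per request."""
    query = urlencode({
        "fromdate": fromdate,
        "todate": todate,
        "order": "desc",
        "sort": "activity",
        "site": site,
    })
    urls = []
    for start in range(0, len(question_ids), batch_size):
        batch = question_ids[start:start + batch_size]
        ids = ";".join(str(qid) for qid in batch)  # semicolon-delimited id list
        urls.append(f"{API_ROOT}/questions/{ids}/answers?{query}")
    return urls

print(answers_urls([37181281], 1546300800, 1561939200)[0])
```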

You now have all the posts (questions and answers) with a particular tag between a start date and an end date.

Since you have the question_id and answer_id of each post, you can use the questions/{ids}/comments and answers/{ids}/comments methods to get the comments on these posts.
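Every method above is paginated: each response holds at most pagesize items (30 by default, 100 at most), and the has_more field in the response says whether another page exists. A sketch that follows those pages and fetches comments in batches, using only the standard library. The helper names are hypothetical; the gzip handling, the backoff wait and the built-in withbody filter (to include comment text, which the default filter omits) reflect my understanding of the API docs and are worth double-checking there. Requests without an API key also have a small daily quota (see quota_remaining in each response).

```python
import gzip
import json
import time
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_ROOT = "https://api.stackexchange.com/2.2"

def http_get_json(url):
    """Fetch a URL and decode its JSON body, gunzipping it if the server compressed it."""
    req = Request(url, headers={"Accept-Encoding": "gzip"})
    with urlopen(req) as resp:
        body = resp.read()
        if resp.headers.get("Content-Encoding") == "gzip":
            body = gzip.decompress(body)
    return json.loads(body)

def fetch_all(path, params, get=http_get_json, pagesize=100):
    """Yield every item from a paginated API method by following has_more."""
    page = 1
    while True:
        query = urlencode({**params, "page": page, "pagesize": pagesize})
        data = get(f"{API_ROOT}/{path}?{query}")
        yield from data.get("items", [])
        # If the API asks us to back off, wait that many seconds before the next call
        if data.get("backoff"):
            time.sleep(data["backoff"])
        if not data.get("has_more"):
            break
        page += 1

def comments_for(kind, ids, site="stackoverflow", filter_name="withbody", get=http_get_json):
    """Yield comments on posts; kind is "questions" or "answers".

    Ids are sent in batches of 100, joined with ';'.
    """
    for start in range(0, len(ids), 100):
        joined = ";".join(str(i) for i in ids[start:start + 100])
        params = {"site": site, "filter": filter_name}
        yield from fetch_all(f"{kind}/{joined}/comments", params, get=get)

# Usage (makes real network requests):
# for c in comments_for("questions", [37181281]):
#     print(c["comment_id"], c.get("body", ""))
```

The get parameter lets the network call be swapped out, e.g. for a stub when testing.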

Rounak
  • Thank you so much! If you have time, I would like to ask one last question: what if I wanted to get a lot of data spanning several years, for example from 2016 to 2019? How could I page through the results and increase the maximum number of results per page? – HABLOH Jul 15 '19 at 13:08