
I have been working on a program that scrapes data from a particular page of a website using a regular expression in PHP.

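     <?php
     // Some servers reject requests that carry no User-Agent header.
     ini_set("user_agent", "PHP");
     $url = "http://www.example.com/page.html";
     $output = file_get_contents($url);
     // Non-greedy (.*?) stops at the first </h1>; the s flag lets the title span lines.
     if ($output !== false && preg_match('#<h1 class="title" itemprop="name">(.*?)</h1>#s', $output, $match)) {
         echo $match[1] . "<br>";
     }
     ?>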

How do I make a program that collects all the existing links on the website, so that I can scrape the data from each of them? Opening every link in the browser and pasting it into the script by hand is worse than typing the data manually instead of scraping it.

I know JavaScript, Python, and PHP, and can work in any of these three languages.

Sumit

1 Answer

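    import urllib.request  # Python 3; the Python 2 module was urllib2
    import bs4
    target_url = "http://www.example.com/page.html"  # the page from the question
    html = urllib.request.urlopen(target_url).read()
    for link in bs4.BeautifulSoup(html, "html.parser").find_all("a"):
        print(link.get("href"))  # the URL each <a> tag points to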
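
To cover the whole site rather than a single page, the same link extraction can be repeated on every page it finds. What follows is only a rough sketch under a few assumptions: it uses the standard library's `html.parser` in place of BeautifulSoup so that it runs without extra installs, it stays on the host of the start URL, and the names `LinkCollector`, `fetch`, `crawl`, `fetch_page` and `max_pages` are made up for this illustration.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to it."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch(url):
    """Download a page and return its HTML as text."""
    with urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=100):
    """Breadth-first walk over same-host links; returns visited URLs in order."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except OSError:
            continue  # skip pages that fail to load
        visited.append(url)
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.hrefs:
            # Resolve relative links and drop "#fragment" parts.
            absolute = urldefrag(urljoin(url, href)).url
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited
```

Calling `crawl("http://www.example.com/page.html")` should return the list of pages reached from that start page, and the title regular expression from the question could then be run on each one. The `max_pages` cap is there so that a large site does not keep the loop running indefinitely; a real crawler would probably also want a delay between requests and a check of the site's robots.txt.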
Joran Beasley