
There is a page I want to scrape: you can pass it variables in the URL and it generates specific content. All the content is in a giant HTML table.

I am looking for a way to write a script that can go through 180 of these different pages, extract specific information from certain columns in the table, do some math, and then write the results to a .csv file. That way I can do further analysis on the data myself.

What is the easiest way to scrape webpages, parse HTML and then store the data to a .csv file?

I have done similar things in Python and PHP, but the HTML parsing was neither easy nor clean. Are there other routes that are easier?

Reily Bourne
  • Web scraping is **not data mining**. It's at most "information extraction", or, well, web scraping. Please don't tag everything that doesn't involve databases and analysis as "data mining"... – Has QUIT--Anony-Mousse Mar 21 '12 at 20:56
  • This is a pretty idiosyncratic question, because your personal skill with different languages is going to make a big difference here: if you're a Python expert, then Python-based tools are going to be easier. You could make the question more useful to yourself and others by specifying the language you want to use. – nrabinowitz Mar 22 '12 at 17:03

1 Answer


If you have some experience with Python, I would recommend something like BeautifulSoup; in PHP you can use phpQuery.
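For example, here is a minimal BeautifulSoup sketch for pulling the cell text out of a page's table. The URL is a placeholder, and it assumes the `requests` and `beautifulsoup4` packages are installed:

```python
import requests
from bs4 import BeautifulSoup

# Placeholder URL -- substitute the real page and its query-string variables.
html = requests.get("http://example.com/report?id=1").text
soup = BeautifulSoup(html, "html.parser")

table = soup.find("table")  # the page's one giant table
for row in table.find_all("tr"):
    # get_text(strip=True) drops surrounding whitespace from each cell
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    if cells:  # header rows contain only <th> cells, so this skips them
        print(cells)
```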

Once you know how to use the HTML parser, you can create a "pipes-and-filters" program to do the math and dump the results to a .csv file.
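A sketch of that pipeline, looping over all 180 pages and writing a CSV, might look like the following. The URL pattern, the column indices, and the ratio computation are all stand-ins for your actual data:

```python
import csv

import requests
from bs4 import BeautifulSoup

BASE_URL = "http://example.com/report?id={}"  # hypothetical URL pattern

def fetch_rows(page_id):
    """Filter 1: fetch one page, yield each table row as a list of cell strings."""
    html = requests.get(BASE_URL.format(page_id)).text
    soup = BeautifulSoup(html, "html.parser")
    for row in soup.find("table").find_all("tr"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if cells:
            yield cells

def transform(cells):
    """Filter 2: keep the columns you care about and do the math.
    Columns 1 and 3 and the ratio are placeholders for your real computation."""
    a, b = float(cells[1]), float(cells[3])
    return [cells[0], a, b, a / b]

with open("results.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["label", "a", "b", "ratio"])
    for page_id in range(1, 181):  # the 180 pages
        for cells in fetch_rows(page_id):
            writer.writerow(transform(cells))
```

Keeping the fetch/parse stage and the math stage as separate functions is what makes this "pipes-and-filters": each filter can be tested on its own, and swapping out the parser or the output format only touches one stage.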

Have a look at this question for more info on a Python solution.

ebaxt