-1

Every day I need to open a webpage, copy the text on the page and paste it into an Excel file. Is there a way that I can automate this process using Python, without bothering to open a web browser?

Thanks to everyone who provided an answer. Would it be possible to show me an example?

Thanks.

  • 3
    If the question is can you do this then the answer is yes, but the point of SO isn't to get other people to do the work for you. – Noelkd May 31 '13 at 11:06

5 Answers

1

Sure, simply use urllib2 to open your webpage, then have a look at the content with BeautifulSoup and then just stick that data into the Excel file with xlwt. Easy!
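As a rough sketch of the first two steps: urllib2 exists only in Python 2, and its Python 3 counterpart is urllib.request. To keep the example dependency-free, the standard library's html.parser stands in for BeautifulSoup here, which is a swap and not what the answer recommends. The names TextExtractor, extract_text and fetch_html are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents.

    Hypothetical helper; a stdlib stand-in for BeautifulSoup.
    """

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def extract_text(html):
    """Return the non-empty text fragments of an HTML string, in order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


def fetch_html(url):
    """Download a page and decode it (assumes UTF-8 if no charset is sent)."""
    with urlopen(url) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Something like extract_text(fetch_html(url)) would then give a list of text fragments that can be written out to a spreadsheet.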

danodonovan
1

You could use a technique called web scraping. There is even an open-source framework written in Python, Scrapy, which is designed specifically for crawling and screen scraping.

A Google search for a phrase such as "web scraping using python" should be enough to get you started.

There is some good information in the following post: Anyone know of a good Python based web crawler that I could use?

Nishan
1

Yes, you can do this.

I would suggest:

  • Read up on urllib and urllib2 (urllib.request in Python 3) for getting the page in Python.
  • Investigate lxml for parsing the content from your page.
  • Take a look at this page on Python Excel manipulation.
  • Attempt to write some code to do what you wish.
  • If you don't succeed immediately then ask for some help and provide code examples.

Good luck

Chris Clarke
1

You can do the same on a small scale in Excel itself by importing data from the web: from the Excel Ribbon select 'Data' > 'From Web'. If you are set on using Python, try https://datanitro.com/ . DataNitro is an excellent Python-Excel integration. Here is a demo: http://scriptogr.am/richie/post/python-for-excel-using-datanitro

richie
0

Yes, there is. Use urllib2 to pull the HTML from the web, parse the HTML for the values you need (with the BeautifulSoup module and regular expressions), and finally save the result as a CSV file, which can be opened in Excel.
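For the last two steps, a minimal sketch using only the standard library's re and csv modules might look like the following. The function names find_numbers and write_csv, and the number pattern, are assumptions for illustration; the values you actually need will depend on your page.

```python
import csv
import re


def find_numbers(text):
    """Return every integer or decimal number found in text, as strings."""
    return re.findall(r"-?\d+(?:\.\d+)?", text)


def write_csv(path, header, rows):
    """Write a header and rows to a CSV file.

    The utf-8-sig encoding adds a byte order mark, which helps Excel
    recognise the file as UTF-8.
    """
    with open(path, "w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.writer(handle)
        writer.writerow(header)
        writer.writerows(rows)
```

For example, write_csv("today.csv", ["label", "value"], [["temp", "21.5"]]) would produce a file that opens directly in Excel.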

Iliyan Bobev