I have to get many URLs from a website and then copy them into an Excel file, and I'm looking for an automatic way to do that. The website has a main page with about 300 links, and inside each link there are 2 or 3 links that are interesting for me. Any suggestions?
4 Answers
If you want to develop your solution in Python, then I can recommend the Scrapy framework.
As far as inserting the data into an Excel sheet is concerned, there are ways to do it directly, see for example here: Insert row into Excel spreadsheet using openpyxl in Python, but you can also write the data into a CSV file and then import it into Excel.
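For the structure you describe (a main page whose links each contain a few links of interest), a minimal spider might look like the sketch below. The start URL and CSS selectors are placeholders you would adapt to the actual site, and it assumes Scrapy 1.4+ for response.follow.

import scrapy

class LinkSpider(scrapy.Spider):
    name = 'links'
    start_urls = ['http://yourUrl.com']  # placeholder: the main page

    def parse(self, response):
        # follow each of the ~300 links on the main page
        for href in response.css('a::attr(href)').extract():
            yield response.follow(href, callback=self.parse_inner)

    def parse_inner(self, response):
        # collect the links found inside each sub-page
        for href in response.css('a::attr(href)').extract():
            yield {'page': response.url, 'link': response.urljoin(href)}

Running it with scrapy runspider spider.py -o links.csv writes the results to a CSV file that Excel can open directly.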
If the links are in the HTML, you can use Beautiful Soup. This has worked for me in the past.
import urllib2
from bs4 import BeautifulSoup

page = 'http://yourUrl.com'
opened = urllib2.urlopen(page)               # fetch the page (Python 2)
soup = BeautifulSoup(opened, 'html.parser')  # parse the HTML

# print the href of every anchor tag on the page
for link in soup.find_all('a'):
    print (link.get('href'))
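For the two-level structure in the question, you could extend this along the following lines. This is only a hypothetical sketch: the startswith('http') filter is just an example, and you would adjust it to pick out the 2 or 3 links you actually care about on each sub-page.

results = []
for link in soup.find_all('a'):
    href = link.get('href')
    if href and href.startswith('http'):  # example filter only
        inner = BeautifulSoup(urllib2.urlopen(href), 'html.parser')
        for inner_link in inner.find_all('a'):
            results.append((href, inner_link.get('href')))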

Have you tried Selenium or urllib? urllib is faster than Selenium. http://useful-snippets.blogspot.in/2012/02/simple-website-crawler-with-selenium.html
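For comparison, a minimal Selenium sketch could look like this. It assumes the pre-Selenium-4 API (find_elements_by_tag_name) and a Firefox driver available on your system; the URL is a placeholder.

from selenium import webdriver

driver = webdriver.Firefox()      # assumes Firefox is installed
driver.get('http://yourUrl.com')  # placeholder URL
for a in driver.find_elements_by_tag_name('a'):
    print a.get_attribute('href')
driver.quit()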

You can use Beautiful Soup for parsing: http://www.crummy.com/software/BeautifulSoup/
More information in the docs here: http://www.crummy.com/software/BeautifulSoup/bs4/doc/
I wouldn't suggest Scrapy, because you don't need it for the work you described in your question.
For example, this code uses the urllib2 library to open the Google homepage and find all the links on that page, returned as a list:
import urllib2
from bs4 import BeautifulSoup

# fetch the page and parse it
data = urllib2.urlopen('http://www.google.com').read()
soup = BeautifulSoup(data, 'html.parser')

# find_all returns a list of every anchor tag on the page
print soup.find_all('a')
For handling Excel files, take a look at http://www.python-excel.org
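For instance, with xlwt (one of the libraries listed there), writing a list of links to a sheet could look like this; links is a hypothetical list of URL strings collected as above.

import xlwt

links = ['http://example.com/a', 'http://example.com/b']  # hypothetical data

wb = xlwt.Workbook()
ws = wb.add_sheet('links')
for row, url in enumerate(links):
    ws.write(row, 0, url)  # row, column, value
wb.save('links.xls')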
