I want to download a .doc file from a website with a Python spider. I have the file's URL; after I log in in the browser, entering the URL downloads the file automatically, but if I am not logged in it returns a 404 error. The only method I know is urllib.urlretrieve(url, 'path/filename'),
which can download the file, but I do not know how to make urlretrieve use my logged-in session. Is there another way to download it? Help me please, thanks.

thiiiiiking
- Try using requests for a simple solution: http://stackoverflow.com/a/17633072/4131059 Use requests.Session to make a session, and then you can post the request. – Alex Huszagh Dec 07 '15 at 00:32
- @AlexanderHuszagh I will try it, thanks very much – thiiiiiking Dec 07 '15 at 01:18
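As the comment above suggests, requests.Session keeps cookies across requests, so one POST to the login form followed by a GET of the file URL behaves like a logged-in browser. A minimal sketch; the login URL and the form field names ('login', 'password') are placeholders that you must read off your site's actual login form:

```python
import requests

def download_with_login(login_url, file_url, username, password, out_path):
    """Log in once, then fetch the protected file with the same cookies."""
    session = requests.Session()
    # Form field names here are assumptions; inspect your login page's
    # HTML form to find the real ones.
    resp = session.post(login_url, data={'login': username,
                                         'password': password})
    resp.raise_for_status()
    # The session now carries the login cookies, so this GET is
    # authenticated just like a request from the browser.
    file_resp = session.get(file_url, stream=True)
    file_resp.raise_for_status()
    with open(out_path, 'wb') as fh:
        for chunk in file_resp.iter_content(8192):
            fh.write(chunk)
    return out_path
```

If the site uses a CSRF token in its login form, you will also need to GET the login page first and copy the token into the POST data.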
1 Answer
Maybe you can try the grab framework (other libraries can do this too; this is just an example). It makes it easy to fill in the login form and submit it:

from grab import Grab
import logging

# log each request so you can see what the session is doing
logging.basicConfig(level=logging.DEBUG)

g = Grab()
g.go('https://github.com/login')    # open the login page
g.set_input('login', '***')         # fill in the username field
g.set_input('password', '***')      # fill in the password field
g.submit()                          # submit the login form

Then you can download your doc files with the same Grab object, since it keeps the login cookies, e.g.:

g.go('https://example.com/path/file.doc')   # example URL; use your file's URL
with open('file.doc', 'wb') as fh:
    fh.write(g.response.body)   # raw response bytes in older Grab versions

Sinux