auto download pdf file after multiple HTTP redirect

Asked Apr 14 '15 at 10:07

Active Apr 15 '15 at 06:20

I want to download all the PDF files on this page: [http://search.un.org/?query=A&searchTrigger=%E6%90%9C%E7%B4%A2+ODS&SS=DS&tpl=ods&lang=zh-cn]

One sample link is A/38/7/CORR.1(SUPP). This link goes through two redirects before reaching the real PDF URL, and a cookie is required. Both hops are HTML META refreshes rather than HTTP 3xx redirects. The link first refreshes to a temporary URL (which changes every time) through <META HTTP-EQUIV="refresh" CONTENT="0; URL=/TMP/625508.055090904.html">, and that page then refreshes to the real URL through <META HTTP-EQUIV="refresh" CONTENT="1; URL=http://daccess-dds-ny.un.org/doc/UNDOC/GEN/N83/368/31/PDF/N8336831.pdf?OpenElement">.
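Because each hop is a META refresh inside the HTML body, the next URL has to be pulled out of the page itself. A minimal sketch of that step, assuming Python 3 and only the standard library; `MetaRefreshParser`, `find_meta_refresh` and the `http://example.org/...` base URL are hypothetical names used for illustration, not anything the site provides:

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshParser(HTMLParser):
    """Hypothetical helper: remembers the CONTENT of the first <meta http-equiv="refresh">."""

    def __init__(self):
        super().__init__()
        self.content = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so META / HTTP-EQUIV match here.
        if tag != "meta" or self.content is not None:
            return
        attrs = dict(attrs)
        if (attrs.get("http-equiv") or "").lower() == "refresh":
            self.content = attrs.get("content")


# CONTENT looks like "0; URL=/TMP/625508.055090904.html"
CONTENT_RE = re.compile(r"\s*(\d+(?:\.\d+)?)\s*[;,]\s*url\s*=\s*['\"]?([^'\"]+)", re.IGNORECASE)


def find_meta_refresh(html, base_url):
    """Return (delay_seconds, absolute_url) for a META refresh, or None if there is none."""
    parser = MetaRefreshParser()
    parser.feed(html)
    if not parser.content:
        return None
    match = CONTENT_RE.match(parser.content)
    if match is None:
        return None
    # Relative targets such as /TMP/... are resolved against the page they came from.
    return float(match.group(1)), urljoin(base_url, match.group(2).strip())
```

For the first sample tag and a hypothetical base URL `http://example.org/doc`, this yields a 0-second delay and `http://example.org/TMP/625508.055090904.html`; the second sample tag yields a 1-second delay and the absolute PDF URL unchanged.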

This is easy to do in a browser, but when I try to batch-download with wget or Python, it seems impossible.

wget: I can't get the temporary URL from the original URL, even with the --load-cookies option.

Python: I have tried urllib, urllib2 and mechanize, but I can't handle the automatic refresh, so I can't get the real URL.
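urllib's redirect handler only follows HTTP 3xx responses, so the META refreshes have to be followed by hand, while one cookie jar is shared across every request. A minimal sketch under these assumptions: Python 3 (`urllib.request` rather than `urllib2`), the tag uses double-quoted attributes in the order shown in the samples above, and the final response is either served as `application/pdf` or starts with the `%PDF-` signature. `make_cookie_fetcher` and `follow_meta_refreshes` are hypothetical helper names; `fetch` is injected so the logic can be exercised without the network:

```python
import re
import time
import urllib.request
from http.cookiejar import CookieJar
from urllib.parse import urljoin

# Matches e.g. <META HTTP-EQUIV="refresh" CONTENT="0; URL=/TMP/625508.055090904.html">
# Assumption: double-quoted attributes, HTTP-EQUIV before CONTENT.
REFRESH_RE = re.compile(
    r'<meta[^>]*http-equiv\s*=\s*"refresh"[^>]*content\s*=\s*"\s*([\d.]+)\s*;\s*url\s*=\s*([^"]+)"',
    re.IGNORECASE,
)


def make_cookie_fetcher():
    """Return fetch(url) -> (final_url, content_type, body); all calls share one cookie jar."""
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(CookieJar()))

    def fetch(url):
        with opener.open(url) as resp:
            # geturl() already reflects any ordinary HTTP 3xx redirects urllib followed.
            return resp.geturl(), resp.headers.get_content_type(), resp.read()

    return fetch


def follow_meta_refreshes(url, fetch, max_hops=5, honor_delay=False):
    """Follow META refresh pages until a PDF comes back; return (pdf_url, pdf_bytes)."""
    for _ in range(max_hops + 1):
        final_url, content_type, body = fetch(url)
        if content_type == "application/pdf" or body[:5] == b"%PDF-":
            return final_url, body
        # latin-1 never fails to decode, which is enough to locate an ASCII tag.
        match = REFRESH_RE.search(body.decode("latin-1"))
        if match is None:
            raise RuntimeError(f"no PDF and no META refresh at {final_url}")
        if honor_delay:
            time.sleep(float(match.group(1)))
        # Relative targets such as /TMP/... are resolved against the current page.
        url = urljoin(final_url, match.group(2).strip())
    raise RuntimeError(f"gave up after {max_hops} META refresh hops")
```

Usage would look like `pdf_url, data = follow_meta_refreshes(link, make_cookie_fetcher(), honor_delay=True)`, followed by writing `data` to a file. Creating one fetcher per document, or reusing one across the batch, is a judgment call; whether the server expects the cookie from an earlier page (for example the search page) is untested here.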

Does anybody have any clues? Thank you very much.

edited Apr 15 '15 at 06:20

deepblue

0 Answers