I want to download all the PDF files listed on this page: [http://search.un.org/?query=A&searchTrigger=%E6%90%9C%E7%B4%A2+ODS&SS=DS&tpl=ods&lang=zh-cn]
One sample link is A/38/7/CORR.1(SUPP). This link redirects twice before reaching the real PDF URL, and a cookie is required. It first refreshes to a temporary URL (which changes every time) through: <META HTTP-EQUIV="refresh" CONTENT="0; URL=/TMP/625508.055090904.html">
Then that page refreshes to the real URL through: <META HTTP-EQUIV="refresh" CONTENT="1; URL=http://daccess-dds-ny.un.org/doc/UNDOC/GEN/N83/368/31/PDF/N8336831.pdf?OpenElement">
This is easy to do in a browser, but when I try to batch download with wget or Python, it seems impossible.
wget: I can't get the temporary URL from the original URL, even with the --load-cookies option.
Python: I have tried urllib, urllib2 and mechanize, but none of them handles the auto refresh, so I can't get the real URL.
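One direction that might work is to parse the refresh tags by hand instead of expecting the library to follow them. Below is a minimal sketch of that idea, assuming Python 3's standard library (urllib.request and http.cookiejar rather than urllib2). The names refresh_target, follow_refreshes and make_cookie_fetch are made up for illustration, and I have not confirmed that this is enough for the UN server, which may check more than the cookie.

```python
import http.cookiejar
import re
import urllib.request
from urllib.parse import urljoin

# Matches <META HTTP-EQUIV="refresh" CONTENT="N; URL=..."> case-insensitively
# and captures the part after URL=.
META_REFRESH = re.compile(
    r'<meta[^>]*http-equiv=["\']?refresh["\']?[^>]*'
    r'content=["\']?\s*\d+\s*;\s*url=([^"\'>]+)',
    re.IGNORECASE,
)


def refresh_target(html, base_url):
    """Return the absolute URL named by a meta refresh tag, or None."""
    match = META_REFRESH.search(html)
    if match is None:
        return None
    # Relative targets such as /TMP/....html are resolved against the page URL.
    return urljoin(base_url, match.group(1).strip())


def follow_refreshes(fetch, url, max_hops=5):
    """Follow meta refreshes starting at url.

    fetch(url) must return the response body as bytes. Returns the final
    URL and its body once a page without a refresh tag is reached.
    """
    for _ in range(max_hops):
        body = fetch(url)
        # latin-1 never fails to decode, so binary bodies (the PDF) are safe.
        target = refresh_target(body.decode("latin-1"), url)
        if target is None:
            return url, body
        url = target
    raise RuntimeError("too many meta refresh hops")


def make_cookie_fetch():
    """Build a fetch function that keeps cookies between requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )

    def fetch(url):
        with opener.open(url) as response:
            return response.read()

    return fetch
```

Usage would be something like `url, pdf = follow_refreshes(make_cookie_fetch(), sample_link)`, where sample_link is the address behind a link such as A/38/7/CORR.1(SUPP), followed by writing pdf to a file opened in binary mode. The same fetch function has to be reused for every hop so that the cookie set on the first request is sent with the later ones.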
Does anybody have a clue? Thank you very much.