This is what I am using to download a PDF from a website. Without the cd ... && part, curl launches and downloads the file. But when I prepend the cd command to change the directory before downloading, the curl command never runs. I don't want to pass the -o argument to curl, because I don't want to give the file a custom name. Please suggest the cause of this problem and a solution.

This question differs from the suggested duplicate because it asks about running curl together with a shell command from Python. The suggested thread is about bash only.

import subprocess
import shlex

url = 'https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf'
sessionID = input('Please, enter jsessionid...\n')
sessionID = str(sessionID) # Cookies
cookies_from_function = " -H 'Cookie: rppValue=20; B_View=1; JSESSIONID=" + sessionID + "'"
# Log in with the browser, inspect element, go to the network tab, reload, and copy the curl command for a PDF link. Extract the headers with cookies and paste them here.
tempstring = '-L -O -C - ' + url + " -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:64.0) Gecko/20100101 Firefox/64.0' -H 'Accept: */*' --compressed -H 'Connection: keep-alive'" + cookies_from_function
# print(tempstring)
curl_cmd = "cd /Volumes/path/to/destination/ && curl " + tempstring  # Original
subprocess.call(shlex.split(curl_cmd))
  • Have a look at [`subprocess.call()`](https://docs.python.org/3/library/subprocess.html#subprocess.call) again: at how arguments are passed in general, and at the `cwd` keyword argument for your problem in particular. Better yet, use [`urllib`](https://docs.python.org/3/library/urllib.html#module-urllib) so you don't have to execute an external program at all. – Ondrej K. Jan 20 '19 at 15:47
  • Possible duplicate of [Save file to specific folder with curl command](https://stackoverflow.com/questions/16362402/save-file-to-specific-folder-with-curl-command) – Marcin Orlowski Jan 20 '19 at 15:47
  • @MarcinOrlowski that link talks about bash only. This is python+bash. – lalitaalaalitah Jan 20 '19 at 15:50

2 Answers

0

As suggested in a comment, you can use the cwd keyword argument to the subprocess functions to run in a different directory. Another simple option is to open a suitable file and pass it as stdout to the subprocess call.

Tangentially, you probably want to use check_call or the modern replacement run instead of the very basic call.

import subprocess
import os

url = 'https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf'
sessionID = input('Please, enter jsessionid...\n')
# No need, input always returns a string in Python 3
# sessionID = str(sessionID) # Cookies
# Open in binary write mode ('wb'); the default mode is read-only text
with open(os.path.join('/Volumes/path/to/destination', 'dummy.pdf'), 'wb') as pdf:
    subprocess.check_call([
            'curl', '-L', '-C', '-', url,
            '-H', 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:64.0) Gecko/20100101 Firefox/64.0',
            '-H', 'Accept: */*', '--compressed',
            '-H', 'Connection: keep-alive',
            '-H', 'Cookie: rppValue=20; B_View=1; JSESSIONID={0}'.format(sessionID)],
          stdout=pdf)

This also does away with shlex, partly because you say in a comment you had to get rid of it, partly because it doesn't really offer any significant value over splitting a simple static command line into tokens manually once (though you have to understand how to do it, obviously).
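To illustrate the earlier remark about call versus check_call and run, here is a minimal sketch. The failing command is a stand-in built from `sys.executable` (an assumption for the demo, not part of the original code); in practice it would be the curl invocation.

```python
import subprocess
import sys

# Stand-in for a failing curl: a child process that exits with status 3.
failing = [sys.executable, "-c", "import sys; sys.exit(3)"]

# call() only returns the exit status, so a failure is easy to overlook.
status = subprocess.call(failing)

# run(check=True) raises CalledProcessError on a non-zero exit status,
# which is the same behavior check_call() gives you.
try:
    subprocess.run(failing, check=True)
    raised = False
except subprocess.CalledProcessError as err:
    raised = True
    returncode = err.returncode
```

With call, the script carries on unless you inspect `status` yourself; with check_call or run(check=True), a failed download stops the script.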

If you want to keep the -O option,

subprocess.check_call([
    'curl', '-O', ...],
    cwd='/Volumes/path/to/destination')
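
To see what cwd does without touching the network, the following sketch runs a small Python child process in a temporary directory. The child command and the temporary directory are assumptions made for the demonstration; they stand in for curl and the real destination.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as destination:
    # The child starts in `destination`; the parent's working directory
    # is left unchanged.
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getcwd())"],
        cwd=destination, check=True, capture_output=True, text=True)
    child_cwd = result.stdout.strip()
    # samefile() tolerates symlinked temp paths (e.g. /var vs /private/var).
    same_dir = os.path.samefile(child_cwd, destination)
```

Any file the child creates with a relative name, such as the one curl -O writes, ends up in that directory.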
tripleee
  • I'll try it. But doesn't it assume that I have to provide the file name? If so, is there any way to avoid that and use the file name provided by the server? – lalitaalaalitah Jan 20 '19 at 17:42
  • Then probably the `cwd` trick alluded to earlier. I'll update with another version. – tripleee Jan 20 '19 at 18:12
  • See also https://stackoverflow.com/a/51950538/874188 for (much) more about subprocess and common antipatterns. (Tried to link before but put the wrong link, sorry.) – tripleee Jan 20 '19 at 18:18
  • For some reason, *subprocess.check_call(['curl', '-O', ...],cwd='/Volumes/path/to/destination')* downloads some HTML instead of the PDF, whereas running the cd command with check_call works without any problem. – lalitaalaalitah Jan 21 '19 at 00:05
-1

&& is a shell logical operator for running a command if the preceding one succeeds. So, you need to run it inside shell; use shell=True and pass it as string not as a list:

subprocess.call(curl_cmd, shell=True)
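
A small sketch of why the list form fails: shlex.split treats && as an ordinary word, so without a shell the first token is run as the program and everything else, including `&&` and `curl`, is handed to it as plain arguments. The command below is a shortened version of the one in the question.

```python
import shlex

url = 'https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf'
curl_cmd = "cd /Volumes/path/to/destination/ && curl -L -O " + url

# No shell is involved here, so '&&' has no special meaning; it is just
# another string in the argument list.
tokens = shlex.split(curl_cmd)
print(tokens)
```

On a system that ships a standalone cd executable (macOS does, as far as I know), that program would run, ignore the extra arguments, and exit, which would explain why curl appears to be skipped.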

Running a command string through the shell can have catastrophic consequences if any part of it comes from unsanitized input, such as the session ID typed in by the user here.
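
One way to sanitize the pieces, sketched under the assumption that the command is still built as a single string: quote every variable part with shlex.quote before joining. The helper name `build_shell_command` and the hostile session ID are hypothetical, chosen only for illustration.

```python
import shlex

def build_shell_command(destination, url, session_id):
    """Build the cd && curl string with every variable part shell-quoted."""
    cookie = "Cookie: rppValue=20; B_View=1; JSESSIONID=" + session_id
    return "cd {} && curl -L -O -C - {} -H {}".format(
        shlex.quote(destination), shlex.quote(url), shlex.quote(cookie))

url = 'https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf'
# A hostile value that would inject a command if pasted in unquoted.
hostile = "abc'; rm -rf ~; '"
command = build_shell_command("/Volumes/path/to/destination/", url, hostile)
# The string would then be run with: subprocess.call(command, shell=True)
```

After quoting, the shell sees the whole cookie header as one argument to -H, so the embedded `rm` is never executed as a command.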

As a side note, you should look at doing things directly in Python by using os and some web client e.g. requests.
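
A rough sketch of the pure-Python route follows. It uses urllib from the standard library rather than requests (a substitution on my part, so the example has no third-party dependency), names the file after the last path segment of the URL much like curl -O, and does not attempt to resume partial downloads. The function names and the fallback name "download" are hypothetical.

```python
import os
import posixpath
import shutil
import urllib.parse
import urllib.request

def filename_from_url(url):
    """Return the last path segment of the URL, or a fallback name."""
    path = urllib.parse.urlparse(url).path
    return posixpath.basename(path) or "download"

def download(url, destination, session_id):
    """Fetch url into the destination directory and return the file path."""
    request = urllib.request.Request(url, headers={
        "Cookie": "rppValue=20; B_View=1; JSESSIONID=" + session_id,
        "Accept": "*/*",
    })
    target = os.path.join(destination, filename_from_url(url))
    # urlopen follows redirects by default, similar to curl -L.
    with urllib.request.urlopen(request) as response, open(target, "wb") as pdf:
        shutil.copyfileobj(response, pdf)
    return target

# Example call (performs a real network request):
# download(url, '/Volumes/path/to/destination', sessionID)
```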


Also, if you don't want to use the -o option of curl, you can use the shell redirection operator (>) to save the STDOUT of curl to some file:

curl -s ... >/out/file

-s silences curl so that we don't get progress status on STDERR.

heemayl
  • Thanks, that solves the problem. I have to use it in a more complex task; I hope it works there too. As for the requests library, my knowledge is limited. I want resume capability, so curl comes to mind first. With requests, I may have to get the headers, then compare them to a file of the same name in a specific directory, etc. If it's not asking too much, please share your insight into the procedure with the requests library and some links to guides. Thanks again. – lalitaalaalitah Jan 20 '19 at 15:52
  • In my complex project, I had to remove the shlex part. This was causing problem even after adding shell argument. – lalitaalaalitah Jan 20 '19 at 16:03
  • @lalitaalaalitah Please see http://docs.python-requests.org/en/master/user/quickstart/ . `requests` is pretty simple as far as usability is concerned. Please go through the quickstart and let me know if you have any problem understanding something. Best of luck! – heemayl Jan 20 '19 at 16:06
  • I tried to use request, but redirects are exceeding limit even after exceeding. Tried to set a session too. – lalitaalaalitah Jan 20 '19 at 23:48
  • @lalitaalaalitah Okay. You should add the details to your question, and select the other answer as accepted, as it's more complete. I'm gonna delete my answer as the problem seems to be different now. – heemayl Jan 21 '19 at 04:44
  • No need to delete the answer. Your solution worked for my curl command. I was trying to use requests library as you suggested, the problem appears there. So, for present the cd && curl was solved by your answer. For requests library to do the same task, I'll search and create other question if needed. Thanks. – lalitaalaalitah Jan 21 '19 at 05:18