0

I am trying to query the Adobe PDF Services API to generate (export) DOCX files from PDF documents.

I have just written some Python code to generate a Bearer token so that Adobe PDF Services can identify me (see the question here: https://stackoverflow.com/questions/68351955/tunning-a-post-request-to-reach-adobe-pdf-services-using-python-and-a-rest-api). Then I wrote the following piece of code, trying to follow the instructions on this page about the EXPORT option of Adobe PDF Services (here: https://documentcloud.adobe.com/document-services/index.html#post-exportPDF).

Here is the piece of code :

import requests
import json
from requests.structures import CaseInsensitiveDict
# N.B.: the part of the code that generates the token and authenticates with the server is not shown here
# This part is a POST request to upload my PDF file via form parameters
URL = "https://cpf-ue1.adobe.io/ops/:create?respondWith=%257B%2522reltype%2522%253A%2520%2522http%253A%252F%252Fns.adobe.com%252Frel%252Fprimary%2522%257D"

headers = CaseInsensitiveDict()
headers["x-api-key"] = "client_id"
headers["Authorization"] = "Bearer MYREALLYLONGTOKENIGOT"
headers["Content-Type"] = "application/json"

myfile = {"file":open("absolute_path_to_the_pdf_file/input.pdf", "rb")}

j="""
{
  "cpf:engine": {
    "repo:assetId": "urn:aaid:cpf:Service-26c7fda2890b44ad9a82714682e35888"
  },
  "cpf:inputs": {
    "params": {
      "cpf:inline": {
        "targetFormat": "docx"
      }
    },
    "documentIn": {
      "dc:format": "application/pdf",
      "cpf:location": "C:/Users/a-bensghir/Downloads/P_D_F/trs_pdf_file_copy.pdf"
    }
  },
  "cpf:outputs": {
    "documentOut": {
      "dc:format": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
      "cpf:location": "C:/Users/a-bensghir/Downloads/P_D_F/output.docx"
    }
  }
}"""

resp = requests.post(url=URL, headers=headers, json=json.dumps(j), files=myfile)
   

print(resp.text)
print(resp.status_code)

The status code is 400. I am properly authenticated by the server, but I get the following as a result of print(resp.text):

{"requestId":"the_request_id","type":"Bad Request","title":"Not a multipart request. Aborting.","status":400,"report":"{\"error_code\":\"INVALID_MULTIPART_REQUEST\"}"}

I think that I have problems understanding the "form parameters" from the Adobe Guide concerning POST method for the EXPORT job of the API (https://documentcloud.adobe.com/document-services/index.html).

Would you have any ideas for improvement? Thank you!

Abdel
  • Two form parameters should be declared as per the Adobe guide (one is contentAnalyzerRequests, the JSON called "j" in my code, which I didn't pass as a form parameter because I don't know how). Could you please help? Thanks! – Abdel Jul 13 '21 at 15:45
  • So I am _very_ new to Python, but are you sure you are creating a multipart request correctly? The error seems to imply you are not. – Raymond Camden Jul 13 '21 at 19:43
  • For example, maybe this: https://stackoverflow.com/a/15785071/52160 – Raymond Camden Jul 13 '21 at 19:44

2 Answers

3

Make your variable j a Python dict first, then create a JSON string from it. What's also not super clear from Adobe's documentation is that the value for documentIn.cpf:location needs to be the same as the key used for your file. I've corrected this to InputFile0 in your script. I'm also guessing you want to save your file, so I've added that too.
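As a small aside on the first point, here is a minimal sketch of what goes wrong when a value that is already a JSON string is passed through json.dumps again, as the question's code does. The JSON is shortened for illustration, and the names j_text and j_dict are only used in this sketch.

```python
import json

# The question stores the request description as a JSON *string* ...
j_text = '{"cpf:inputs": {"params": {"cpf:inline": {"targetFormat": "docx"}}}}'
# ... and serializes it a second time, which quotes and escapes the whole text.
double_encoded = json.dumps(j_text)

# Building a dict first and serializing once gives the intended JSON object.
j_dict = {"cpf:inputs": {"params": {"cpf:inline": {"targetFormat": "docx"}}}}
encoded_once = json.dumps(j_dict)

print(type(json.loads(double_encoded)).__name__)  # str
print(type(json.loads(encoded_once)).__name__)    # dict
```

Decoding the double-encoded value gives back a plain string rather than an object, which is why the dict is built first in the script that follows.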

import requests
import json
import time

URL = "https://cpf-ue1.adobe.io/ops/:create?respondWith=%257B%2522reltype%2522%253A%2520%2522http%253A%252F%252Fns.adobe.com%252Frel%252Fprimary%2522%257D"

# token and client_id are assumed to be defined earlier (token generation is not shown here)
headers = {
    'Authorization': f'Bearer {token}',
    'Accept': 'application/json, text/plain, */*',
    'x-api-key': client_id,
    'Prefer': "respond-async,wait=0",
}

myfile = {"InputFile0":open("absolute_path_to_the_pdf_file/input.pdf", "rb")}

j={
  "cpf:engine": {
    "repo:assetId": "urn:aaid:cpf:Service-26c7fda2890b44ad9a82714682e35888"
  },
  "cpf:inputs": {
    "params": {
      "cpf:inline": {
        "targetFormat": "docx"
      }
    },
    "documentIn": {
      "dc:format": "application/pdf",
      "cpf:location": "InputFile0"
    }
  },
  "cpf:outputs": {
    "documentOut": {
      "dc:format": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
      "cpf:location": "C:/Users/a-bensghir/Downloads/P_D_F/output.docx"
    }
  }
}

body = {"contentAnalyzerRequests": json.dumps(j)}

resp = requests.post(url=URL, headers=headers, data=body, files=myfile)
   

print(resp.text)
print(resp.status_code)

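# Poll the location returned by the first request until the document is ready
poll = True
while poll:
    new_request = requests.get(resp.headers['location'], headers=headers)
    if new_request.status_code == 200:
        # Write the result in binary mode and close the file right away
        with open('test.docx', 'wb') as f:
            f.write(new_request.content)
        poll = False
    else:
        time.sleep(5)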
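One way to check locally whether a request is really multipart is to prepare it with requests without sending it. The following is only a sketch under stated assumptions: the URL is a placeholder, the JSON is shortened, and the PDF bytes are a dummy in-memory stand-in. It suggests that an explicit Content-Type: application/json header (as in the question) is kept as given, which is a likely cause of the INVALID_MULTIPART_REQUEST error, whereas leaving the header out lets requests generate the multipart one.

```python
import io
import json
import requests

# Placeholder endpoint: nothing is sent, the request is only prepared locally.
URL = "https://example.invalid/ops/:create"

# Shortened stand-in for the contentAnalyzerRequests JSON used above.
j = {"cpf:inputs": {"documentIn": {"dc:format": "application/pdf",
                                    "cpf:location": "InputFile0"}}}

def prepare(headers):
    """Build the multipart request without sending it."""
    # Dummy in-memory file; the form key matches cpf:location in the JSON.
    files = {"InputFile0": ("input.pdf", io.BytesIO(b"%PDF-1.4 dummy"), "application/pdf")}
    data = {"contentAnalyzerRequests": json.dumps(j)}
    req = requests.Request("POST", URL, headers=headers, data=data, files=files)
    return req.prepare()

# No Content-Type supplied: requests sets the multipart header and boundary.
good = prepare({"x-api-key": "client_id"})
# Content-Type forced to JSON, as in the question: the header is left as given.
bad = prepare({"x-api-key": "client_id", "Content-Type": "application/json"})

print(good.headers["Content-Type"])
print(bad.headers["Content-Type"])
# Both form parts should appear in the encoded body.
print(b'name="contentAnalyzerRequests"' in good.body)
print(b'name="InputFile0"' in good.body)
```

The same check can be run against the real headers and file before calling requests.post.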
PGHE
  • Thank you! I took these suggestions into account. Then I got the following message for print(resp.text): {"requestId":"aSeriesOfLetters","type":"Bad Request","title":"Not a multipart request. Aborting.","status":400,"report":"{\"error_code\":\"INVALID_MULTIPART_REQUEST\"}"} – Abdel Jul 13 '21 at 21:38
  • Make sure you're using the `data` attribute in the `requests.post` and not `json`. – PGHE Jul 13 '21 at 21:45
  • Sorry bout that, you're right there's a second request to get the doc. – PGHE Jul 13 '21 at 22:01
  • Now it's working well! Thanks! I still have a problem, though: I don't know why the docx file (it is created fine, by the way) doesn't open; a popup says the content is not readable. Maybe it's due to the `` 'wb' `` parsing method. – Abdel Jul 13 '21 at 22:15
  • Try with a polling mechanism. You might be getting caught out with requesting the doc before it's ready. See edit. – PGHE Jul 13 '21 at 22:59
  • Make another question with the changed code. – PGHE Jul 15 '21 at 21:05
1

I don't know why the docx file (it is created fine, by the way) doesn't open; a popup says the content is not readable. Maybe it's due to the 'wb' parsing method.

I had the same issue. Casting the response content to bytes solved it for me.

poll = True
while poll:
    new_request = requests.get(resp.headers['location'], headers=headers)
    if new_request.status_code == 200:
        with open('test.docx', 'wb') as f:
            f.write(bytes(new_request.content))
        poll = False
    else:
        time.sleep(5)
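The loop above keeps polling for as long as the status is not 200, so it never ends if the job fails. A possible variant, sketched here with a hypothetical helper named poll_until_ready (not part of requests or of Adobe's API), gives up after a fixed number of attempts. The fetch callable and the attempt limit are assumptions made for illustration.

```python
import time

def poll_until_ready(fetch, max_attempts=10, delay=5.0, sleep=time.sleep):
    """Call fetch() until it returns an object whose status_code is 200.

    fetch is any zero-argument callable returning a response-like object.
    Returns that response, or raises TimeoutError after max_attempts tries.
    """
    for attempt in range(1, max_attempts + 1):
        response = fetch()
        if response.status_code == 200:
            return response
        # Wait before the next try, but not after the last one.
        if attempt < max_attempts:
            sleep(delay)
    raise TimeoutError(f"result not ready after {max_attempts} attempts")

# Hypothetical usage with the names from the code above:
# result = poll_until_ready(
#     lambda: requests.get(resp.headers['location'], headers=headers))
# with open('test.docx', 'wb') as f:
#     f.write(result.content)
```

Passing the fetch step in as a callable keeps the helper independent of requests, so it can be tried out with a fake response object first.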
Max M