I am doing a CFHTTP POST to a web service that returns a multipart response with two parts, an XML document and a PDF. I only want the PDF. My cfhttp.fileContent is a java.io.ByteArrayOutputStream. When I call toString() on it, I get the following:

Part 1

Content-Type: application/xop+xml; type="text/xml"; charset=utf-8
Content-Transfer-Encoding: 8bit

Part 2

Content-Type: application/pdf
Content-Transfer-Encoding: binary

I get the response in cfhttp.fileContent, and the data looks like the following:

--MIME_Boundary
Content-ID: <aa82dfa.N51ec355b.3.15b86044531.59d6>
Content-Type: application/xop+xml; type="text/xml"; charset=utf-8
Content-Transfer-Encoding: 8bit
<?xml version="1.0" encoding="UTF-8"?>
<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/">....</soapenv:Envelope>
--MIME_Boundary
Content-Id: <2958beaa-dd72-4879-9d80-cc19876b2c2a@example.jaxws.sun.com>
Content-Type: application/pdf
Content-Transfer-Encoding: binary

%PDF-1.4
%ÈÁÄ×
<content removed>
25081
%%EOF

--MIME_Boundary--

I tried removing all the data that isn't part of the PDF, but the result is still not a valid binary file.

Any thoughts?

From the comments

When I do a cfdump on the fileContent I get the following:

Class Name: java.io.ByteArrayOutputStream 
Methods: 
    close() returns void 
    reset() returns void 
    size() returns int 
    toByteArray() returns byte[] 
    toString(java.lang.String) returns java.lang.String 
    toString() returns java.lang.String 
    toString(int) returns java.lang.String 
    write(byte[], int, int) returns void 
    write(int) returns void 
    writeTo(java.io.OutputStream) returns void

When I invoke toByteArray() I get binary data. I then save the data to a file and I see both XML and PDF parts of the file.

cma0651
  • Instead of converting the `cfhttp.filecontent` variable using `toString()` try dumping it. This will show you the structure that ColdFusion creates for you holding each piece of information. `<cfdump var="#cfhttp.filecontent#">` or in script `writeDump(cfhttp.filecontent);`. Copy that output to your question above. Once we know the field containing the actual binary data, you will need to base64 decode it back to binary before presenting it to the user. [binaryDecode](http://cfdocs.org/binarydecode). Something like `` – Miguel-F Apr 24 '17 at 19:27
  • Thanks for your response. When I do a cfdump on the fileContent I get the following, `Class Name: java.io.ByteArrayOutputStream Methods: close() returns void reset() returns void size() returns int toByteArray() returns byte[] toString(java.lang.String) returns java.lang.String toString() returns java.lang.String toString(int) returns java.lang.String write(byte[], int, int) returns void write(int) returns void writeTo(java.io.OutputStream) returns void` – cma0651 Apr 24 '17 at 22:16
  • When I invoke `toByteArray()` I get binary data. I then save the data to a file and I see both the XML and PDF parts of the file. If I take the result and call `binaryDecode` I get `ByteArray objects cannot be converted to strings.`. `BinaryEncode` works. – cma0651 Apr 24 '17 at 22:19
  • Can you give us the URL to post to, so we may try? – Jules Apr 25 '17 at 00:05
  • If it is not public for some reason, is that the complete fileContent above? (I realize the content is trimmed for brevity, but I would have expected additional headers...) – Leigh Apr 25 '17 at 02:45
  • That dump you provided does not seem to be from a `cfhttp` call. Looks more like from a `cfinvoke` call??? Can you provide the code that you are using to call this service? – Miguel-F Apr 25 '17 at 13:46
  • Jules, this is code internal to my company. Miguel-F, this is the dump from `cfhttp.filecontent`. I normally get regular text results, but I suspect that since this is a multipart response I got a Java object. Leigh, since the code has PHI information, I can't provide all the details, but I can assure you that the only content I removed was some XML in part 1 and the PDF binary in part 2. – cma0651 Apr 26 '17 at 11:19
  • @Chris - A multipart response has a very specific format. The tools typically used to parse it require specific information be present, like a header indicating the content to follow *is* multipart and the boundary marker used. I am trying to figure out if cfhttp is actually removing that information or you just discarded it as irrelevant. There are other ways to consume web services that may or may not be simpler... All depends on your code. It would *really* help to see a) your cfhttp call and b) a full dump of the cfhttp response (including headers) - redact anything confidential of course. – Leigh Apr 26 '17 at 13:15
  • Did some reading. Assuming you can identify the boundary marker value dynamically. Try this example. https://gist.github.com/anonymous/5610b421abad1733c9a359d6bff8a068 . Note, the response data above is not valid. There should be a new line in between `Content-Transfer-Encoding: 8bit` and the xml. (I am assuming it was trimmed out accidentally.) – Leigh Apr 27 '17 at 15:50
  • @Leigh, thanks, this was one part of the puzzle that led me to my solution! I am posting my answer now. – cma0651 Apr 27 '17 at 23:55
  • (Edit) @Chris - I see you took a different path. If you have time, could you try the suggested example? It should work fine, but I do not have an MTOM web service to confirm. – Leigh Apr 28 '17 at 00:09
  • @Leigh - I will do this tomorrow. Your suggestion came after I got the solution I posted. – cma0651 Apr 28 '17 at 00:12
  • @Chris - That would be great. I'm very curious about handling that type of content, ie mtom (BTW, did not see your answer yet when I posted the last comment, and did not realize you went a *totally* different direction :-) – Leigh Apr 28 '17 at 00:22
  • @Leigh - Your link to GitHub did fix it, with one small change! I will update my answer in a few minutes. – cma0651 Apr 28 '17 at 11:14

1 Answer

The workaround required two changes: setting the Accept-Encoding request header to gzip,deflate, and working with the response as binary data using Java.

<cfhttpparam type="HEADER" name="Accept-Encoding" value="gzip,deflate">
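For context, a minimal sketch of where that header sits in a complete request. The `serviceUrl` and `soapRequest` values are hypothetical placeholders (the real endpoint and SOAP body are not shown in this thread), and the `Content-Type` request header is an assumption about what a SOAP service typically expects.

```cfml
<!--- Hypothetical values: replace with the real endpoint and SOAP request body --->
<cfset serviceUrl = "https://example.com/service">
<cfset soapRequest = "<soapenv:Envelope>...</soapenv:Envelope>">

<cfhttp url="#serviceUrl#" method="post" result="result">
    <!--- The header from the answer above --->
    <cfhttpparam type="header" name="Accept-Encoding" value="gzip,deflate">
    <!--- Assumed request content type for a SOAP call --->
    <cfhttpparam type="header" name="Content-Type" value="text/xml; charset=utf-8">
    <cfhttpparam type="body" value="#soapRequest#">
</cfhttp>

<!--- Dump the whole result, not only fileContent, to see the response headers too --->
<cfdump var="#result#">
```

Dumping the full result struct also shows `responseHeader`, which is where the multipart boundary would normally be announced.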

Second, I needed to manipulate the response using binary methods:

binResponse = result.fileContent.toByteArray();

Next I used a utility from Ben Nadel, Binary.cfc, which has all the binary manipulation I needed. I used its binarySlice() method to cut the PDF out of the response by start position and length. The sliced data contains the binary in the exact format that I needed. It was not base64 or any other encoding, just raw binary.

sliced = binNadel.binarySlice( binResponse, <int position to start slice>, <int length of binary> );

This solution works, but it's rife with potential issues: for example, the order of the parts in the response could switch, the boundary name could change, etc. So it will require a lot of error handling to ensure smooth sailing.
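One way to avoid hard-coded offsets is to locate the PDF part by searching for its headers and the boundary. The following is only a sketch and does not use Binary.cfc: it assumes `result` is the cfhttp result, that the boundary is `MIME_Boundary` as in the sample response, that the service uses CRLF line endings, and the output path is a placeholder. It relies on ISO-8859-1 mapping every byte to exactly one character, so the bytes survive the round trip through a string.

```cfml
<cfscript>
    // Assumptions: "result" is the cfhttp result and fileContent is a
    // java.io.ByteArrayOutputStream, as in the dump in the question.
    boundary = "MIME_Boundary";      // ideally read from the Content-Type response header
    crlf     = chr(13) & chr(10);    // assumes CRLF line endings in the response

    // ISO-8859-1 maps each byte to one character, so string positions
    // correspond to byte offsets and decoding back is lossless.
    raw = charsetEncode(result.fileContent.toByteArray(), "iso-8859-1");

    pdfBytes = "";
    partPos  = findNoCase("Content-Type: application/pdf", raw);
    if (partPos > 0) {
        // The part body starts after the blank line that ends the part headers
        headerEnd = find(crlf & crlf, raw, partPos);
        if (headerEnd > 0) {
            bodyStart = headerEnd + 4;
            // The body ends where the next boundary delimiter begins
            bodyEnd = find(crlf & "--" & boundary, raw, bodyStart);
            if (bodyEnd > bodyStart) {
                pdfBytes = charsetDecode(mid(raw, bodyStart, bodyEnd - bodyStart), "iso-8859-1");
            }
        }
    }

    if (isBinary(pdfBytes)) {
        fileWrite("c:\temp\extracted.pdf", pdfBytes);   // placeholder path
    } else {
        writeOutput("No PDF part found in the response.");
    }
</cfscript>
```

This still depends on the exact header text and line endings, so a real MIME parser remains the more robust choice.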

Update:

Next I looked into Leigh's example to see if I could simplify my code. They suggested using Java's MimeMultipart class which supports parsing an MTOM multipart response. Here is the final working code:

<cfscript>
    // Modify path as needed
    saveToDirec = "c:\temp\";

    // Hard-coded "boundary" value for DEMO purposes. It MUST match the actual value used in the cfhttp response.
    // Better to read it from the response header, e.g. m_strSoapResponse.responseHeader["Content-Type"], so the code won't break if the service changes it.
    contentType = "multipart/related; boundary=MIME_Boundary;";  

    // Load and parse ByteArrayOutputStream returned by CFHTTP
    dataSource = createObject("java", "javax.mail.util.ByteArrayDataSource").init(m_strSoapResponse.fileContent.toByteArray(), javaCast( "string", contentType));
    mimeParts = createObject("java", "javax.mail.internet.MimeMultipart").init(dataSource);

    for (i = 0; i < mimeParts.getCount(); i++) {
        writeOutput("<br>Processing part["& i &"]");
        bp = mimeParts.getBodyPart( javacast("int", i));

        // If this part is a PDF, save it to a file.
        if (!isNull(bp) && bp.isMimeType("application/pdf")) {
            outputFile = createObject("java", "java.io.File").init(saveToDirec &"demo_savedfile_"& i &".pdf");
            bp.saveFile(outputFile);
            writeOutput("<br>Saved: "& outputFile.getAbsolutePath());
        }
    }
</cfscript>
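As the comment in the code notes, the content type is better taken from the response than hard coded. A minimal sketch, assuming `m_strSoapResponse` is the cfhttp result used above and that the service actually sends a `Content-Type` response header containing the boundary (not confirmed in this thread):

```cfml
<cfscript>
    // Assumption: m_strSoapResponse is the cfhttp result struct used above.
    headers = m_strSoapResponse.responseHeader;

    // Bracket notation is needed because the key contains a hyphen.
    if (structKeyExists(headers, "Content-Type") && isSimpleValue(headers["Content-Type"])) {
        contentType = headers["Content-Type"];
    } else {
        // Fall back to the demo value if the header is missing
        contentType = "multipart/related; boundary=MIME_Boundary;";
    }
</cfscript>
```

The resulting `contentType` can then be passed to `ByteArrayDataSource` in place of the hard-coded string.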

Thanks all for your input!

cma0651
  • Interesting. I'm curious about MTOM responses, so a couple of questions :) 1. What type of object do you get back after adding gzip,deflate - a binary array? If yes, does it have the same multipart content as before? 2. How are you determining the start/end position? If it could be dynamic (like with MimeMultipart) this would be a nice solution. 3. Regarding the boundary marker, did you dump the cfhttp result and check the other headers? It might be included with the "content-type". – Leigh Apr 28 '17 at 00:16
  • Glad it worked out. FYI, S.O. threads are archived for the benefit of everyone, so I made some edits to include more information for future readers :) (Also added a link to where I read about the idea, to give the original author credit). Feel free to make changes. – Leigh Apr 28 '17 at 13:27