Additional "2000" String ([32 30 30 30] bytes) at the beginning of a file

Question

I have a really strange issue and I cannot find the solution.

I have a simple test servlet that streams a small PDF file in the response:

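import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import javax.servlet.Servlet;
import javax.servlet.ServletException;
import javax.servlet.ServletOutputStream;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class TestPdf extends HttpServlet implements Servlet {

    private static final long serialVersionUID = 1L;

    public void doGet(HttpServletRequest request, HttpServletResponse response) throws IOException, ServletException {

        File file = new File(getServletContext().getRealPath("/lorem.pdf"));

        response.setContentType("application/pdf");

        ServletOutputStream out = response.getOutputStream();
        InputStream in = new FileInputStream(file);

        // Copy the file to the response in blocks of up to 10000 bytes.
        byte[] bytes = new byte[10000];
        int count = -1;
        while ((count = in.read(bytes)) != -1) {
            out.write(bytes, 0, count);
        }

        in.close();

        out.flush();
        out.close();
    }
}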

If I call the servlet URL with a browser, curl, or wget, everything is fine, but when I call it with a simple TCL script like this:

#!/usr/bin/tclsh8.5

package require http;

set testUrl "http://localhost:8080/test/pdf"
set httpResponse [http::geturl "$testUrl" -channel stdout]

the file has a "2000" string at the beginning that corrupts the PDF.

The issue does not seem related to the Tomcat or JDK version, since I am able to reproduce it on my development environment (Ubuntu 16.04) with both JDK 1.5.0_22 and Tomcat 5.5.36, and JDK 1.8.0_74 and Tomcat 8.5.15.

Never used TCL, but isn't that just the HTTP status code 200 plus an extra 0 and then your file? – jhamon, Aug 10 '18 at 07:49
Thank you for the comment. I just tried changing the HTTP response code to 201, but the "2000" is still the same. – Luca Dominici, Aug 10 '18 at 07:57
Have you tried accessing a "known good" URL (for example https://stackoverflow.com/robots.txt) with your TCL script? That way you can figure out if the issue comes from the Java code or the TCL. – Joachim Sauer, Aug 10 '18 at 07:59
OK, I tried with a sample PDF (http://unec.edu.az/application/uploads/2014/12/pdf-sample.pdf) and the downloaded file is correct, so the issue is Java/Tomcat related... – Luca Dominici, Aug 10 '18 at 08:03
Is it possible that the servlet is using chunked transfer encoding and this isn't supported by your script? In chunked transfer encoding, the data is sent in chunks that are prefixed by the length of the chunk (encoded as hex in ASCII, so 2000 means the chunk is 8192 bytes) followed by a CRLF sequence and then 8192 bytes of data, then a CRLF, followed by the next chunk, etc. – Mark Rotteveel, Aug 10 '18 at 08:08
@Paul Karam: No, I just made a simple test servlet with a PDF file, but in the original code the issue is present with every file I try to download. – Luca Dominici, Aug 10 '18 at 08:09
Thank you @Mark, you are pointing me in the right direction. I tried adding a Content-Length header to the response and the issue disappeared. Now I just have to understand what caused it in the first place, since the issue appeared after an upgrade of the JDK and Tomcat versions in a test environment, leaving both the Servlet and TCL code of our application unchanged. – Luca Dominici, Aug 10 '18 at 08:22
Good to hear. Setting an explicit content length would indeed disable chunked transfer encoding; not sure there are a lot of other options (except changing your TCL script to use/support chunked transfer encoding). A possible cause is that Tomcat changed something in the way it buffers responses, or maybe your upgrade removed/overwrote a config option on this (e.g. maybe it used to buffer more bytes to then calculate the content length, and now the buffer is smaller so it switches to chunked earlier). – Mark Rotteveel, Aug 10 '18 at 08:24
Thank you again, I should be able to change the application code and send the Content-Length header to solve the issue. If you change the comment into an answer I will gladly accept it :) – Luca Dominici, Aug 10 '18 at 08:49
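
The chunk framing described in the comments above can be made concrete with a small decoder. The following is only a minimal sketch in plain Java, not what Tomcat or the Tcl http package actually do: the class and method names (ChunkedBodyDemo, decode, indexOfCrlf) are made up for illustration, the whole body is assumed to be in memory already, and trailer headers after the last chunk are ignored. It shows why a client that copies the raw bytes sees "2000" first: that is the hexadecimal size line of an 8192-byte chunk.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of any servlet or Tcl API.
class ChunkedBodyDemo {

    /** Decodes a complete chunked body (trailers ignored) into the payload bytes. */
    static byte[] decode(byte[] wire) {
        ByteArrayOutputStream payload = new ByteArrayOutputStream();
        int pos = 0;
        while (true) {
            // Each chunk starts with "<size in hex>[;extensions]" terminated by CRLF.
            int eol = indexOfCrlf(wire, pos);
            if (eol < 0) {
                throw new IllegalArgumentException("missing CRLF after chunk size");
            }
            String sizeLine = new String(wire, pos, eol - pos, StandardCharsets.US_ASCII);
            int semi = sizeLine.indexOf(';');
            if (semi >= 0) {
                sizeLine = sizeLine.substring(0, semi); // drop optional chunk extensions
            }
            int size = Integer.parseInt(sizeLine.trim(), 16);
            pos = eol + 2;
            if (size == 0) {
                break; // a zero-sized chunk marks the end of the body
            }
            if (size < 0 || pos + size + 2 > wire.length) {
                throw new IllegalArgumentException("truncated or invalid chunk");
            }
            payload.write(wire, pos, size); // the chunk data itself
            pos += size;
            if (wire[pos] != '\r' || wire[pos + 1] != '\n') {
                throw new IllegalArgumentException("missing CRLF after chunk data");
            }
            pos += 2;
        }
        return payload.toByteArray();
    }

    /** Returns the index of the next CRLF at or after 'from', or -1 if there is none. */
    static int indexOfCrlf(byte[] data, int from) {
        for (int i = from; i + 1 < data.length; i++) {
            if (data[i] == '\r' && data[i + 1] == '\n') {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The "2000" seen at the start of the corrupted file, read as hex:
        System.out.println(Integer.parseInt("2000", 16)); // prints 8192

        // Two small chunks ("%PDF-" and "1.4") followed by the terminating chunk.
        byte[] wire = "5\r\n%PDF-\r\n3\r\n1.4\r\n0\r\n\r\n"
                .getBytes(StandardCharsets.US_ASCII);
        System.out.println(new String(decode(wire), StandardCharsets.US_ASCII)); // prints %PDF-1.4
    }
}
```

A client that writes the undecoded bytes straight to a file keeps the size lines and CRLF separators in the output, which matches the corrupted PDF described in the question.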

mrcalvin · Answer 1 · 2018-08-10T09:21:16.337

3

What you see is the start of a chunk: the size line, which gives the number of octets contained in the chunk in hexadecimal, as pointed out by others. To handle this from the Tcl client side (rather than turning off chunked transfer encoding on the Tomcat side), you need to omit the -channel option to http::geturl:

package require http;

set testUrl "http://localhost:8080/test/pdf"
set httpResponse [http::geturl "$testUrl"]
fconfigure stdout -translation binary; # turn off auto-encoding on the way out
puts -nonewline stdout [http::data $httpResponse]

This should properly reassemble the chunked content into one piece. The background is that handling of chunked content did not work with the -channel option when I last checked.

edited Aug 10 '18 at 09:21

answered Aug 10 '18 at 08:55


The difference from using `-channel` is that your response data will additionally end up in memory, as a Tcl value. You can also omit `stdout` from the `puts` call (it is the default); it is only there to make the connection to your script more visible. – mrcalvin Aug 10 '18 at 09:00
Now that I refreshed my memory: Starting from Tcl 8.6, http 2.8.0, `-channel` handles chunked transfer-encoding. You should obtain a fresh Tcl installation. – mrcalvin Aug 10 '18 at 09:07
This came with 8.6.0, released in 2009. So, you should REALLY update! – mrcalvin Aug 10 '18 at 09:33

score 0 · Answer 2 · answered Aug 10 '18 at 08:03

I have never used TCL, but this is how you can write a general file download servlet:

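import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

import javax.servlet.ServletException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class DownloadServlet extends HttpServlet {
    private static final int BUFFER_SIZE = 10000;

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws ServletException, IOException {

        String filename = "test.pdf";
        String pathToFile = "..../" + filename;

        resp.setContentType("application/pdf");
        resp.setHeader("Content-disposition", "attachment; filename=" + filename);

        // try-with-resources closes both streams, even if copying fails.
        try (InputStream in = req.getServletContext().getResourceAsStream(pathToFile);
             OutputStream out = resp.getOutputStream()) {

            byte[] buffer = new byte[BUFFER_SIZE];
            int numBytesRead;

            // Copy until read() signals the end of the stream.
            while ((numBytesRead = in.read(buffer)) > 0) {
                out.write(buffer, 0, numBytesRead);
            }
        }
    }
}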

Hope that this piece of code helps you.
