I need to extract a PDF file stored in a NTEXT column (Collation latin1_general_ci_as) of a SQL Server database table (SQL Server 2014).
The problem is, no matter how I read this column, I always get a corrupted PDF file. My guess is that I'm not reading it with the correct charset, since the pdf content seems ok when reading it in SQL Server Studio or DBVisualizer.
I'm using the JTDS driver and the standard JDBC API to access the database. To read the column, I've tried the following:
PreparedStatement stmt = conn.prepareStatement("select my_ntext_column from my_table where id = 1");
ResultSet rs = stmt.executeQuery();
if (rs.next()) {
byte[] data = IOUtils.toByArray(rs.getAsciiStream());
FileUtils.writeByteArrayToFile(new File("C:\\test.pdf"), data);
}
With this code, almost the entire file content is replaced by question marks.
PreparedStatement stmt = conn.prepareStatement("select my_ntext_column from my_table where id = 1");
ResultSet rs = stmt.executeQuery();
if (rs.next()) {
byte[] data = IOUtils.toByArray(new InputStreamReader(rs.getBinaryStream(), "WINDOWS-CP1252"));
FileUtils.writeByteArrayToFile(new File("C:\\test.pdf"), data);
}
With the above code, a few chars are replaced by question marks, although, adobe reader and other pdf visualizers can't open it. I've also tried other charsets (UTF8, UTF16, ISO-8859-1), but no luck.
PreparedStatement stmt = conn.prepareStatement("select convert(varbinary(max),convert(nvarchar(max),my_ntext_column)) from my_table where id = 1");
ResultSet rs = stmt.executeQuery();
if (rs.next()) {
byte[] data = IOUtils.toByArray(rs.getBinaryStream());
FileUtils.writeByteArrayToFile(new File("C:\\test.pdf"), data);
}
Reading the content as a binary also doesn't help. The result file can't be read by any pdf visualizer.
Edit:
These are the first 8 bytes of the NTEXT column:
%PDF-1.3