
I receive a byte[] as a request parameter in a servlet. I read it as a String and then convert it back into a byte[]:

String encodingScheme = "UTF-8";
request.setCharacterEncoding(encodingScheme);
String requestStr = request.getParameter("inputstream");
byte[] rawRequestMsg = requestStr.getBytes(encodingScheme);

I then try to write this byte[] to a .docx file, since it is the byte[] representation of a .docx file. The code that writes it to the file looks like this:

String uploadedFileLocation = fileLocation;
FileOutputStream fileOuputStream = new FileOutputStream("path till .docx file");
fileOuputStream.write(byteArray);
fileOuputStream.close();

The problem is that the .docx file being created is corrupt and cannot be opened. When I change the extension to .doc I can open it, but instead of the text content I see only the byte[] sequence, like this:

80, 75, 3, 4, 20, 0, 6, 0, 8, 0, 0, 0, 33, 0, -84, -122, 80, 87, -114, 1, 0, 0, -64, 5, 0, 0, 19, 0, 8, 2, 91, 67, 111, 110, 116, 101, 110, 116, 95, 84, 121, 112, 101, 115, 93, 46, 120, 109, 108, 32, -94, 4, 2, 40, -96, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 

I don't know how to write it correctly. Any help is appreciated. Thanks, Samir

The code below, which is from a REST web service, used to work:

@POST
@Path("/binaryfileupload/{filename}")
@Consumes(MediaType.APPLICATION_OCTET_STREAM)
public Response upload(byte[] input, @PathParam("filename") String filename) {
  FileOutputStream fileOuputStream = new FileOutputStream(uploadedFileLocation);
  fileOuputStream.write(input);
  fileOuputStream.close();
}

The only change I made is that I now send this input byte[] to a servlet and want to write the file there, instead of writing it in my web service (which was working correctly).

samir
  • May be the file is in DOC format and not DOCX format. They are very different. – Peter Lawrey Mar 09 '16 at 08:20
  • Check this link: http://stackoverflow.com/questions/25890776/java-bytearray-to-docx. Maybe the solution there can convert DOC with byte[] to DOCX proper. – Milind Gokhale Mar 09 '16 at 08:22
  • Looking at the first few bytes, this doesn't appear to be DOCX or DOC format http://www.garykessler.net/library/file_sigs.html – Peter Lawrey Mar 09 '16 at 08:22
  • It seems you are interpreting the input stream as UTF-8... but is that actually the encoding format of your input? For example, is it possible that you are receiving a GZIP-encoded byte stream as the real input, instead? – Michael Aaron Safyan Mar 09 '16 at 08:27
  • I have edited the post with some more information which rules out the possibility of incorrect doc format UTF-8 GZIP-encoded etc. – samir Mar 09 '16 at 09:09
  • The request already has the correct character encoding. I would be interested in seeing how this `inputstream` parameter is posted from the client. – user207421 Mar 09 '16 at 09:17
  • @EJP I am using the code like OutputStreamWriter writer = new OutputStreamWriter(connection.getOutputStream()); writer.write("inputstream="+Arrays.toString(input)); writer.close(); here input is byte[] – samir Mar 09 '16 at 09:22
  • So you are sending a comma-separated string of ASCII numbers, and you are writing that directly to the file without converting it back to a byte array. Just write `input` directly. There seems to be an epidemic of unnecessary `toString()` calls today. Try to taper off. The code in your comment contradicts the code in your edit. – user207421 Mar 09 '16 at 09:30
  • @EJP the code in the comment is the client code that posts the data, and the code in the edit is the previous version that was working; I shared it to show that writing the byte[] to a file worked. I also tried to write input directly, without using Arrays.toString(input), but then it sends a String like [@B237.. which I believe is the object reference. And since in the servlet I can only get request values as a String, I had to send it as a String from the client, get that String in the servlet, convert it back to byte[] and write that byte[] to the file. That's what I have done. – samir Mar 09 '16 at 09:55
  • You need to use an output stream to write byte arrays directly. If you use a `Writer` you are still going to get `toString()` behavior. You should be using POST, not PUT, so you can avoid the parameter format problem. – user207421 Mar 09 '16 at 11:06

2 Answers


You are not writing a .doc file. You're just writing a simple text file and naming it as .doc or .docx.

For it to work as a word document file, you need to use a library such as Apache POI to do it for you.

For more info about Apache POI, you can see here: https://poi.apache.org/

You can also refer to this question: How can I create a simple docx file with Apache POI?

Sachin Gupta
  • I think the problem then lies in the encoding format. I think UTF-8 is not a correct encoding format for .doc and .docx files. Please see this thread for more details: http://stackoverflow.com/questions/28172022/character-encoding-of-microsoft-word-doc-and-docx-files – Sachin Gupta Mar 09 '16 at 08:38
  • I don't think it's related to encoding, because when I write the same byte[] sequence in my web service method, it is written without any issues. But if I post that byte[] sequence to the servlet and write it to a file there using the same code, it does not work as expected. – samir Mar 09 '16 at 09:18
  • But in the web service method you are not setting any character encoding, whereas in the servlet you are explicitly setting the encoding to UTF-8. The main place encoding matters is in your String to byte conversion. – Sachin Gupta Mar 09 '16 at 09:23
  • There is zero evidence here as to what the input is. – user207421 Mar 09 '16 at 09:34
  • @EJP the input here is a simple docx file which I am posting to my webservice as binary form POSTMaster tool and getting that in byte[] parameter of my webservice, now if I write that byte[] to file there itself, it is working. But, I want to send that byte[] to my servlet using POST request and do the writing stuff in my servlet. Writing byte[] to file code is exactly same in servlet that I used in my webservice. Also the byte[] data is also same. Please let me know in case any more info is needed. – samir Mar 09 '16 at 09:50
  • I think you can use request.getInputStream(); to get byte [] data in servlet. No need to do all string process – Sachin Gupta Mar 09 '16 at 10:03
  • @Sachin you are correct, but I am doing all of this in a CQ5 servlet, and there we somehow can't use getInputStream() or even getReader(). That's why I had to go with such a complicated approach. – samir Mar 09 '16 at 10:43

I finally fixed it. I was making a small mistake. In this code:

String requestStr = request.getParameter("inputstream");
byte[] rawRequestMsg = requestStr.getBytes(encodingScheme);

I was converting the String to bytes even though it already contained the byte values, written out as text. That's why the value of requestStr was different from rawRequestMsg. In the end I used the code below, which splits the string into an array and builds the byte[] by parsing each number individually:

String requestStr = request.getParameter("inputstream");
// Drop the leading '[' and trailing ']' added by Arrays.toString()
requestStr = requestStr.substring(1, requestStr.length() - 1);
String[] dataArray = requestStr.split(",");
byte[] rawRequestMsg = new byte[dataArray.length];
int count = 0;
for (String str : dataArray) {
  // Each element looks like " -84", so trim before parsing
  str = str.trim();
  rawRequestMsg[count++] = Byte.parseByte(str);
}

The trim() call removes whitespace, because the values arrive as 75, -84, 3 ... and so on. The substring() call removes the [ from the beginning and the ] from the end. Thanks, everyone, for helping me. I hope this helps someone.
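
For anyone who wants to try this outside a servlet, here is a minimal, self-contained sketch of the same idea. The class name ByteArrayRoundTrip and the method name parse are made up for illustration, and the sample bytes are just the start of the sequence from the question. It assumes the client sent the text produced by Arrays.toString(byte[]). It compares what getBytes() gives you (the bytes of the characters in the text) with what parsing each number gives you (the original bytes):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ByteArrayRoundTrip {

    // Hypothetical helper: parses text in the form produced by
    // Arrays.toString(byte[]), for example "[80, 75, 3, 4]".
    public static byte[] parse(String text) {
        String body = text.trim();
        // Strip the surrounding '[' and ']'.
        body = body.substring(1, body.length() - 1).trim();
        if (body.isEmpty()) {
            return new byte[0];
        }
        String[] parts = body.split(",");
        byte[] result = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            // Each part may carry a leading space, e.g. " -84".
            result[i] = Byte.parseByte(parts[i].trim());
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] original = {80, 75, 3, 4, 20, 0, -84, -122};

        // What the client actually sends: digits, commas and brackets as text.
        String sent = Arrays.toString(original);

        // The mistake: this encodes the characters of the text,
        // it does not recover the original bytes.
        byte[] wrong = sent.getBytes(StandardCharsets.UTF_8);

        // The fix: parse each number back into a byte.
        byte[] right = parse(sent);

        System.out.println("sent text:        " + sent);
        System.out.println("getBytes() equal: " + Arrays.equals(original, wrong));
        System.out.println("parse() equal:    " + Arrays.equals(original, right));
    }
}
```

Writing the array returned by parse() with FileOutputStream.write(byte[]) should then produce the same file content as the original web service code did, provided the whole parameter value reaches the servlet unmodified.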

samir