
I wanted to create a generic upload routine that could insert any data, be it text, images, video files, etc., into a BLOB field and retrieve it from the same. I had assumed that converting the file into bytes and then uploading it, and using a similar approach to download it, would work. What beats me is that although I am able to successfully download files in any format using the bytes approach, uploading files by converting them into bytes works only with PDF and text-related documents. When uploading image files, or even xls files, I observed that the byte size of the BLOB field is mismatched (I previously wrote "lesser") with that of the original file. These files were found to be unreadable after downloading.

(I have come across a procedure at https://community.oracle.com/thread/1128742?tstart=0 which would help me achieve what I want. But adopting it would mean considerable rewriting of my current code. I hope there is a way out within my current code itself.)

My 'insert to blob' code as of now is:

Table structure :

desc project_storage
Name       Null Type          
---------- ---- ------------- 
PROJECT_ID      NUMBER(38)    
FILE_NAME       VARCHAR2(200) 
DOCUMENTS       BLOB          
ALIAS           VARCHAR2(50)  
FILE_TYPE       VARCHAR2(200)

Code:

@RequestMapping(value = "/insertProject.htm", method = { RequestMethod.POST }, headers = "Accept=*/*")
public String insert(@RequestParam Map<String, Object> parameters, @RequestParam CommonsMultipartFile file, Model model, HttpServletRequest req) {
    try {
        log.info("Datasource || attempt to insert into project storage -Start()");
        if (file != null) {
            String file_type = file.getContentType();
            String file_name = file.getOriginalFilename();
            String file_storage = file.getStorageDescription();
            long file_size = file.getSize();
            System.out.println("File type is : " + file_type);
            System.out.println("File name is : " + file_name);
            System.out.println("File size is : " + file_size);
            System.out.println("Storage Description : " + file_storage);

            ByteArrayOutputStream barr = new ByteArrayOutputStream();
            ObjectOutputStream objOstream = new ObjectOutputStream(barr);
            objOstream.writeObject(file);
            objOstream.flush();
            objOstream.close();
            byte[] bArray = barr.toByteArray();

            // byte[] bArray = file.getBytes();
            // InputStream inputStream = new ByteArrayInputStream(bArray);
            // inputStream.read();
            Object objArray[] = new Object[] { file_name, bArray, parameters.get("alias"), file_type };   // bArray is our target.
            int result = dbUtil.saveData("Insert into Project_Storage(Project_Id,File_Name,Documents,Alias,File_Type) values(to_number(?),?,?,?,?)", objArray);
            model.addAttribute(result);
        }
    } catch (Exception e) {
        System.out.println("Exception while inserting the documents: " + e.toString());
        e.printStackTrace();
    }

    log.info("Datasource || attempt to insert new project -end()");
    return "admin/Result";
}

Where am I going wrong? What exactly is being populated into the BLOB field when the files are xls, image, or other binary files?

EDIT: Tried uploading a dummy Excel file, Game.xls.

The console output is as follows :
    [6/29/15 20:28:41:803 IST] 0000004d SystemOut     O File type is : application/octet-stream
    [6/29/15 20:28:41:803 IST] 0000004d SystemOut     O File name is :Game.xls
    [6/29/15 20:28:41:804 IST] 0000004d SystemOut     O File size is: 6144  //this is original size of file.
    [6/29/15 20:28:41:804 IST] 0000004d SystemOut     O Storage Description :in memory
    [6/29/15 20:28:41:805 IST] 0000004d SystemOut     O  DBUtil ||  saveData || Query : Insert into Project_Storage(Project_Id,File_Name,Documents,Alias,File_Type) values(to_number(?),?,?,?,?)|| Object : [Ljava.lang.Object;@1eecbb7
    [6/29/15 20:28:42:020 IST] 0000004d servlet       I

The downloaded file is of size 6.7 KB, and its contents are garbled (screenshot of the corrupted data omitted).

This shows that this is an issue of data corruption; I had previously assumed only partial entry of data was occurring. EDIT: The file is being read through a form:

<span class="btn btn-default btn-file" id="btn_upload">
    Browse <input type="file" name="file"></span>
I_am_Batman
I'm not familiar with Java, but I had a problem with a blob once. Check if any of these answers could help you: http://stackoverflow.com/questions/31030744/how-to-get-bmp-image-from-from-binary-blob-data-with-php – Clyff Jun 29 '15 at 14:11
  • @Clyff, I don't think it is a size/length limitation.. though I am new to programming as well. However, learnt a new thing today from you :-) – I_am_Batman Jun 29 '15 at 14:29
Why save in a Blob, or for that matter even in the DB? You can use a bytea column in the DB and save here also as a byte array. The advantage is less processing. You can lazy load the bytea variable by specifying the @Basic annotation with lazy loading on it, so the document is not always fetched. – We are Borg Jun 29 '15 at 14:38

2 Answers


You are storing a Java object: file. The retrieval also seems erroneous, as you can see parts of the serialized object in the downloaded Excel file.

As the uploaded data (file) is of a temporary character (the uploaded file is meant to be processed and then removed), and I do not think that file will contain the entire content (not sure, though!), it is best to store the bytes of the content explicitly, separate from metadata like file size and type. That would allow you to stream the file content to the user immediately. The same holds for the upload: providing an InputStream for storing the blob, instead of going through a ByteArrayOutputStream, would reduce server load.
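A minimal sketch of that streaming idea, assuming plain JDBC (the question's dbUtil.saveData wrapper is not shown, so the class and method names here are hypothetical). In the controller, content would be file.getInputStream() and size would be file.getSize():

```java
import java.io.InputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper: stream the upload straight into the BLOB column
// instead of buffering it through a ByteArrayOutputStream.
public class BlobUploadSketch {

    static final String SQL =
        "Insert into Project_Storage(Project_Id,File_Name,Documents,Alias,File_Type)"
        + " values(?,?,?,?,?)";

    public static int save(Connection conn, long projectId, String fileName,
                           InputStream content, long size,
                           String alias, String fileType) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(SQL)) {
            ps.setLong(1, projectId);             // numeric bind, no to_number() needed
            ps.setString(2, fileName);
            ps.setBinaryStream(3, content, size); // raw file bytes, no serialization
            ps.setString(4, alias);
            ps.setString(5, fileType);
            return ps.executeUpdate();
        }
    }
}
```

The JDBC driver reads the stream directly while writing the BLOB, so the whole file never has to sit in a byte array on the heap.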

Having the metadata separate also allows for a nice "directory overview."

Joop Eggen
  • I am just clueless as to how PDF files of considerable size get inserted seamlessly if file could not contain the entire content. I would try implementing your suggestions to improve performance. – I_am_Batman Jun 30 '15 at 06:05
  • Sorry, I think you should show the reading code. Still have the hope that it is something trivial like doing block reads wrong or such. – Joop Eggen Jun 30 '15 at 08:12
  • Maybe I missed something that I should have posted: the file is being read through a form (the Browse input shown in the question). Anything to do with this? – I_am_Batman Jun 30 '15 at 08:30
  • Again, the file size is the same as that of the intended upload file, as we can see in the 'console output', so I had assumed that this part is functional. That is why I omitted it. – I_am_Batman Jun 30 '15 at 08:32
  • I'll remove my answer in a moment, so you'll get more readers. – Joop Eggen Jun 30 '15 at 09:29
  • Is that necessary? Let us see if someone else could come up with a better working solution. I would soon start re-coding to call using the stored procedure, though I would very much like to find out where I went wrong here. I am sure you would like to, too. Thanks for all the efforts and time. Cheers! :) Let the answer be here. It might give someone else a headstart to reconstruct using your approach. – I_am_Batman Jun 30 '15 at 09:50
  • Either storing or retrieval is wrong. When the sizes are the same, you'll need some clues. Small test data and a hex dump might help. Also the database type should be BLOB, not CLOB. The serialized data can be dumped and compared with the DB content. – Joop Eggen Jun 30 '15 at 10:48

I see the problem. Here is how it works. The raw file content is 6144 bytes, and it contains the proper header for Excel/PDF/JPEG.

Let's call this "A". "A" usually has two parts: ["Ah" + "Ab"] = ["A"]. "Ah" is the header: it tells Excel that this is an Excel document, or tells a PDF reader that this is a PDF document. It is usually a few bytes, maybe 3 or 4, maybe slightly more.

"Ab" is the body.

Size("A") = 6144 bytes; Size("Ah" + "Ab") = 6144 bytes.

OK, done with the 6144 bytes.

Next is the 6.7 KB you stored as a blob. It contains more data:

Size("Blob") = Size(serialized CommonsMultipartFile) = 6.7 KB

The blob contains [file size, mime type, ("Ah" + "Ab"), other data].
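The size gap can be reproduced without a database: Java serialization prepends a stream header and type descriptor to the payload, so what gets stored is larger than, and structurally different from, the raw file. A small sketch (a 6144-byte array stands in for the uploaded file; serializing the whole CommonsMultipartFile object, as the question does, adds its fields on top):

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;

public class SerializationOverheadDemo {

    // Same pattern as the question's upload code, applied to a byte[].
    public static byte[] serialize(byte[] raw) throws IOException {
        ByteArrayOutputStream barr = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(barr);
        oos.writeObject(raw);
        oos.close();
        return barr.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] raw = new byte[6144]; // stand-in for the 6144-byte Game.xls
        byte[] stored = serialize(raw);
        System.out.println("raw length:    " + raw.length);
        System.out.println("stored length: " + stored.length);
        // The stored bytes begin with the serialization magic 0xAC 0xED,
        // not the file's own format header ("Ah").
    }
}
```

So even a reader that tolerates trailing junk cannot open the blob, because the very first bytes are no longer the format header it looks for.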

So you have two options.

Option a:

Change the storage code, with no change to your retrieval code.

Use byte[] bArray = file.getBytes();

and store ("Ah" + "Ab") directly.

or

Option b: Change the retrieval code, with no change to your storage code. When you get the data from the database, deserialize it back into a CommonsMultipartFile, call file.getBytes() on it, and write those bytes to the servlet output stream.
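The "write the bytes to the servlet output stream" step in option b can be sketched as a plain block-copy loop. It is shown here with in-memory streams so it runs standalone; in the controller the source would be the recovered file's getInputStream() and the sink response.getOutputStream():

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopySketch {

    // Copy the content to the response in fixed-size blocks.
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "raw file bytes".getBytes();
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(new ByteArrayInputStream(data), sink);
        System.out.println("copied " + copied + " bytes");
    }
}
```

The loop never assumes the blob fits in one read, which matters once files are larger than a few KB.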

Titi Wangsa bin Damhore
  • Hi, it seems I am able to move the file obtained right now into another folder flawlessly, as follows : 'try{ File file2 = new File("C:\\trial\\"+file_name); file2.getParentFile().mkdir(); file2.createNewFile(); file.transferTo(file2); } catch(SecurityException e){ System.out.println("File cannot be created due to security exception"); e.printStackTrace(); }' – I_am_Batman Jul 02 '15 at 17:11
  • The code used to download the file is in my other question: http://stackoverflow.com/questions/31054428/extract-blob-value-and-pass-to-controller-in-spring-mvc-without-using-hibernate It is working flawlessly as of now, so I am not planning to change this code. So this leaves me with option a, which I think I am implementing here right now. Thanks for the insights :-) – I_am_Batman Jul 02 '15 at 17:13