
I am generating a byte array of about 250 MB, which is then inserted into a table on a remote Microsoft SQL Server using standard JDBC code in a Java application.

My first setup - application running on one Amazon EC2 instance, Microsoft SQL Server running on a separate Amazon EC2 instance. In this case the INSERT statement works - pass.

My second setup - application running locally on my laptop, database hosted on the Amazon EC2 instance. Here the connection is made correctly, but on the line that executes the insert,
pstmt.executeUpdate();
the program keeps running without terminating, even after, say, half an hour - fail.
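
For reference, a stripped-down version of the insert code (the table and column names here are placeholders for the real ones):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class BlobInsert {
        public static void insert(String url, String user, String password, byte[] byteArray) throws Exception {
            String sql = "INSERT INTO BlobTable (data) VALUES (?)";
            try (Connection conn = DriverManager.getConnection(url, user, password);
                 PreparedStatement pstmt = conn.prepareStatement(sql)) {
                pstmt.setBytes(1, byteArray);  // byteArray is the ~250 MB payload
                pstmt.executeUpdate();         // this is the call that never returns on the slow link
            }
        }
    }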

To check that I am actually able to access and insert data into the remote DB from my local machine, I wrote a trivial application to insert one integer into the table - pass.

I suspect that since I do not have a really fast internet connection, my data upload is failing. Is there any way to speed up the data transfer from JDBC to the remote SQL Server database?

I have thought about increasing the packet size on the SQL Server, but I am not sure this is the right approach, since the insert works when the same application runs on another Amazon EC2 instance - just not when the application runs locally. Running Amazon EC2 instances for every developer on the team is an expensive proposition.
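
If I do go down the packet-size route, my understanding is that a larger packet size also has to be requested from the client side via the driver's packetSize connection property (host, database and credentials below are placeholders), roughly like this:

    import java.sql.Connection;
    import java.sql.DriverManager;

    public class PacketSizeTest {
        public static void main(String[] args) throws Exception {
            // Request a larger network packet size from the Microsoft JDBC driver.
            // The server-side "network packet size" setting still caps what is negotiated.
            String url = "jdbc:sqlserver://ec2-host:1433;databaseName=MyDb;packetSize=32767";
            try (Connection conn = DriverManager.getConnection(url, "user", "password")) {
                System.out.println("Connected: " + !conn.isClosed());
            }
        }
    }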

Suchintya
  • If you are looking at uploading large chunks of data remotely I would definitely look at chunking the upload and making it resumable ... expecting your session to hold uninterrupted for a long time over messy internet connections is optimistic – Sam Saffron Mar 23 '12 at 05:13

2 Answers


My first thought: I would not recommend trying to push this much data directly into SQL Server over a remote connection. A quick suggestion:

Is it possible for you to upload the 250 MB array (I suspect it is a file/image/video/some backup) as a file (a normal file upload, AJAX or a plain POST) onto the remote Amazon instance's file system, and then have a daemon process running on the same machine which picks up the file from the local disk and pushes it into the SQL Server instance? A rough sketch of such a daemon follows the advantages below.

Advantages:

  1. The file upload will definitely be faster.
  2. No dependency on the JDBC connection staying alive to reliably upload the data.
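
A rough sketch of the daemon idea (directory, table and column names are placeholders; simple polling is used for brevity):

    import java.io.File;
    import java.io.FileInputStream;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    // Polls a drop directory on the EC2 instance and streams each uploaded file
    // into SQL Server over the fast local connection.
    public class BlobImportDaemon {
        public static void main(String[] args) throws Exception {
            File dropDir = new File("/var/uploads/incoming");
            String url = "jdbc:sqlserver://localhost:1433;databaseName=MyDb";
            while (true) {
                File[] files = dropDir.listFiles();
                if (files != null) {
                    for (File f : files) {
                        try (Connection conn = DriverManager.getConnection(url, "user", "password");
                             FileInputStream in = new FileInputStream(f);
                             PreparedStatement pstmt = conn.prepareStatement(
                                     "INSERT INTO BlobTable (name, data) VALUES (?, ?)")) {
                            pstmt.setString(1, f.getName());
                            // Stream the file instead of holding all 250 MB in memory at once.
                            pstmt.setBinaryStream(2, in, f.length());
                            pstmt.executeUpdate();
                        }
                        f.delete(); // remove the file once it has been imported
                    }
                }
                Thread.sleep(5000); // poll every few seconds
            }
        }
    }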

I will try to come back with a better solution more suited to your direct approach.

Nilav Baran Ghosh
  • You could take a look at alternate JDBC drivers for SQL Server. The default Microsoft-provided driver is definitely not the best choice. Take a look here: http://stackoverflow.com/questions/6943553/list-of-jdbc-drivers-for-sql-server-2008-comparison – Nilav Baran Ghosh Mar 21 '12 at 06:40
  • That's a good suggestion - will try it out. Although with multiple developers accessing the database, it might add overhead. – Suchintya Mar 22 '12 at 06:31

Your problem is network latency and not the database itself.

Think about how you can optimize the transfer of the file to the EC2 instance. Sending a 250 MB file synchronously in one shot is never a good idea and will always cause problems. Uploading the file and inserting it into the database are two different things - do not mix them.

Things you should do:

  1. Compress the file as much as you can and then send it, AND
  2. Use FTP or HTTP (similar performance either way) to upload the file to the server. Then do whatever you want with the file on the server side (like inserting it into the database).

OR

  1. The other super-awesome way of transmitting a large file over the network is: split the file into chunks (after compressing it), send them asynchronously (e.g. Twisted in Python or NIO in Java) and finally merge the chunks on the server side (use a checksum to validate the integrity of the file). See the sketch just below.
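
A minimal sketch of the client-side preparation, assuming gzip compression, SHA-256 as the checksum and a caller-chosen chunk size; the actual asynchronous upload is left out since it depends on your transport:

    import java.io.ByteArrayOutputStream;
    import java.security.MessageDigest;
    import java.util.ArrayList;
    import java.util.List;
    import java.util.zip.GZIPOutputStream;

    // Compresses the payload, prints a checksum the server can verify after
    // reassembly, and splits the compressed bytes into independently sendable chunks.
    public class ChunkedUploadPrep {
        public static List<byte[]> prepare(byte[] payload, int chunkSize) throws Exception {
            // 1. Compress before sending over the slow link.
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            try (GZIPOutputStream gzip = new GZIPOutputStream(bos)) {
                gzip.write(payload);
            }
            byte[] compressed = bos.toByteArray();

            // 2. Checksum of the compressed data, used to validate integrity after the merge.
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(compressed);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            System.out.println("SHA-256 of compressed payload: " + hex);

            // 3. Split into fixed-size chunks that can be sent (and retried) independently.
            List<byte[]> chunks = new ArrayList<byte[]>();
            for (int off = 0; off < compressed.length; off += chunkSize) {
                int len = Math.min(chunkSize, compressed.length - off);
                byte[] chunk = new byte[len];
                System.arraycopy(compressed, off, chunk, 0, len);
                chunks.add(chunk);
            }
            return chunks;
        }
    }
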
zengr