
Is there substantial overhead in using HTTP instead of plain sockets (Java on Android) to send a large (50-200 MB) file, stored on the SD card, from an Android device to a Linux server over a Wi-Fi network?

In my current prototype I'm using CherryPy-3.2.0 to implement my HTTP server. I'm running Android 2.3.3 on a Nexus one as my client.

Currently it takes ~100 seconds** on a slower (18 Mbps*) Wi-Fi network and ~50 seconds on a faster (54 Mbps*) one to upload a 50 MB binary file.

NOTE:
*I'm using WifiInfo.getLinkSpeed() to get the network link speed

** This is the time measured immediately before and after the call to HttpClient.execute(postRequest)
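As a rough sanity check, the effective throughput can be compared with the link speed reported by getLinkSpeed(). The helper below is a hypothetical sketch, not part of the original code; it assumes "50 MB" means 50,000,000 bytes and that the measured time is essentially all transfer time.

```java
public class Throughput {
    /** Effective throughput in megabits per second (1 Mbit = 10^6 bits). */
    public static double mbitPerSecond(long bytes, long millis) {
        return bytes * 8.0 / (millis / 1000.0) / 1_000_000.0;
    }

    public static void main(String[] args) {
        long fileBytes = 50_000_000L; // "50 MB", assuming decimal megabytes
        // Timings from the question: ~100 s on the 18 Mbps link, ~50 s on the 54 Mbps link.
        System.out.printf("18 Mbps link: %.1f Mbit/s effective%n", mbitPerSecond(fileBytes, 100_000L));
        System.out.printf("54 Mbps link: %.1f Mbit/s effective%n", mbitPerSecond(fileBytes, 50_000L));
    }
}
```

That works out to about 4 and 8 Mbit/s, roughly 22% and 15% of the reported link speeds. getLinkSpeed() reports the negotiated signaling rate, and usable TCP throughput over Wi-Fi is typically well below it, so part of this gap is expected regardless of HTTP.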

I'd also appreciate ideas about other expensive operations, apart from the network itself, that may account for a substantial part of the total time, and about how to reduce that time.

Thanks.

EDIT - HTTP post code on Android

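// Imports needed by this method. MultipartEntity and FileBody come from the
// Apache httpmime jar, which is not bundled with Android and must be added separately.
import android.net.TrafficStats;
import android.util.Log;
import java.io.File;
import java.net.URI;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;
import org.apache.http.client.HttpClient;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.client.params.ClientPNames;
import org.apache.http.client.params.CookiePolicy;
import org.apache.http.entity.mime.MultipartEntity;
import org.apache.http.entity.mime.content.FileBody;
import org.apache.http.impl.client.DefaultHttpClient;
import org.apache.http.params.BasicHttpParams;
import org.apache.http.params.HttpConnectionParams;
import org.apache.http.params.HttpParams;
import org.apache.http.util.EntityUtils;

private void doHttpPost(String fileName) throws Exception {

    HttpParams httpParameters = new BasicHttpParams();

    // Timeout in milliseconds until a connection is established
    // (9,000,000 ms = 2.5 hours, i.e. effectively no timeout).
    int timeoutConnection = 9000000;
    HttpConnectionParams.setConnectionTimeout(httpParameters, timeoutConnection);
    // Default socket timeout (SO_TIMEOUT) in milliseconds,
    // i.e. the maximum time to wait for data.
    int timeoutSocket = 9000000;
    HttpConnectionParams.setSoTimeout(httpParameters, timeoutSocket);

    HttpClient client = new DefaultHttpClient(httpParameters);
    try {
        client.getParams().setParameter(ClientPNames.COOKIE_POLICY, CookiePolicy.RFC_2109);

        HttpPost postRequest = new HttpPost(new URI("http://192.168.1.107:9999/upload/"));

        // multipart/form-data: the file bytes are sent as-is, without escaping.
        MultipartEntity multiPartEntity = new MultipartEntity();
        multiPartEntity.addPart("myFile", new FileBody(new File(fileName)));
        postRequest.setEntity(multiPartEntity);

        long before = TrafficStats.getTotalTxBytes();
        long start = System.currentTimeMillis();
        // Typically returns once the request body has been sent
        // and the response headers have arrived.
        HttpResponse response = client.execute(postRequest);
        long end = System.currentTimeMillis();
        long after = TrafficStats.getTotalTxBytes();

        Log.d(LOG_TAG, "HTTP Post Execution took " + (end - start) + " ms.");

        if (before != TrafficStats.UNSUPPORTED && after != TrafficStats.UNSUPPORTED)
            Log.d(LOG_TAG, (after - before) + " bytes transmitted to the server");
        else
            Log.d(LOG_TAG, "This device does not support network traffic stats");

        HttpEntity responseEntity = response.getEntity();
        if (responseEntity != null) {
            // Read the response body as text; EntityUtils.toString() also consumes the entity.
            Log.d(LOG_TAG, "HTTP Post Response " + EntityUtils.toString(responseEntity));
        }
    } finally {
        // Release the connection even if the upload fails.
        client.getConnectionManager().shutdown();
    }
}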

EDIT 2: Based on the results reported by this tool, it looks like the SD card read speed is not an issue, so the bottleneck may be either the HttpClient library or something else.
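The tool used above isn't identified. As a hypothetical alternative, a plain-Java timing loop like the one below reads a file in 64 KB chunks (an arbitrary choice) and reports MB/s. It is a sketch, not a benchmark harness: run it on a file that hasn't just been read, since the OS page cache can make a repeated read look much faster than the card itself.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadSpeed {
    /** Reads the stream to the end in bufferSize chunks and returns the number of bytes read. */
    public static long drain(InputStream in, int bufferSize) throws IOException {
        byte[] buf = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        File file = new File(args[0]); // path to the test file, e.g. on the SD card
        long start = System.nanoTime();
        InputStream in = new FileInputStream(file);
        long bytes;
        try {
            bytes = drain(in, 64 * 1024);
        } finally {
            in.close(); // try/finally rather than try-with-resources, for older Android
        }
        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d bytes in %.3f s (%.1f MB/s)%n", bytes, seconds, bytes / seconds / 1e6);
    }
}
```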

Soumya Simanta
  • How are you sending the file over HTTP? If you are encoding it to BASE64, that will take a good chunk of time as well. – jakebasile Apr 05 '11 at 04:22
  • @Jake I added the code for sending the HTTP request. – Soumya Simanta Apr 05 '11 at 04:40
  • Looks like you _are_ using multipart encoding, that's good. Care to share what type of data you are sending (text, image, video, etc.)? – skabbes Apr 05 '11 at 06:53
  • @skabbes - It's part of a virtual machine (.vdi) binary and it is already compressed and encrypted. – Soumya Simanta Apr 05 '11 at 14:32
  • Maybe you could upload the file _from_ the server to itself. This would give you the bottleneck of your server (independent of bandwidth). – skabbes Apr 05 '11 at 16:19
  • @skabbes - great idea. I'm going to try that right now. I did some rough measurements uploading the same file using two different browsers (Safari and Chrome on OS X) on the same Wi-Fi network (and almost the same physical location as the Android phone). Both browser uploads took about 40-45 seconds, whereas the phone uploads are still taking around 90 seconds. – Soumya Simanta Apr 05 '11 at 16:31
  • @skabbes I uploaded the same file from the local machine to itself and it takes about ~8 seconds. I tried doing it over a wired network and it took around ~10 seconds. – Soumya Simanta Apr 05 '11 at 18:30
  • Well, looks like the server isn't the bottleneck (not surprising). You could try to get a better estimate of the SD card speed by downloading the file off of it (through adb perhaps). If that's fast as well, it looks like your wireless network is to blame. – skabbes Apr 05 '11 at 22:05
  • @skabbes - I tried using adb pull, which gave me the following result: 1576 KB/s (48316794 bytes in 29.938s). I also wrote a standalone Java program using the Apache HTTP client jars (the same library that Android uses) and posted the file to a server running on localhost. The entire post operation took 6204 ms. Another thing I tried was to move my phone closer to the Wi-Fi access point; doing so gave an improvement of about 30%. So I'm starting to believe the bottleneck is a combination of file reading and wireless transmission on Android. – Soumya Simanta Apr 06 '11 at 03:15
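On jakebasile's Base64 question: besides CPU time, Base64 also inflates the payload, because every 3 input bytes become 4 output characters. A small sketch using java.util.Base64 (desktop Java 8+; on Android the counterpart is android.util.Base64) shows the size rule. The multipart code in the question does not use Base64, so this only matters if some other path encodes the file.

```java
import java.util.Base64;

public class Base64Size {
    /** Length of standard (padded) Base64 output for n input bytes: 4 * ceil(n / 3). */
    public static long encodedLength(long n) {
        return 4 * ((n + 2) / 3);
    }

    public static void main(String[] args) {
        // Cross-check the formula against the JDK encoder for small sizes.
        for (int n = 0; n <= 10; n++) {
            int actual = Base64.getEncoder().encode(new byte[n]).length;
            System.out.println(n + " bytes -> " + actual + " chars (formula: " + encodedLength(n) + ")");
        }
        System.out.println("50,000,000 bytes -> " + encodedLength(50_000_000L) + " chars");
    }
}
```

For a 50 MB file that is 66,666,668 characters, about a third more data to send.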

3 Answers


The overhead of an HTTP connection comes from the headers sent along with your data, which are basically a constant size. So the more data you send, the less the headers hurt you. However, a much more important aspect to consider is encoding.

For example, if you send non-ASCII (binary) data with a MIME type of application/x-www-form-urlencoded, you risk exploding the size of the request body, because every non-ASCII byte must be escaped as a three-character %XX sequence.
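To see how large that expansion gets, the sketch below counts the characters needed to send raw bytes as application/x-www-form-urlencoded, applying the rules documented for java.net.URLEncoder byte by byte: 'a'-'z', 'A'-'Z', '0'-'9', '.', '-', '*' and '_' pass through, a space becomes '+', and every other byte becomes %XX. The random input is an assumption standing in for compressed or encrypted data.

```java
import java.util.Random;

public class UrlEncodedSize {
    /** Characters needed to send raw bytes as application/x-www-form-urlencoded. */
    public static long encodedLength(byte[] data) {
        long len = 0;
        for (byte b : data) {
            int c = b & 0xFF;
            boolean oneChar = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9')
                    || c == '.' || c == '-' || c == '*' || c == '_'
                    || c == ' '; // space is sent as '+'
            len += oneChar ? 1 : 3; // everything else becomes "%XX"
        }
        return len;
    }

    public static void main(String[] args) {
        // Compressed/encrypted data looks like uniformly random bytes.
        byte[] random = new byte[1_000_000];
        new Random(42).nextBytes(random);
        System.out.printf("expansion: %.2fx%n", (double) encodedLength(random) / random.length);
    }
}
```

For uniformly random bytes, 67 of the 256 byte values cost one character and the other 189 cost three, so the expected size is about (67 + 3 × 189) / 256 ≈ 2.5 times the original.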

From the spec:

The content type "application/x-www-form-urlencoded" is inefficient for sending large quantities of binary data or text containing non-ASCII characters. The content type "multipart/form-data" should be used for submitting forms that contain files, non-ASCII data, and binary data.

The alternative is multipart/form-data, which is efficient for binary data. So make sure your application uses this MIME type (you can probably even check this in your server logs).
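To put "basically a constant" in numbers, the sketch below builds the framing that multipart/form-data wraps around a single file part (boundary lines and part headers, following the general format in RFC 7578) and counts its bytes. The boundary string and file name are hypothetical, and real libraries may emit slightly different part headers, so treat the count as approximate.

```java
import java.nio.charset.StandardCharsets;

public class MultipartOverhead {
    /** Bytes that multipart/form-data adds around one file part (boundaries + part headers). */
    public static int overheadBytes(String boundary, String fieldName, String fileName) {
        String preamble = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + fieldName
                + "\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: application/octet-stream\r\n"
                + "\r\n"; // blank line, then the raw file bytes follow unchanged
        String epilogue = "\r\n--" + boundary + "--\r\n";
        return preamble.getBytes(StandardCharsets.US_ASCII).length
                + epilogue.getBytes(StandardCharsets.US_ASCII).length;
    }

    public static void main(String[] args) {
        // Hypothetical boundary and file name; "myFile" matches the field in the question.
        int overhead = overheadBytes("----boundary1234567890", "myFile", "part.vdi");
        long fileBytes = 50_000_000L;
        System.out.printf("%d bytes of framing = %.6f%% of a 50 MB body%n",
                overhead, 100.0 * overhead / fileBytes);
    }
}
```

With these names the framing stays well under 200 bytes, a few millionths of a 50 MB body, and it does not grow with the file size.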

Another method that can considerably reduce your upload time is compression. If you are uploading data that isn't already compressed (most image and video formats are already compressed), try adding gzip compression to your uploads. Another post shows the details of setting this up on Android.
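A minimal sketch of gzip-compressing data on its way to an output stream, using java.util.zip.GZIPOutputStream. Where `out` comes from (for example, a request body stream) and how the server learns that it must decompress the upload are left as assumptions. The second case in `main` uses random bytes as a stand-in for already compressed or encrypted data, which gzip cannot shrink.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Random;
import java.util.zip.GZIPOutputStream;

public class GzipUpload {
    /** Copies `in` to `out` through gzip and returns the number of uncompressed bytes read. */
    public static long gzipCopy(InputStream in, OutputStream out) throws IOException {
        GZIPOutputStream gz = new GZIPOutputStream(out, 64 * 1024);
        byte[] buf = new byte[64 * 1024];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            gz.write(buf, 0, n);
            total += n;
        }
        gz.finish(); // writes the gzip trailer without closing `out`
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] repetitive = new byte[1_000_000]; // all zeros: highly compressible
        byte[] random = new byte[1_000_000];     // stands in for compressed/encrypted data
        new Random(1).nextBytes(random);

        for (byte[] data : new byte[][] {repetitive, random}) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            gzipCopy(new ByteArrayInputStream(data), out);
            System.out.println(data.length + " bytes -> " + out.size() + " bytes gzipped");
        }
    }
}
```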

If your data is of a specific format (say an image), you can look into lossless compression algorithms for your type of data (png for images, FLAC for audio, etc.). Compression always comes at the price of CPU (battery), so keep that in mind.

Remember:

Don't optimize something until you know it's the bottleneck. Maybe your server's connection is slow; maybe you can't read from the Android file system fast enough to push your data to the network. Run some tests and see what works.

If it were me, I would not implement the raw TCP approach. Just my 2 cents, good luck!

skabbes
  • Thanks. My file is already compressed (and encrypted). Yes, I agree that unless I know what I need to optimize it's hard to do it :) Right now I'm trying to find the bottleneck (if any). I also agree that unless raw sockets give a substantial performance improvement, the HTTP approach makes much more sense, and I plan to stick with it unless someone can give a good reason not to. – Soumya Simanta Apr 05 '11 at 15:13
  • @Soumya Is it encrypted first, then zipped? If so, you aren't actually getting any benefit from zipping. Encrypted data can't be compressed easily because encryption essentially randomizes the data. In fact, a good test for encryption is to try to zip a data stream: if it doesn't shrink, that could mean it was encrypted. – chubbsondubs Apr 06 '11 at 02:28
  • @chubbard - it's encrypted first and then zipped. – Soumya Simanta Apr 06 '11 at 05:02
  • I bet you aren't getting any compression then. Try zipping first, then encrypting. I bet you'll see the size is much smaller if you do it that way. – chubbsondubs Apr 06 '11 at 17:24
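The point about ordering can be checked directly. The sketch below uses AES in CTR mode from javax.crypto purely as an example cipher (the question doesn't say which encryption is used) and gzip as the compressor, and compares output sizes for both orders. It demonstrates sizes only; it is not a secure storage format (there is no authentication, and key handling is omitted).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.zip.GZIPOutputStream;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.IvParameterSpec;

public class CompressEncryptOrder {
    static byte[] gzip(byte[] data) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            GZIPOutputStream gz = new GZIPOutputStream(bos);
            gz.write(data);
            gz.close();
            return bos.toByteArray();
        } catch (IOException e) {
            throw new RuntimeException(e); // not expected for an in-memory stream
        }
    }

    static byte[] aesCtr(byte[] data, SecretKey key, byte[] iv) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new IvParameterSpec(iv));
        return cipher.doFinal(data);
    }

    /** Returns {size if compressed then encrypted, size if encrypted then compressed}. */
    public static long[] sizes(byte[] plain) {
        try {
            KeyGenerator kg = KeyGenerator.getInstance("AES");
            kg.init(128);
            SecretKey key = kg.generateKey();
            SecureRandom rnd = new SecureRandom();
            byte[] iv1 = new byte[16];
            byte[] iv2 = new byte[16];
            rnd.nextBytes(iv1);
            rnd.nextBytes(iv2); // never reuse a key/IV pair in CTR mode
            long compressThenEncrypt = aesCtr(gzip(plain), key, iv1).length;
            long encryptThenCompress = gzip(aesCtr(plain, key, iv2)).length;
            return new long[] {compressThenEncrypt, encryptThenCompress};
        } catch (GeneralSecurityException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            sb.append("some fairly repetitive disk-image content ");
        }
        long[] s = sizes(sb.toString().getBytes(StandardCharsets.US_ASCII));
        System.out.println("compress, then encrypt: " + s[0] + " bytes");
        System.out.println("encrypt, then compress: " + s[1] + " bytes");
    }
}
```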

No, there is no significant overhead in using HTTP over raw sockets. However, it really depends on how you're using HttpClient to send this file. Are you properly buffering between the file system and HttpClient? The latency might not be the network but reading the file from the file system. In fact, you increased the raw link speed by 3x and only saw a 2x reduction in upload time. That probably means there is some latency elsewhere: in your code, the server, or the file system. You might try uploading a file from a desktop client to make sure it's not the server causing the latency. Then look at the file system throughput. If all that checks out, look at the code you've written using HttpClient and see whether it could be optimized.
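For comparison with the HTTP version, a raw-socket upload in plain Java might look like the sketch below. The framing (an 8-byte length, then the file bytes, then a one-byte acknowledgement from the server) is a made-up protocol for illustration only. With raw sockets you have to design and maintain this framing, plus error handling, yourself; the per-byte work of streaming the file through a buffer is the same as with HTTP.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;

public class RawSocketUpload {
    /** Client: sends an 8-byte length, then `length` bytes from `in`, then waits for an ack. */
    public static void send(String host, int port, InputStream in, long length) throws IOException {
        Socket socket = new Socket(host, port);
        try {
            DataOutputStream out = new DataOutputStream(
                    new BufferedOutputStream(socket.getOutputStream(), 64 * 1024));
            out.writeLong(length);
            byte[] buf = new byte[64 * 1024];
            long remaining = length;
            while (remaining > 0) {
                int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n == -1) throw new EOFException("input shorter than declared length");
                out.write(buf, 0, n);
                remaining -= n;
            }
            out.flush();
            // The one-byte ack tells the client the server really received everything.
            if (socket.getInputStream().read() != 1) throw new IOException("no acknowledgement");
        } finally {
            socket.close();
        }
    }

    /** Server side of the same hypothetical protocol; returns the payload size received. */
    public static long receive(Socket socket, OutputStream sink) throws IOException {
        DataInputStream in = new DataInputStream(
                new BufferedInputStream(socket.getInputStream(), 64 * 1024));
        long length = in.readLong();
        byte[] buf = new byte[64 * 1024];
        long remaining = length;
        while (remaining > 0) {
            int n = in.read(buf, 0, (int) Math.min(buf.length, remaining));
            if (n == -1) throw new EOFException("connection closed early");
            sink.write(buf, 0, n);
            remaining -= n;
        }
        socket.getOutputStream().write(1);
        socket.getOutputStream().flush();
        return length;
    }
}
```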

chubbsondubs
  • Can you please explain what you mean by "Are you properly buffering between the file system and HttpClient?" – Soumya Simanta Apr 05 '11 at 04:41
  • FileInputStream stream = new FileInputStream( file ) doesn't buffer data. So depending on how you're working with the stream, this will go to the file system every time you call read(). The solution is to wrap it: InputStream stream = new BufferedInputStream( new FileInputStream( file ) ); That way BufferedInputStream will grab 8K (the default) at a time and cache it, so reads are faster and you make fewer trips between the file system and your program. – chubbsondubs Apr 06 '11 at 02:30
  • The same idea applies between FileInputStream and HttpClient. Try to write sufficiently large buffers between the two to keep the throughput high. – chubbsondubs Apr 06 '11 at 02:31
  • I tried a simple program that just reads this file on Android, and it took around 350 to 400 ms. I tried using both BufferedInputStream and plain FileInputStream. Both report values in the same range. – Soumya Simanta Apr 06 '11 at 05:01
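The buffering effect described in these comments is about how many calls reach the underlying stream. The sketch below wraps an in-memory stream with a hypothetical counting filter (standing in for trips to the file system) and reads it one byte at a time, with and without BufferedInputStream. When the caller already reads in large chunks, as library code usually does, an extra buffer has little to add, which may be why both variants measured about the same in the last comment.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BufferingDemo {
    /** Hypothetical helper: counts calls that reach the wrapped stream. */
    static class CountingInputStream extends FilterInputStream {
        int calls;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Reads `source` byte by byte and returns how many calls hit the underlying stream. */
    public static int underlyingCalls(byte[] source, boolean buffered) throws IOException {
        CountingInputStream counter = new CountingInputStream(new ByteArrayInputStream(source));
        InputStream in = buffered ? new BufferedInputStream(counter, 8192) : counter;
        while (in.read() != -1) {
            // consume one byte per iteration
        }
        return counter.calls;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100_000];
        System.out.println("unbuffered: " + underlyingCalls(data, false) + " calls");
        System.out.println("buffered:   " + underlyingCalls(data, true) + " calls");
    }
}
```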

Note also that in CherryPy 3.2 the system for handling request bodies has been completely reworked, and you are much freer to implement varying handlers based on the media type of the request. By default, CherryPy will read your uploaded bytes into a temporary file; I assume your code then copies that to a more permanent location, which might be overhead that isn't useful to you (although there are good security reasons to use a temporary file). See also this question for discussion on renaming temp files.

You can override that behavior; make a subclass of _cpreqbody.Part with a make_file function that does what you want, then, in a Tool, replace cherrypy.request.body.part_class for that URI. Then post your code on http://tools.cherrypy.org so everyone can benefit :)

fumanchu
  • Thanks for responding. I'm new to CherryPy as well as Python. In my initial runs I was reading the temp file and writing its contents to a new file. I guessed that this might be causing the problem, so I removed it. Now all I have is:

        def upload_time(self, myFile):
            out = """ myFile uploaded
            """
            return out
        upload_time.exposed = True

    – Soumya Simanta Apr 05 '11 at 14:58