9

I have scoured the interwebz with no result. We are facing a problem where some Android devices experience severe packet loss. To give some background, the application connects to a specific Wi-Fi network and listens for UDP packets broadcast on port 17216. Each packet carries an 832-byte payload (excluding protocol headers), and packets are sent at a regular rate of four per second.

We have only seen the problem on two devices, a low-end Turbox Rubik II tablet and an ASUS Memo Pad HD 7. The other devices we've tested (phones and tablets) all receive the packets at the expected regular interval.

The function that receives the packets is this:

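public void run()
{
    while (isUDPServerRunning)
    {
        try
        {
            // Blocks until the next datagram arrives; the same packet/buffer is reused.
            socket.receive(packet);

            ProcessRawPacketData();

            DisplayLoggingInfo();
        }
        catch (IOException e)
        {
            // e.getMessage() can be null; pass the exception so the stack trace is logged either way.
            Log.e("receive", "socket.receive() failed", e);
        }
    }
}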

And that is part of a Runnable. The socket is created thus:

byte[] buffer = new byte[1024];

DatagramSocket socket;
DatagramPacket packet = new DatagramPacket(buffer, buffer.length);

with the socket being initialized in the onCreate() method of our Service extension:

socket = new DatagramSocket(SERVERPORT);

The packets are being received by the Wifi module. We've confirmed that by rooting one of the devices and installing a packet sniffer, so the problem must somehow be code related.

On the affected devices packets are received correctly for a couple of seconds and then there is complete dropout that lasts for several seconds, so I estimate the loss to exceed 50%.

Any help would be much appreciated. We are pulling our hair out.

Update: I was mistaken about the packet sniffer. It seems that the sniffer is also missing many of the relevant packets on the rooted device. Sometimes, though, simply starting the packet sniffer fixes the issue! Turning Bluetooth on/off, as suggested below, does not seem to make a difference. Could this be a hardware issue instead?

Update 2: Here is an example of the logs I'm printing immediately after the socket.receive() line. Notice how it skips half a minute's worth of packets and then works fine for a few seconds.

05-25 15:44:38.670: D/LOG(4393): Packet Received
05-25 15:44:38.941: D/LOG(4393): Packet Received
05-25 15:45:09.482: D/LOG(4393): Packet Received
05-25 15:45:09.716: D/LOG(4393): Packet Received
05-25 15:45:09.928: D/LOG(4393): Packet Received
05-25 15:45:10.184: D/LOG(4393): Packet Received
05-25 15:45:10.451: D/LOG(4393): Packet Received
05-25 15:45:10.661: D/LOG(4393): Packet Received
Kristian D'Amato

4 Answers

5

Packet loss (as you know, of course) can happen at multiple stages along the transmission:

  1. Sending from the server
  2. Transmission over the network
  3. Physical reception at the client and handling in hardware
  4. Processing/buffering of the packet in the kernel/OS
  5. Handling/buffering of the packet in your app.

You can quickly check whether point 1 or 2 is the issue by having other devices listen for the same broadcast while connected to the same Wi-Fi router. It sounds like you already did this and found no issue. (Note that a packet that gets dropped in step 2, or sometimes even step 1, might still show up in the Wireshark dump if you run it on the server.)

Points 3 through 5 are therefore likely to be the problem and they might be a little harder to separate out.

Here are a couple of things that might help:

  • Like @Mick suggested, don't just print out when you received the packet, but give every packet an increasing ID number to figure out whether you actually lost a packet or whether it was just delayed.
  • Move your packet-receiving code into its own thread (if it isn't already) and set the priority of that thread to MAX_PRIORITY to minimize the chance that your code is holding up the lunch line. Given that the Memo Pad is a quad-core 1.2 GHz machine, MAX_PRIORITY shouldn't even be necessary, but if you aren't currently running the receive loop in its own dedicated thread, you might see hiccups anyway. If this fixes things, simply have a minimal receive loop stick the packets into your own buffer queue and have an independent thread process them.
  • Check and, if needed, increase the size of the socket's receive buffer via setReceiveBufferSize(...). Make sure you specify a size that can hold many packets. Given that running the packet sniffer sometimes seems to help, it does sound like there might be some socket setting that improves things and that the sniffer happens to set.
  • On the server you can also add a tag to the packet that tells all involved devices how to treat it. If you call setTrafficClass(0x04) on the sending socket (0x04 is the value the DatagramSocket Javadoc gives for IPTOS_RELIABILITY), you are asking everyone involved to optimize their packet handling for maximum reliability. Not all devices will care, but it may make a difference.
  • You can try to use DatagramChannels instead of DatagramSockets and then use select() to wait for the next packet to read. While this technically should not make a difference, sometimes using a different API call can provide a work-around for an issue.
  • Unfortunately, Android is a very heterogeneous environment where many manufacturers provide their own kernel modules, which introduces incompatibilities and non-standard behavior. You might be able to find a custom ROM (Cyanogen, etc.?) for one or both of your problem devices. If installing that instead of the factory ROM fixes your problem, then it's a bug in the manufacturer-provided (kernel) network drivers. In that case you might get lucky and find a work-around, or you could file a bug report with them, but in general you might just have to mark those devices as unsupported in the Play Store to avoid bad reviews...
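
The dedicated receive thread and larger receive buffer from the list above could look roughly like the sketch below. It is plain Java rather than Android-specific code, and the class name UdpReceiver, its methods, and the thread name are made up for illustration. The receive loop does nothing except copy each datagram into a queue, so slow processing elsewhere cannot stall socket.receive().

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.SocketException;
import java.util.Arrays;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical helper (not from the question): drains the socket as fast as
// possible and hands a copy of each payload to a queue for another thread.
final class UdpReceiver implements Runnable {
    private final DatagramSocket socket;
    private final BlockingQueue<byte[]> queue = new LinkedBlockingQueue<>();
    private volatile boolean running = true;

    UdpReceiver(int port, int receiveBufferBytes) throws SocketException {
        socket = new DatagramSocket(port);
        // This is only a request; the OS may grant a different size.
        socket.setReceiveBufferSize(receiveBufferBytes);
    }

    int localPort() { return socket.getLocalPort(); }

    // Read back what the OS actually granted.
    int effectiveReceiveBufferSize() throws SocketException {
        return socket.getReceiveBufferSize();
    }

    BlockingQueue<byte[]> queue() { return queue; }

    @Override
    public void run() {
        byte[] buffer = new byte[1024];
        DatagramPacket packet = new DatagramPacket(buffer, buffer.length);
        while (running) {
            try {
                packet.setLength(buffer.length); // reset before reusing the packet
                socket.receive(packet);
                int from = packet.getOffset();
                // Copy the payload so the shared buffer can be reused immediately.
                queue.offer(Arrays.copyOfRange(packet.getData(), from, from + packet.getLength()));
            } catch (IOException e) {
                if (running) {
                    System.err.println("receive failed: " + e);
                }
            }
        }
    }

    // Closing the socket unblocks a pending receive() so the loop can exit.
    void stop() {
        running = false;
        socket.close();
    }

    Thread startThread() {
        Thread t = new Thread(this, "udp-receiver");
        t.setPriority(Thread.MAX_PRIORITY);
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

A separate worker would then call queue().take() and do the parsing and logging there. Whether the larger buffer or the higher priority helps on the affected tablets is something only a test on those devices can show.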

Finally, here is a work-around that should fix the issue for sure:

Add some code to your client that detects dropped packets and, if the drop-rate goes too high, opens a TCP connection to the server instead, which will then guarantee packet delivery. Given that your packets are small and infrequent and that only a few devices will ever need to use this mechanism, I don't think that this should cause a problem for your server load. If you don't have a way to change the server code to provide a TCP stream, you could write an independent proxy-server that collects the UDP packets and makes them available via TCP. If you can run it on the same machine as the original server, you even know what IP address it is at (the same as the source address of the UDP packets that did arrive).
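
A minimal sketch of the drop detection, assuming the server can be changed to stamp each packet with an increasing sequence ID (the current packets may not carry one). LossTracker and its method names are hypothetical. The rate here is cumulative; a real client would probably want a sliding window so that an old burst of loss does not keep it on TCP forever.

```java
// Hypothetical helper: counts gaps in an increasing per-packet sequence ID.
final class LossTracker {
    private long expectedNext = -1; // -1 until the first packet is seen
    private long received;
    private long lost;

    // Call once per received packet with the ID parsed from its payload.
    void onPacket(long seq) {
        if (expectedNext >= 0 && seq < expectedNext) {
            return; // late or duplicate packet; ignored in this sketch
        }
        if (expectedNext >= 0) {
            lost += seq - expectedNext; // IDs skipped since the last packet
        }
        expectedNext = seq + 1;
        received++;
    }

    // Fraction of packets lost since the first one was seen.
    double lossRate() {
        long total = received + lost;
        return total == 0 ? 0.0 : (double) lost / total;
    }

    // True once enough packets were expected and the loss exceeds the threshold.
    boolean shouldFallBackToTcp(double threshold, long minSamples) {
        return received + lost >= minSamples && lossRate() > threshold;
    }
}
```

For example, after seeing IDs 0, 1, 2, 5, 6 the tracker has counted five received and two lost packets, so shouldFallBackToTcp(0.25, 5) returns true. The same counters also answer the earlier question of whether packets are truly lost or merely delayed.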

Markus A.
  • i am experiencing the same issue could you take a look at http://stackoverflow.com/questions/38891610/packet-loss-while-receiving-udp-broadcast-in-android would be a great help – George Thomas Aug 11 '16 at 12:39
1

Just a wild guess, but how long do your computations on the packet take? Is it possible that the buffer allocated for the socket fills up and starts to drop packets?

I know, this sounds unlikely for a transfer rate of about 4 KB/s... But if your computations take longer than 250 ms per packet, then this would occur sooner or later. This would also explain why some devices work like a charm and others don't.

Have you tried to remove the computations and just print the "package received" message for debugging?
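
To put rough numbers on that, here is a back-of-the-envelope sketch. BacklogEstimate is a made-up helper, and the 64 KiB buffer used in the example below is an assumed figure, not something read from the devices. It also ignores the per-packet overhead the kernel adds to its buffer accounting, so real overflow would come somewhat sooner.

```java
// Hypothetical helper: estimates how long a receive buffer lasts when
// packets arrive faster than the app can process them.
final class BacklogEstimate {
    static double secondsUntilOverflow(int bufferBytes, int packetBytes,
                                       double arrivalsPerSecond, double processingMillis) {
        double servicedPerSecond = 1000.0 / processingMillis;
        double growthPerSecond = arrivalsPerSecond - servicedPerSecond;
        if (growthPerSecond <= 0) {
            return Double.POSITIVE_INFINITY; // the app keeps up; no backlog builds
        }
        int capacityPackets = bufferBytes / packetBytes; // whole packets that fit
        return capacityPackets / growthPerSecond;
    }
}
```

With 832-byte packets at four per second, an assumed 65536-byte buffer, and 500 ms of processing per packet, the backlog grows by two packets per second and the 78-packet capacity is gone after about 39 seconds. At 100 ms per packet the app keeps up and the buffer never fills.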

bratkartoffel
  • if two packets are sent at the same time one misses, can you please check this http://stackoverflow.com/questions/38891610/packet-loss-while-receiving-udp-broadcast-in-android – George Thomas Aug 11 '16 at 12:41
1

Interestingly enough, both of the devices that are experiencing UDP packet loss happen to have Mediatek SoCs. Do your other test devices have this same chipset?

This may be a bug in the Wi-Fi driver for those SoCs. Since it only shows up with UDP and doesn't happen all the time, it may have gone unnoticed until now.

CenterOrbit
0

This sounds very similar to the Bluetooth interference symptoms that can be seen on Android devices (and iOS, and in fact anything that combines Wi-Fi and Bluetooth).

2.4 GHz Wi-Fi and Bluetooth share the same frequency band and can interfere with each other. On some devices this is very pronounced, maybe due to the internal layout.

It is also possible that you see it on some devices and not others because of the Wi-Fi versions they support. The newer 5 GHz Wi-Fi does not interfere with Bluetooth in the same way, but some older or more basic Android devices may not support it.

You can test whether this is the cause quite easily by switching off Bluetooth on the device while testing (if your app can function without Bluetooth).

Mick
  • Thanks! This sounded like a good lead, but switching bluetooth off/on had no effect. Also, the packet sniffer confirmed that the packets were being received, so I don't think this was the cause. – Kristian D'Amato May 25 '15 at 08:18
  • @KristianD'Amato can you share logs from runs where packets are lost? – Mick May 25 '15 at 10:55
  • @KristianD'Amato - assuming there are no other logs (errors etc) then it really does look like you are either losing the packets in the network or the sender is not sending them for some reason. The only other thing that would usually cause packet loss on the receiving end would be if your processing took so long that the receiving buffer was overflowing, but that seems unlikely given your small packet size. BTW, have you checked that the packets are actually being lost and not simply being delayed - i.e. not just a gap in the logs but also checked the packets content shows missing packets? – Mick May 25 '15 at 14:23
  • I have run Wireshark on a PC simultaneously so I can confirm that the packets are being sent successfully from the server. It is also not a processing problem, because packet sniffers on the affected device also display the problem (see the update above). – Kristian D'Amato May 25 '15 at 14:37
  • Didn't deserve a downvote, this was a good suggestion. I upvote to compensate. – CenterOrbit Jun 02 '15 at 21:47