I am trying to send some bytes to a third-party application (running on the same server) over a TCP loopback connection, using the following code.

struct sockaddr_in serv_addr;
struct hostent *server;
int sockfd = socket(PF_INET, SOCK_STREAM, 0);
server = gethostbyname(host_address);

bzero((char *) &serv_addr, sizeof (serv_addr));
serv_addr.sin_family = AF_INET;

bcopy((char *) server->h_addr, (char *) &serv_addr.sin_addr.s_addr, server->h_length);

/**** Port No. Set   ****/
serv_addr.sin_port = htons(portno);
int sockKeepAliveOption = 1;
int al = setsockopt(sockfd, SOL_SOCKET, SO_KEEPALIVE, (void*) &sockKeepAliveOption, sizeof (sockKeepAliveOption));
if (al == -1) {
    std::cout << "Setsocket option err: SO_KEEPALIVE --unable to set keep alive tcp connection." << std::endl;
} 
else {
    std::cout << "SO_KEEPALIVE set on SOL_SOCKET." << std::endl;
}

I send 400 bytes at a time, 100 times per second, using the following call:

int n = send(sockfd,sendB,400, ONLOAD_MSG_WARM); 

My problem is high jitter: the minimum latency is 3 us, the average is 7 us, and the maximum is 19 us. How can I optimize it?

Thanks

Edit on 8/28/2014.

Let me add some more information. I also receive data on the same port in a different thread, but only after I send. I also assign one core to each thread with the following code, and all CPUs except core 0 are isolated from the scheduler.

        cpu_set_t cpuset;
        CPU_ZERO(&cpuset);   // start from an empty set before adding core 5
        thread1 = new std::thread(myfunction, input1, input2);
        pthread_t thread_hnd = thread1->native_handle();
        CPU_SET(5, &cpuset);
        s = pthread_setaffinity_np(thread_hnd, sizeof (cpu_set_t), &cpuset);

I get good numbers (3 or 4 us) when I send continuously every 1 ms, but if the frequency is lower (say 1-5 sends per second) I sometimes get around 20 us, while the average is around 7 us.

Can listening and sending on the same port from different threads create jitter?

2nd edit on 8/28/2014.

Here is my CPU state. It is not going to C3. Core 2 [7] is the core running the thread that sends data through loopback.

 Cpu speed from cpuinfo 3499.00Mhz
 True Frequency (without accounting Turbo) 3499 MHz

 Socket [0] - [physical cores=6, logical cores=6, max online cores ever=6]
 CPU Multiplier 35x || Bus clock frequency (BCLK) 99.97 MHz
 TURBO ENABLED on 6 Cores, Hyper Threading OFF
 Max Frequency without considering Turbo 3598.97 MHz (99.97 x [36])
 Max TURBO Multiplier (if Enabled) with 1/2/3/4/5/6 cores is  38x/37x/36x/36x/36x/36x
 Real Current Frequency 3600.17 MHz (Max of below)
    Core [core-id]  :Actual Freq (Mult.)      C0%   Halt(C1)%  C3 %   C6 %  Temp
    Core 1 [0]:       3600.17 (36.01x)      1.08    98.9       0       0    41
    Core 2 [1]:       3595.44 (35.96x)      1.07    98.9       0       0    46
    Core 3 [2]:       3595.28 (35.96x)         1    99.1       0       0    40
    Core 4 [3]:       3599.01 (36.00x)         1    99.9       0       0    46
    Core 5 [4]:       3599.51 (36.01x)         0     100       0       0    50
    Core 6 [5]:       3598.97 (36.00x)       100       0       0       0    56

  Socket [1] - [physical cores=6, logical cores=6, max online cores ever=6]
  CPU Multiplier 35x || Bus clock frequency (BCLK) 99.97 MHz
  TURBO ENABLED on 6 Cores, Hyper Threading OFF
  Max Frequency without considering Turbo 3598.97 MHz (99.97 x [36])
  Max TURBO Multiplier (if Enabled) with 1/2/3/4/5/6 cores is  38x/37x/36x/36x/36x/36x
  Real Current Frequency 3600.12 MHz (Max of below)
    Core [core-id]  :Actual Freq (Mult.)      C0%   Halt(C1)%  C3 %   C6 %  Temp
    Core 1 [6]:       3598.97 (36.00x)       100       0       0       0    56
    Core 2 [7]:       3598.51 (36.00x)      1.12    98.8       0       0    49
    Core 3 [8]:       3599.98 (36.01x)      1.94      98       0       0    45
    Core 4 [9]:       3598.97 (36.00x)       100       0       0       0    56
    Core 5 [10]:      3599.48 (36.01x)         1    99.9       0       0    48
    Core 6 [11]:      3600.12 (36.01x)      3.44    96.5       0       0    45

 C0 = Processor running without halting
 C1 = Processor running with halts (States >C0 are power saver)
 C3 = Cores running with PLL turned off and core cache turned off
 C6 = Everything in C3 + core state saved to last level cache
 Above values in table are in percentage over the last 1 sec
 [core-id] refers to core-id number in /proc/cpuinfo

2 Answers

First of all, there are techniques that may speed this up, but they won't necessarily reduce jitter. Most speed optimizations rely on asynchronous socket handling and help mainly when receiving data, less when sending.

What might help is setting the TCP_NODELAY option, which disables the Nagle algorithm so that packets are sent as soon as possible. Essentially, the Nagle algorithm coalesces several small writes into a single TCP segment, maximizing throughput at the cost of latency and jitter.
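
A minimal sketch of setting the option, assuming a Linux/POSIX socket like the `sockfd` in the question. The helper names `disable_nagle` and `nagle_disabled` are made up for illustration; only `setsockopt`, `getsockopt`, `IPPROTO_TCP` and `TCP_NODELAY` are the real API.

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>   // TCP_NODELAY
#include <sys/socket.h>

// Hypothetical helper: disable Nagle's algorithm on a TCP socket.
// Returns true on success; on failure errno holds the reason.
bool disable_nagle(int sockfd) {
    int flag = 1;
    return setsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag)) == 0;
}

// Hypothetical helper: read the option back to confirm it is in effect.
bool nagle_disabled(int sockfd) {
    int flag = 0;
    socklen_t len = sizeof(flag);
    if (getsockopt(sockfd, IPPROTO_TCP, TCP_NODELAY, &flag, &len) != 0) {
        return false;
    }
    return flag != 0;
}
```

Calling `disable_nagle(sockfd)` right after `socket()` and before the first `send()` should be enough; the option stays set for the lifetime of the socket.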

Also, remember that timing at such a fine resolution is tricky at best. Double-check your timer resolution (clock_getres) and keep in mind that any system interrupt or scheduling event can affect the measurement. Your actual jitter might be better than what you measure.
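
As a rough way to check the timer, the sketch below queries `clock_getres` and also measures the smallest non-zero gap between two back-to-back `clock_gettime` calls. The function names are hypothetical, and `CLOCK_MONOTONIC` is an assumption about which clock the measurement uses.

```cpp
#include <cstdint>
#include <time.h>

// Convert a timespec to nanoseconds.
static inline std::int64_t to_ns(const timespec& t) {
    return static_cast<std::int64_t>(t.tv_sec) * 1000000000LL + t.tv_nsec;
}

// Resolution reported by the kernel for this clock, in ns (-1 on error).
std::int64_t clock_resolution_ns(clockid_t id) {
    timespec res{};
    if (clock_getres(id, &res) != 0) return -1;
    return to_ns(res);
}

// Smallest positive difference between two consecutive readings, in ns.
// This approximates the cost and granularity of one timestamp in practice.
// Returns -1 on error or if no positive difference was observed.
std::int64_t min_observed_step_ns(clockid_t id, int samples) {
    std::int64_t best = -1;
    for (int i = 0; i < samples; ++i) {
        timespec a{}, b{};
        if (clock_gettime(id, &a) != 0 || clock_gettime(id, &b) != 0) return -1;
        const std::int64_t d = to_ns(b) - to_ns(a);
        if (d > 0 && (best < 0 || d < best)) best = d;
    }
    return best;
}
```

If the observed step is a noticeable fraction of a microsecond, part of the spread between 3 us and 19 us may come from the measurement itself rather than from the send path.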

KillianDS

Can you try sched_setaffinity(2) on your networking thread? If your code is single-threaded, it will be easier to use its wrapper taskset(1).
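
A minimal sketch of pinning the calling thread with `sched_setaffinity(2)` on Linux. The helper names are hypothetical, and the CPU number to pass is an assumption that depends on your `isolcpus` layout.

```cpp
#include <sched.h>   // g++ defines _GNU_SOURCE, which exposes the CPU_* macros

// Hypothetical helper: restrict the calling thread to a single CPU.
// A pid of 0 means "the calling thread".
bool pin_current_thread(int cpu) {
    cpu_set_t set;
    CPU_ZERO(&set);          // always clear the set before adding CPUs
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set) == 0;
}

// Hypothetical helper: lowest-numbered CPU the calling thread may run on,
// or -1 on error.
int first_allowed_cpu() {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0) return -1;
    for (int c = 0; c < CPU_SETSIZE; ++c) {
        if (CPU_ISSET(c, &set)) return c;
    }
    return -1;
}
```

Calling `pin_current_thread(5)` at the top of the networking thread's function would pin it to core 5, assuming that core is online and permitted for the process.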

Moreover, it would be best to boot Linux with the isolcpus parameter so that other irrelevant processes will not bother your experiment.

Update on C State

Is it possible that your CPU is sleeping too deeply (>= C3)?

This tool might be helpful in monitoring the C state:

You might want to tweak the intel_idle.max_cstate kernel parameter or something similar, depending on your CPU and kernel version.

nodakai
  • Yeah, I am booting with isolcpus. Core 0 runs all system processes. I have assigned one core per thread. – prashant singh Aug 28 '14 at 04:32
  • @prashantsingh I read your update on the high latency with long intervals and added some information about Intel CPU sleep states. – nodakai Aug 28 '14 at 05:38
  • This is a very good piece of information. Thanks a lot. I will go through it and get back to you. I think it is very unlikely that the CPU is sleeping too deeply. I am parsing tick-by-tick data in the same thread, and data arrives every 10-20 us. – prashant singh Aug 28 '14 at 07:07
  • I have posted i7z results. It looks like the core is mostly in the C1 state (when it is waiting for data). Is that fine, or should I disable C-states? – prashant singh Aug 28 '14 at 07:48
  • What i7z reports as "C1" can actually be either C1 (in a narrower sense) or C1E, and if it's C1E, it *might* be doing harm to you. This file contains a table of **typical** latencies for each C-state: https://github.com/torvalds/linux/blob/master/drivers/idle/intel_idle.c You might as well try googling for "bios disable c1e" or something... If you still observe high latency and jitter, it's worth trying busy looping with a non-blocking socket. – nodakai Aug 28 '14 at 08:08
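
The busy-looping idea from the last comment can be sketched as follows. This is only an illustration under assumptions: `set_nonblocking` and `spin_recv` are made-up names, and the spin limit exists just so the loop cannot run forever. The thread never blocks in the kernel, so its core stays in C0 instead of dropping to C1/C1E between messages, at the price of 100% CPU usage on that core.

```cpp
#include <cerrno>
#include <cstddef>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>

// Hypothetical helper: switch a descriptor to non-blocking mode.
bool set_nonblocking(int fd) {
    const int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1) return false;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK) == 0;
}

// Hypothetical helper: poll recv() in a tight loop instead of blocking.
// Returns the number of bytes read, 0 if the peer closed the connection,
// -1 on a real error, or -2 if max_spins attempts found no data.
ssize_t spin_recv(int fd, void* buf, std::size_t len, long max_spins) {
    for (long i = 0; i < max_spins; ++i) {
        const ssize_t n = recv(fd, buf, len, 0);
        if (n >= 0) return n;
        // "No data yet" and "interrupted" are expected; anything else is fatal.
        if (errno != EAGAIN && errno != EWOULDBLOCK && errno != EINTR) return -1;
    }
    return -2;
}
```

This only makes sense on a core dedicated to the thread, as with the isolcpus setup described above.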