C code - memory access / preemption

Question

I have written a piece of code wherein a data:

unsigned char buf[4096]; // data in chunks of size 4k
unsigned counter[256];

I am adding up the i/p data for every 3 contiguous bytes and storing the ans. ex: temp[4096]; temp[0] = buf[0] + buf[1] + buf[2]; ... till 4096

Then a histogram is generated from the results of temp using the code:

for(i = 0; i < 4096; i++)
counter[temp[i]]++;

The histogram is sorted (bubble sort) and then top 8 most recurring values are taken. The code is run in the linux kernel (2.6.35)

The problem I am facing is that if I remove the sorting part, the time taken to execute the code is very fast (6 microsec on my laptop, measured using gettimeofday func). But after introducing the sorting, the process slows down to a great extent (44 microsec). The sorting function itself takes 20 microsecs, I cant understand why is the time then increasing so much. I did a memory analysis using cachegrind, the results are normal and I even tried disabling preemption ubut still it doesnt show any difference. If anybody can help me out over here. Thanks!

To get 8 top values you need no to do full sort. E.g. Heapsort can be used for getting N top values (if implemented so) and it will be faster than full sort. — osgx, Jul 05 '11 at 13:48
Or even a Selection sort http://en.wikipedia.org/wiki/Selection_sort - which can be stopped after getting top 8 values. — osgx, Jul 05 '11 at 13:53
What part of your code occupies the other 18 microsecs when you introduce sorting? Are you using dynamic memory somehow (`malloc` and friends)? ??? — pmg, Jul 05 '11 at 13:56
I have created a structure for sort which contains the value and its counter i.e. no. of times it appeared. the loop for sorting is being run 8 x 256 times. — randy7, Jul 05 '11 at 14:22
@pmg: thats what i am unable to figure out, the generation of histogram takes 6 microsec but if i introduce sorting, it starts taking 24 microsec, the same piece of code...I am not using malloc... — randy7, Jul 05 '11 at 14:24
Just to be certain ... you have: 6 microsecs for the base time; 20 microsecs for sorting; and 18 microsecs "lost" (total 44 microsecs). Your question is where are those 18 microsecs "lost", right? — pmg, Jul 05 '11 at 14:34
@pmg: yes, I cant understand where those 18 microsecs are lost and I am sorting only once. — randy7, Jul 05 '11 at 14:36

Patrick Schlüter · Accepted Answer · 2018-09-10T14:59:38.893

2

Bubble sort is slow, it compares and swaps your values up to 4096*4096 = 16,777,216 times. If you need only the 8 best values, a 1 sweep selection is certainly faster. Something like that.

 const uint_t n = 8;
 uint_t best[n] = {0};
 uint_t index[n] = {0};
 uint_t j;

 for(uint_t i=0; i<4096; i++) {

   if(counter[i] > best[n-1]) {
     for(j=n-2; j && counter[i] > best[j]; j--);           /* Find the insertion position, as our value might be bigger than the value at position n-1. */
     memmove(&best [j+1], &best[j] , (n-1 -j) * sizeof best[0]);      /* Shift the values beyond j up 1  */
     memmove(&index[j+1], &index[j], (n-1 -j) * sizeof index[0]);
     best[j] = counter[i];                                 /* Put the current best value at the top */
     index[j] = i;                                         /* Store the index in the second array to know where the best value was. */
   }
 }

With that, you compare your values only once and the cost of the memmove is negligible because your selection array is small. No need to sort the array, this algo is O(nm) with n the size of your array and m the size of your selection. The best sort would be O((n.log2 n).m). So if m is small and n is big, it is unbeatable by any generic sort algorithm.

EDIT: I added the array for the index.

EDIT2: Introduced second to correct the fundamental bug I had in first instance.

EDIT3: Comment: memmove with size 0 is allowed and is basically a nop.

edited Sep 10 '18 at 14:59

answered Jul 05 '11 at 14:02

Patrick Schlüter

11,394
1
43
48

your code seems to be a bit wrong. What sorting method did you implement? – osgx Jul 05 '11 at 14:05
1

There is no sorting method, I select the highest value from the counter array. Of course if you want to know which index of `counter` is highest, you have to store it also in an array that you `memmove`synchronisly with that one. – Patrick Schlüter Jul 05 '11 at 14:09
1

If you put `n` as 4096 this would be a selection sort. The complexity of a normal selection sort is O(n^2), but as we limit the `memmove` to a small constant value we get O(8n) = O(n). – Patrick Schlüter Jul 05 '11 at 14:18
The only problem I see here is if you encounter a really big value early ... say you encounter a histogram value that is the largest histogram value first. Now you will never see the other seven values less than the initial (largest) histogram value since `if (counter[i] > best[0])` will always be false. So in that case you would only end up with the largest histogram value, rather than the largest eight histogram values ... the rest of the `best` array would be zeros. – Jason Jul 05 '11 at 14:24
Yeah, you're right, I copied it from my app and forgot the crucial part. Sorry, I correct it immediatly. – Patrick Schlüter Jul 05 '11 at 14:27
Done, and in fact it's an insertion sort not a selection as I said above. – Patrick Schlüter Jul 05 '11 at 14:41
This works nicely for a small selection of sorted values since you are basically doing an insertion sort on 8 values, but if you needed something like the top 15 values, you would at that point be doing more checks that with an O(N log N) algorithm for 4096 values. In fact it would remain that way up to around 32K values. So this is a nice effective way to get the top (Log2(N) - 1) values. After that, an O(N Log N) algorithm would be more efficient. – Jason Jul 05 '11 at 15:00
i only have to sort an array of size 256. Your solution is very good but there are a few errors: in your for loop for determining j, when counter[i] !> best[j], you want to shift the data after the current j. The solution works fine but still the code has unaccounted time of 18 microsecs. – randy7 Jul 05 '11 at 18:11

score 1 · Answer 2 · answered Jul 05 '11 at 13:56

Bubble sort is slow ... O(N^2) complexity ... if you want faster performance, use a data-structure like a heap, or run the quick-sort algorithm on your array, both of which will give you O(N log N) complexity for the sorting process. In addition, both methods will also work nicely on fixed-length arrays.

C code - memory access / preemption

2 Answers2