
I am using a small embedded PCIe-based board with a very low-end processor. Given the operations it performs and the firmware size, I am already consuming all of its resources. What steps can I take to optimize CPU usage?

I have tried to replace multiplications (*) and divisions (/) with bitwise operations, but I have code like this:

Y = X * 3;

or

Z = X / 1000;

and I am not sure how to do these in a CPU-friendly way.

Dipankar
  • 8
    It is pretty pointless to answer this in a general way; it depends on the specific CPU. Generally, it is best to write the code as clearly as possible and then let the compiler worry about optimization. – Lundin Dec 06 '17 at 07:36
  • 4
    Unless you are using an incompetent compiler, these kinds of micro-optimizations are already being done for you. – sp2danny Dec 06 '17 at 07:50
  • gcc is quite smart when dividing by an integer constant: [Why does GCC use multiplication by a strange number in implementing integer division?](https://stackoverflow.com/questions/41183935/why-does-gcc-use-multiplication-by-a-strange-number-in-implementing-integer-divi) – Lanting Dec 06 '17 at 08:07
  • 1
    Far too broad currently. For example, what are the types of X, Y, and Z? – Bathsheba Dec 06 '17 at 08:18
  • X, Y and Z are all integers, and I am using gcc. I just thought of it this way: Y = X*3; // Y = (X<<1) + X; But I am not sure what I gain from changes like this. – Dipankar Dec 06 '17 at 08:52
  • 1
    In any case you need to look at the generated assembler code to verify whether you gained anything. – Gerhardh Dec 06 '17 at 09:09
  • you could also look at the generated assembly code for the original commands to see how you could possibly improve – Ora Dec 06 '17 at 14:46

1 Answer


1. Confirm your bottlenecks

Applications can be CPU-bound, memory-bound, IO-bound and so on. Your low-end processor may in fact spend most of its time waiting for data from DRAM, doing IO, or spinning on a spinlock. So the first thing to do is confirm your real bottleneck.

There are tools for this, like free perf for Linux or paid Intel VTune.
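Where a profiler cannot run on the target, a crude substitute is to time a suspect region by hand. The sketch below is not perf or VTune; it assumes the standard C clock() function is available on the platform, and hot_loop() is a hypothetical stand-in for the code you suspect.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the suspected hot code. */
static uint32_t hot_loop(uint32_t n)
{
    uint32_t acc = 0;
    for (uint32_t x = 0; x < n; x++) {
        acc += (x * 3u) + (x / 1000u);  /* unsigned wrap-around is well defined */
    }
    return acc;
}

/* Returns the CPU time, in seconds, spent inside hot_loop(n).
 * The loop result is written to *result so the call is not optimized away. */
static double cpu_seconds_in_hot_loop(uint32_t n, uint32_t *result)
{
    clock_t start = clock();
    *result = hot_loop(n);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

If the measured share of run time is small, rewriting the arithmetic inside that region is unlikely to help much.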

2. Show us the context

If you find that your CPU spends most of its time in foo(), show us that function so we can help.

3. Generic suggestions

For your generic question, you will get just generic suggestions, like:

  1. Use more aggressive compiler optimization, like -O3.
  2. Change your algorithms.
  3. Avoid locks.
  4. Align your data.
  5. Avoid false sharing.
  6. Make your data structures more compact.
  7. Use prefetch.

and so on.

Sorry, without more context I cannot suggest a more specific technique.
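Two of the suggestions above, compact data structures and prefetching, can be sketched in C as follows. The struct names, field choices and the prefetch distance of 16 elements are hypothetical, and __builtin_prefetch is a GCC/Clang builtin, so this assumes one of those compilers. Whether either change helps depends on the target and has to be measured.

```c
#include <stddef.h>
#include <stdint.h>

/* Fields in a careless order: the compiler may insert padding
 * so that each member is properly aligned. */
struct sample_padded {
    uint8_t  flag;
    uint32_t value;
    uint8_t  kind;
    uint16_t id;
};

/* Same fields ordered from largest to smallest, which usually
 * leaves less padding. */
struct sample_compact {
    uint32_t value;
    uint16_t id;
    uint8_t  flag;
    uint8_t  kind;
};

static size_t padded_size(void)  { return sizeof(struct sample_padded); }
static size_t compact_size(void) { return sizeof(struct sample_compact); }

/* Sums an array while hinting the CPU to fetch data a few
 * elements ahead. The distance (16) is a guess to be tuned. */
static uint32_t sum_with_prefetch(const uint32_t *data, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + 16 < n) {
            __builtin_prefetch(&data[i + 16]);
        }
        sum += data[i];
    }
    return sum;
}
```

On a simple in-order core without a data cache the prefetch hint may do nothing at all, so treat it as something to try only after profiling points at memory stalls.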

Andriy Berestovskyy
  • Thanks for your suggestion, Andriy. I am already using -O3, and it is confirmed that the CPU is getting exhausted. – Dipankar Dec 11 '17 at 17:39
  • @dipankar sorry, the only way to optimize x*3 is x+x+x, which is not always faster. You should show us more code to get more practical suggestions. – Andriy Berestovskyy Dec 12 '17 at 09:16
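
For reference, the rewrites discussed in this thread can be sketched as below, assuming unsigned 32-bit operands (signed values need an extra correction step). The function names are hypothetical. The constant 274877907 with a shift of 38 is the rounded-up reciprocal of 1000 scaled by 2^38, the kind of "strange number" the linked GCC question describes. An optimizing compiler typically produces equivalent code from the plain versions by itself, so the hand-written forms are shown only to make the transformation visible.

```c
#include <stdint.h>

/* Plain forms: what you would normally write. */
static uint32_t mul3_plain(uint32_t x)    { return x * 3u; }
static uint32_t div1000_plain(uint32_t x) { return x / 1000u; }

/* x * 3 rewritten as a shift plus an add. */
static uint32_t mul3_shift(uint32_t x)
{
    return (x << 1) + x;
}

/* x / 1000 rewritten as a multiply by a scaled reciprocal.
 * 274877907 == ceil(2^38 / 1000); the 64-bit product cannot overflow
 * because both factors are below 2^32. */
static uint32_t div1000_recip(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 274877907u) >> 38);
}
```

Comparing the generated assembly for the plain and rewritten versions (for example with gcc -O2 -S) is the way to confirm whether the manual form changes anything on your target.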