1. Confirm your bottlenecks
Applications can be CPU-bound, memory-bound, IO-bound, and so on. Your low-end processor may in fact spend most of its time waiting for data from DRAM, doing IO, or spinning on a lock. So the first thing to do is confirm your real bottleneck.
There are tools for this, like the free perf for Linux or Intel VTune.
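Before reaching for a full profiler, a rough first check is to compare wall-clock time with the CPU time your process actually consumed. The sketch below assumes a POSIX system with `clock_gettime`; the names `measure`, `burn_cpu` and `sleep_100ms` are made up for illustration and are not part of any library.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <time.h>

/* Convert a timespec to seconds. */
static double ts_to_seconds(struct timespec ts)
{
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical helper: runs work() once and reports the elapsed
 * wall-clock time and the CPU time consumed by the process meanwhile. */
static void measure(void (*work)(void), double *wall_s, double *cpu_s)
{
    struct timespec w0, w1, c0, c1;

    assert(work != NULL);
    clock_gettime(CLOCK_MONOTONIC, &w0);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c0);
    work();
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c1);
    clock_gettime(CLOCK_MONOTONIC, &w1);

    *wall_s = ts_to_seconds(w1) - ts_to_seconds(w0);
    *cpu_s  = ts_to_seconds(c1) - ts_to_seconds(c0);
}

/* Example workload that only computes. */
static void burn_cpu(void)
{
    volatile unsigned long sum = 0;
    for (unsigned long i = 0; i < 50000000UL; i++)
        sum += i;
}

/* Example workload that only waits, standing in for blocking IO. */
static void sleep_100ms(void)
{
    struct timespec d = { 0, 100000000L };
    nanosleep(&d, NULL);
}
```

If the CPU time is far below the wall-clock time, the code is mostly waiting (IO, sleeping, blocking locks) rather than computing. The opposite result does not prove you are compute-bound: stalls on DRAM and spinlocks also count as CPU time, so that case still needs perf or VTune. In a multithreaded process the CPU time is summed over all threads and can exceed the wall-clock time.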
2. Show us the context
If you find that your CPU spends most of its time in foo(), show us that function so we can help.
3. Generic suggestions
A generic question gets only generic suggestions, like:
- Use more aggressive compiler optimization, like -O3.
- Change your algorithms.
- Avoid locks.
- Align your data.
- Avoid false sharing.
- Make your data structures more compact.
- Use prefetch.
And so on.
Sorry, without more context there is no way to suggest a more specific technique.
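To make two of the generic suggestions a bit more concrete, here is a minimal C11 sketch of aligning data to avoid false sharing. The 64-byte cache line size is an assumption (common on x86-64, but check your CPU), and the struct names are made up for illustration.

```c
#include <stdalign.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines; this is not universal. */
#define CACHE_LINE_SIZE 64

/* Packed layout: both counters most likely share one cache line, so two
 * threads updating them independently keep invalidating each other's
 * cached copy (false sharing). */
struct counters_shared {
    unsigned long a;   /* updated by thread 1 */
    unsigned long b;   /* updated by thread 2 */
};

/* Padded layout: each counter starts on its own cache line. */
struct counters_padded {
    alignas(CACHE_LINE_SIZE) unsigned long a;   /* updated by thread 1 */
    alignas(CACHE_LINE_SIZE) unsigned long b;   /* updated by thread 2 */
};

/* Compile-time check that the two counters are a full line apart. */
_Static_assert(offsetof(struct counters_padded, b)
                   - offsetof(struct counters_padded, a) >= CACHE_LINE_SIZE,
               "counters must not share a cache line");
```

Note the trade-off with "make your data structures more compact": the padded struct is much larger than the packed one. Padding only pays off for data that different threads write concurrently, which is again something a profiler should confirm first.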