Program size and program speed are not directly related.
I've done what you are doing - programming a microcontroller and wanting to make it as fast as possible.
First I get the program working, doing what is necessary.
Then I wrap a temporary loop around it, so it repeats its work endlessly, or for a long time.
While it is running, I manually interrupt it under a debugger or emulator, and then examine the call stack and any other variables as needed.
My object is to understand, in complete detail, why the program was spending that particular moment of time.
I repeat this several times, like 10 times.
From those 10 samples, I look to see if there's any activity present on more than 1 of them that I could eliminate or make shorter.
For example, if I see on 4 of those samples that there is a function being called, from a particular place, and if I can think of a way to avoid that call most of the time, doing so would save about 40% of the execution time.
That's a speedup factor of 1/(1-0.4) = 1/0.6 ≈ 1.67, or about 67% faster, give or take.
That's a serious speedup.
You see, if there's anything you can do to make the code run faster, this technique will find it.
It is different from the usual advice to "measure, measure".
All measuring does is tell you that if a routine doesn't take much inclusive time, you should look elsewhere.
Measuring tells you where you should not look; in all but toy programs, it is very unspecific about where you should look.
(People sometimes take this as good news, as if failing to find a way to speed up their program implies there is none :)
The interrupting technique pinpoints problems.