3

This actually contains 2 questions.

1) You are given a project which turned out to produce code that can't fit within the memory restrictions of the environment. What could you do to handle this?

2) You are given a project which turned out to be performing slower than expected. How do you handle this?

The answers I could think of:

1) Using the standard library as much as possible, because it is probably loaded anyway, and modularizing the code with functions to avoid rewriting duplicate code, which might conserve stack space.

2) Introducing inline functions wherever necessary to reduce function call overhead, and maybe compiler optimizations (if not, I'm using volatile)?

Please give as many possible solutions as possible :)

Thunderman

5 Answers

7

You are given a project which turned out to produce code that can't fit within the memory restrictions of the environment. What could you do to handle this?

Identify how large a code footprint is permitted, then apply code refactoring techniques to reduce the current footprint until it fits within that limit.

You are given a project which turned out to be performing slower than expected. How do you handle this?

Profile the project to find the performance bottlenecks. Within those bottlenecks, identify the roughly 20% of the total code that accounts for 90% of the project's execution time, and target that code for optimization.
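When a full profiler isn't to hand, even a crude timing wrapper can narrow down which of a few suspect functions dominates. A minimal sketch in C++ (the label and the timed call are illustrative, not part of any particular project):

    #include <chrono>
    #include <cstdio>

    // Crude timing helper for narrowing down hot spots when a profiler
    // isn't available; labels and timed calls are illustrative only.
    template <typename Fn>
    double time_call(const char* label, Fn&& fn) {
        auto start = std::chrono::steady_clock::now();
        fn();                                          // run the candidate hot path
        auto stop = std::chrono::steady_clock::now();
        double ms = std::chrono::duration<double, std::milli>(stop - start).count();
        std::printf("%s: %.3f ms\n", label, ms);
        return ms;
    }

    // Usage (hypothetical): time_call("parse_input", [&] { parse_input(buffer); });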

Alok Save
4

1) Dynamically load and unload modules as you need them.

Under Windows, you can use LoadLibrary and GetProcAddress instead of actually linking against a .lib file. Linking causes the DLL to be loaded into memory even if it is never used. With LoadLibrary you load it only when you need it, so it doesn't take up memory the rest of the time, and you unload it when done.
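A minimal sketch of that approach, assuming a hypothetical heavy_feature.dll that exports a run_feature function (both names are made up for illustration):

    #include <windows.h>
    #include <cstdio>

    int main() {
        // Load the DLL only when this code path actually needs it.
        HMODULE mod = LoadLibraryA("heavy_feature.dll");    // hypothetical DLL name
        if (!mod) { std::puts("module not available"); return 1; }

        // Resolve the export by name instead of linking against a .lib file.
        using FeatureFn = int (*)(int);
        auto run_feature = reinterpret_cast<FeatureFn>(
            GetProcAddress(mod, "run_feature"));            // hypothetical export name
        int result = run_feature ? run_feature(42) : -1;

        FreeLibrary(mod);                                   // unload so the code pages can be released
        std::printf("result = %d\n", result);
        return 0;
    }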

For Linux, check out this link, section "Dynamic loading and un-loading of shared libraries using libdl".
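The POSIX equivalent looks much the same, assuming a hypothetical libheavy_feature.so with the same export (link with -ldl where required):

    #include <dlfcn.h>
    #include <cstdio>

    int main() {
        // Map the shared object into the process only for as long as it's needed.
        void* handle = dlopen("./libheavy_feature.so", RTLD_LAZY);  // hypothetical library
        if (!handle) { std::fprintf(stderr, "%s\n", dlerror()); return 1; }

        using FeatureFn = int (*)(int);
        auto run_feature = reinterpret_cast<FeatureFn>(
            dlsym(handle, "run_feature"));                          // hypothetical symbol
        int result = run_feature ? run_feature(42) : -1;

        dlclose(handle);                                            // unmap when done
        std::printf("result = %d\n", result);
        return 0;
    }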

2) Profile, of course. Unless the interviewer specifically asks for micro-optimization tricks, this is the correct answer.

Luchian Grigore
1

1) You are given a project which turned out to produce code that can't fit within the memory restrictions of the environment. What could you do to handle this?

The question as asked says the "code" can't fit... which if taken literally (i.e. as being distinct from stack, heap and static/global data) means:

1) some of the executable code needs to be reduced in size, and/or
2) the amount of code concurrently loaded needs to be reduced, and/or
3) the restrictions need to be lifted.

The question may or may not allow 3) - examining the restriction. If the interviewer's encouraging, start there as it's an easy win: e.g. can you recompile a 16-bit process as 32-bit, or specify a different executable layout to the compiler/linker in order to increase the supported executable size? Can you increase virtual address space by assigning some more hard disk to swap space without encountering excessive swapping?

To 1) reduce code size, start by measuring it: both as a total and on a per function basis. Check for compiler switches to optimise space over speed, and their impact. At the per-function level, focus on the largest functions, and any cases where a great many similar functions add up to a lot of executable code. Such functions might be generated by nested macros, or from some code-generation tool. Try to factor the common algorithm into one function that accepts customisations using function pointers or run-time switching.
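A hedged sketch of that last suggestion: collapsing a family of near-identical generated functions into one copy of the algorithm parameterised by a function pointer. All the names here are invented for illustration:

    #include <cstddef>

    // Before: many near-identical generated functions, e.g.
    //   void process_sensor_a(float*, std::size_t);
    //   void process_sensor_b(float*, std::size_t);
    //   ...each duplicating the same loop with a slightly different transform.

    // After: one copy of the algorithm; the varying step is passed in.
    using Transform = float (*)(float);

    void process_samples(float* data, std::size_t n, Transform step) {
        for (std::size_t i = 0; i < n; ++i)
            data[i] = step(data[i]);    // the customisation point replaces the duplicated bodies
    }

    // The per-variant functions shrink to trivial wrappers (or disappear entirely).
    float scale_a(float x) { return x * 0.5f; }
    float scale_b(float x) { return x * 2.0f + 1.0f; }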

Read over the most problematic functions for possible causes of bloat, such as potentially inlined functions that they call many times. While a function call can be a little slower than inlined code for very simple operations, repeated inlining of non-trivial functions can create arbitrarily large amounts of code bloat. Preprocessor macro expansions inside functions are effectively inlining too.
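To illustrate the macro point with a made-up example: a non-trivial macro pastes its whole body at every use, just like aggressive inlining, whereas an out-of-line function keeps a single copy of the logic:

    #include <cstdio>

    // Every use of this macro pastes the whole body into the calling function,
    // much as aggressive inlining would.
    #define LOG_AND_CLAMP(x, lo, hi)                                       \
        do {                                                               \
            if ((x) < (lo)) { std::printf("clamped low\n");  (x) = (lo); } \
            if ((x) > (hi)) { std::printf("clamped high\n"); (x) = (hi); } \
        } while (0)

    // An out-of-line function keeps a single copy of the same logic.
    int log_and_clamp(int x, int lo, int hi) {
        if (x < lo) { std::printf("clamped low\n");  return lo; }
        if (x > hi) { std::printf("clamped high\n"); return hi; }
        return x;
    }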

For 2) reducing concurrently loaded code, you can use dynamically loaded libraries (e.g. dlopen/dlsym/dlclose on Linux/UNIX). This grants you explicit control of the time windows in which code is mapped into the process address space. Alternatively, you can effectively serialise the need for parts of the code, e.g. have some one-off data generation run first, preparing data for a subsequent process to use. Or, depending on your OS and limitations, you may be able to run parts of the program concurrently on the same or different hosts, and have them communicate using IPC mechanisms such as shared memory or the network.

Of course, it's possible that "code" was used unwittingly in the question, and the real problem is heap exhaustion, or even stack exhaustion - there are different techniques for addressing those too, but I don't want to get sidetracked into speculation.

2) You are given a project which turned out to be performing slower than expected. How do you handle this?

Firstly, you need to identify the cause and type of the slowness:

  • Is the project I/O bound, CPU bound, hard-coding delays between steps (it happens, particularly in screen-scraping applications where a more scalable, robust approach wasn't worked out up front), suffering from scheduling delays, etc.?
  • Then - what type of slowness is it - latency or throughput?

If the project is I/O bound, then a lot depends on the device it's waiting on. You may get an easy win by swapping a magnetic hard disk for an SSD, adding RAM, or upgrading your network card. You may also decide to change the way you do that type of I/O, for example:

  • moving from uncompressed to compressed data, or vice versa
  • removing unnecessary flushing of a stream
  • creating indices at the front of files so you can seek directly to the needed data instead of doing a brute-force search
  • tuning Nagle parameters for TCP latency vs. throughput trade-offs
  • using multicast rather than many point-to-point network connections
  • using binary rather than XML serialisation or data formats
  • using memory mapping in preference to looping read() calls into local buffers (see the sketch below)

You may also be able to rethink how the I/O is done, e.g. using more efficient and direct in-memory indices into disk data, or high-quality hash functions and sparser tables rather than binary searches through sorted data. These concepts - familiar from their in-memory equivalents (arrays versus hash tables) in basic Computing Science - can have even bigger consequences when I/O is involved.
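A minimal POSIX sketch of the memory-mapping idea, assuming a hypothetical data.bin input file; the byte-sum loop is just a stand-in for real processing:

    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <cstdio>

    int main() {
        // Map a data file into memory so the kernel pages it in on demand,
        // instead of looping read() calls into a local buffer.
        int fd = open("data.bin", O_RDONLY);               // hypothetical input file
        if (fd < 0) { std::perror("open"); return 1; }

        struct stat st;
        if (fstat(fd, &st) < 0) { std::perror("fstat"); close(fd); return 1; }

        void* mapped = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        if (mapped == MAP_FAILED) { std::perror("mmap"); close(fd); return 1; }

        // With the file mapped, you can seek directly to the records you need;
        // here we just touch every byte as a stand-in for real work.
        const char* data = static_cast<const char*>(mapped);
        long long sum = 0;
        for (off_t i = 0; i < st.st_size; ++i) sum += data[i];
        std::printf("checksum: %lld\n", sum);

        munmap(mapped, st.st_size);
        close(fd);
        return 0;
    }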

If the project is CPU bound, then use a profiler to see which functions account for most of the time. Manually add further CPU and/or elapsed-time tracing inside large functions until you understand the causes. How each problem can be fixed varies enormously, but some generic things to check for optimisation:

  • algorithmic efficiency (e.g. avoid unnecessary O(n^2) and worse operations for vaguely or arbitrarily large n)
  • unnecessary data copying
  • not recalculating values needlessly when the results of an earlier calculation could easily be saved and reused (e.g. compiling a regular expression, using it once, then compiling it again the next time it's needed despite an ongoing need for the exact same regexp - see the sketch after this list)
  • physical memory page faulting generated by your algorithmic code (most relevant for huge matrices of values)
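A small illustration of the regular-expression point (the pattern and function names are made up); the same caching idea applies to any expensive computation whose result can be reused:

    #include <regex>
    #include <string>

    // Recompiles the same pattern on every call - wasted CPU.
    bool is_iso_date_slow(const std::string& s) {
        std::regex pattern(R"(\d{4}-\d{2}-\d{2})");          // rebuilt each time
        return std::regex_match(s, pattern);
    }

    // Compiles the pattern once and reuses it on every subsequent call.
    bool is_iso_date_fast(const std::string& s) {
        static const std::regex pattern(R"(\d{4}-\d{2}-\d{2})");
        return std::regex_match(s, pattern);
    }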

Also, scheduling can be a big issue. You may find atomic operations out-perform mutexes, or need to change your process priorities or CPU bindings, or make a fundamental decision such as using threads vs select/poll for handling multiple client connections.
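As a sketch of the atomics-versus-mutexes point, here are two ways of maintaining a shared counter; for a simple increment the lock-free version usually wins, though the only way to know for a given workload is to measure:

    #include <atomic>
    #include <mutex>

    // Lock-free: a single atomic read-modify-write per increment.
    std::atomic<long> atomic_hits{0};

    void record_hit_atomic() {
        atomic_hits.fetch_add(1, std::memory_order_relaxed);
    }

    // Mutex-protected: each increment pays for a lock and an unlock.
    long mutex_hits = 0;
    std::mutex hits_mutex;

    void record_hit_mutex() {
        std::lock_guard<std::mutex> lock(hits_mutex);
        ++mutex_hits;
    }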

The answers I could think of: 1) Using the standard library as much as possible, because it is probably loaded anyway, and modularizing the code with functions to avoid rewriting duplicate code, which might conserve stack space.

Fair enough.

2) Introducing inline functions wherever necessary to reduce function call overhead, and maybe compiler optimizations (if not, I'm using volatile)?

Definitely one to look at. In my measurements - across several OSes/CPUs/compilers etc. - inlining trivial functions tends to be an order of magnitude faster than a function call, but for larger functions inlining can cause code bloat that - in heavily resource-constrained environments - eventually leads to more cache misses and swapping. So, it's not 100% black and white.
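A made-up illustration of the distinction:

    // A good inlining candidate: the call overhead would dwarf the work done.
    struct Sample {
        int raw;
        int value() const { return raw; }   // defined in the class body, so implicitly inline
    };

    // A poor inlining candidate: duplicating a large body at every call site
    // grows the executable and puts pressure on the instruction cache.
    // int decode_frame(const unsigned char* buf, std::size_t len) { /* hundreds of lines */ }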

Please give as many possible solutions as possible :)

Well, that'd take all day.... :-P

Tony Delroy
0
  1. Make sure you have COMDAT folding enabled; dead-code elimination (DCE) and unused-function elimination are a must. Eliminate unneeded dependencies, use code coverage tools, and refactor to avoid duplication (see the sketch after this list). Running a memory profiler also helps you spot the areas that most need fixing. Most compilers also give an option to optimize for size instead of speed.

  2. Use a profiler, so you know why it's not performing as expected instead of just guessing wildly.
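As a hypothetical illustration of the kind of duplication that COMDAT folding (identical code folding) targets: two template instantiations that typically compile to byte-identical machine code, which a linker with ICF enabled (e.g. MSVC's /OPT:ICF, or --icf in gold/lld) can collapse into one copy:

    #include <cstddef>

    // One template; both instantiations below usually produce identical code.
    template <typename T>
    std::size_t count_nonzero(const T* data, std::size_t n) {
        std::size_t count = 0;
        for (std::size_t i = 0; i < n; ++i)
            if (data[i] != 0) ++count;
        return count;
    }

    // Both copies end up in the binary unless the linker folds them.
    template std::size_t count_nonzero<int>(const int*, std::size_t);
    template std::size_t count_nonzero<unsigned>(const unsigned*, std::size_t);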

Necrolis
0

The first thing I'd do is ask "How much?" -- how much too large is it? How much too slow is it? The second thing I'd look into is what (if anything) has already been done to optimize the code to hit the targets.

On one hand, if nothing has been done to hit the targets and you're over by a couple of percent, chances are pretty good that some fairly simple work on compiler flags will be adequate to do the job. If your code is reasonably portable, another possibility would be to just use a different compiler (e.g., if you're developing for Windows using Microsoft's compiler, chances are pretty good that you can gain a few percent in both size and speed just by switching to Intel's compiler).

On the other hand, if you're too large by a factor of 15, too slow by a factor of 10, and there's already been quite a bit of work put into optimizing to try to get closer to the targets, it could well be that the targets just aren't reasonably attainable, and you need to negotiate with the customer about changing the target -- removing some features, using a target system that's faster and has more storage, etc. The simple fact is that unless something was done really badly to start with, there's not much chance of simultaneously making code 15 times smaller and 10 times faster while retaining the same features.

I would add that you should normally ensure against the latter situation from the beginning. If you're targeting a system with reasonably hard limits on RAM, CPU speed, etc., you normally want to budget those out from the beginning. If you end up with a budget that says you need to do an FFT on a million data points using only 10K of RAM and still finish in 1 nanosecond, you probably don't need to write any code at all to figure out that it's just not going to happen. If size and speed are important, you should never get to the point of a finished product that's badly out of touch with requirements.

Of course, a moving target is also possible -- if you develop something initially for a desktop machine, but at the end the client says something like: "Oops, we left a word out there -- we meant Windows Phone 7". Again, you'll have to talk priorities, though in this case it may be about whether you can move the ship date and start development over from (close to) the beginning.

Jerry Coffin