1) You are given a project which turned out to produce code that can't fit within the memory restrictions of the environment. What could you do to handle this?
The question as asked says the "code" can't fit... which if taken literally (i.e. as being distinct from stack, heap and static/global data) means:
1) some of the executable code needs to be reduced in size and/or
2) the amount of code concurrently loaded needs to be reduced, and/or
3) the restrictions need to be lifted.
The question may or may not allow 3) - examining the restriction. If the interviewer's encouraging, start there as it's an easy win: e.g. can you recompile a 16-bit process as 32-bit, or specify a different executable layout to the compiler/linker in order to increase the supported executable size? Can you increase virtual address space by assigning some more hard disk to swap space without encountering excessive swapping?
To 1) reduce code size, start by measuring it: both as a total and on a per function basis. Check for compiler switches to optimise space over speed, and their impact. At the per-function level, focus on the largest functions, and any cases where a great many similar functions add up to a lot of executable code. Such functions might be generated by nested macros, or from some code-generation tool. Try to factor the common algorithm into one function that accepts customisations using function pointers or run-time switching.
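As a minimal sketch of that factoring idea (the names here are made up for illustration), a family of near-identical generated functions can often collapse into one shared function taking the varying part as a function pointer, so the common algorithm exists only once in the executable:

```cpp
#include <cstdio>

// Hypothetical: a code generator previously emitted process_type_a(),
// process_type_b(), ... each repeating the same surrounding algorithm.
// One shared function plus a per-type callback keeps a single copy of
// the common code.
typedef double (*transform_fn)(double);

static double scale_a(double v) { return v * 2.0; }   // type-specific part
static double scale_b(double v) { return v + 10.0; }  // type-specific part

static void process(const double* data, int n, transform_fn fn) {
    for (int i = 0; i < n; ++i)
        std::printf("%f\n", fn(data[i]));  // common algorithm, written once
}

int main() {
    double data[] = {1.0, 2.0, 3.0};
    process(data, 3, scale_a);
    process(data, 3, scale_b);
}
```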
Read over the most problematic functions for likely causes of bloat, such as inline functions that they call a great many times. While a function call can be a little slower than inlined code for very simple operations, repeated inlining of non-trivial functions can cause arbitrarily bad code bloat. Preprocessor macro expansions inside functions are effectively inlining too.
For 2) reducing concurrently loaded code, you can use dynamically loaded libraries (e.g. dlopen/dlsym/dlclose on Linux/UNIX). This grants you explicit control of the time windows in which code is mapped into the process address space. Alternatively, you can effectively serialise the need for parts of the code, e.g. have some one-off data generation run first, preparing data for a subsequent process to use. Or, depending on your OS and limitations, you may be able to run parts of the program concurrently on the same or different hosts, and have them communicate using IPC mechanisms such as shared memory or the network.
Of course, it's possible that "code" was used unwittingly in the question, and the real problem is heap exhaustion, or even stack exhaustion - there are different techniques for addressing those too, but I don't want to get sidetracked on speculation.
2) You are given a project which turned out to be performing slower than expected. How do you handle this?
Firstly, you need to identify the cause and type of the slowness:
- Is the project I/O bound, CPU bound, hard-coding delays between steps (it happens, particularly in screen-scraping applications where a more scalable, robust approach wasn't worked out up front), suffering from scheduling delays, etc.?
- Then - what type of slowness is it - latency or throughput?
If the project is I/O bound, then a lot depends on the device it's waiting on. You may get an easy win swapping a magnetic hard disk for an SSD, or adding RAM, or upgrading your network card. You may decide to change the way you do that type of I/O, such as moving from uncompressed to compressed data or vice versa, removing unnecessary flushing of a stream, creating indices at the front of files so you can seek directly to needed data instead of doing a brute-force search, tuning Nagle parameters for TCP latency vs. throughput trade-offs, using multicast rather than many point-to-point network connections, using binary rather than XML serialisation or data formats, or using memory mapping in preference to looping read() calls into local buffers. You may also be able to rethink how the I/O is done, e.g. using more efficient and direct in-memory indices into on-disk data, or using high quality hash functions and sparser tables rather than binary searches through sorted data. These concepts, familiar in their in-memory equivalents (arrays versus hash tables) from basic Computing Science, have even bigger consequences when I/O is involved.
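To illustrate that last point, a POSIX sketch of memory mapping a file (the filename is hypothetical): the OS pages data in on demand rather than copying it through a userspace buffer with a read() loop.

```cpp
#include <sys/mman.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <cstdio>

int main() {
    int fd = open("data.bin", O_RDONLY);   // hypothetical data file
    if (fd == -1) { std::perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) == -1) { std::perror("fstat"); return 1; }

    // Map the whole file; pages are faulted in lazily as they're touched,
    // avoiding an explicit read() loop into a local buffer.
    void* mapped = mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (mapped == MAP_FAILED) { std::perror("mmap"); return 1; }
    const char* p = (const char*)mapped;

    long sum = 0;
    for (off_t i = 0; i < st.st_size; ++i)
        sum += p[i];                        // stand-in for real processing
    std::printf("checksum %ld\n", sum);

    munmap(mapped, st.st_size);
    close(fd);
    return 0;
}
```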
If the project is CPU bound, then use a profiler to see which functions account for most of the time. Manually add further CPU and/or elapsed-time tracing inside large functions until you understand the causes. How each problem can be fixed varies enormously, but some generic things to check for when optimising:
- algorithmic efficiency (e.g. avoid unnecessary O(n^2) and worse operations for vaguely or arbitrarily large n)
- unnecessary data copying
- not recalculating values needlessly when the results of an earlier calculation could easily be saved and reused (e.g. compiling a regular expression, using it once and throwing it away, despite having an ongoing need for the exact same regexp; see the sketch after this list)
- physical memory page faulting generated by your algorithmic code (most relevant for huge matrices of values)
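A small sketch of the regular expression point above (using std::regex purely for illustration): the compiled object should be built once and reused, not recreated on every call.

```cpp
#include <regex>
#include <string>

// Wasteful: the regex is recompiled on every call even though the
// pattern never changes.
bool is_number_slow(const std::string& s) {
    return std::regex_match(s, std::regex("[0-9]+"));
}

// Better: compile once, reuse the compiled regex for the lifetime of
// the program.
bool is_number_fast(const std::string& s) {
    static const std::regex digits("[0-9]+");
    return std::regex_match(s, digits);
}

int main() {
    return is_number_slow("123") && is_number_fast("456") ? 0 : 1;
}
```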
Also, scheduling can be a big issue. You may find atomic operations out-perform mutexes, or need to change your process priorities or CPU bindings, or make a fundamental decision such as using threads vs. select/poll for handling multiple client connections.
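For the atomics-versus-mutex point, a minimal sketch of the kind of code worth comparing: a simple shared counter can often be maintained lock-free, avoiding contended mutex acquisition, though whether it actually wins needs measuring on your platform.

```cpp
#include <atomic>
#include <mutex>
#include <thread>
#include <vector>
#include <cstdio>

std::atomic<long> atomic_counter{0};
long mutex_counter = 0;
std::mutex counter_mutex;

int main() {
    std::vector<std::thread> threads;
    for (int t = 0; t < 4; ++t) {
        threads.emplace_back([] {
            for (int i = 0; i < 100000; ++i) {
                atomic_counter.fetch_add(1, std::memory_order_relaxed); // lock-free
                std::lock_guard<std::mutex> lock(counter_mutex);        // lock-based
                ++mutex_counter;
            }
        });
    }
    for (auto& t : threads) t.join();
    std::printf("%ld %ld\n", atomic_counter.load(), mutex_counter);
}
```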
The answers I could think of:
1) Using the standard library as much as possible, because it's probably loaded anyway, and modularizing the code with functions to avoid rewriting duplicate code, which might conserve stack space.
Fair enough.
2) Introducing inline functions wherever necessary to reduce function call overhead, and maybe compiler optimizations (if not, I'm using volatile)?
Definitely one to look at. In my measurements - across several OSes/CPUs/compilers etc. - inlining trivial functions tends to be an order of magnitude faster than a function call, but for larger functions inlining can cause code bloat that, in heavily resource-constrained environments, eventually leads to more cache misses and swapping. So, it's not 100% black and white.
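To make the trade-off concrete, a sketch of the kind of code I'd measure: a trivial accessor is a good inlining candidate, while inlining a non-trivial function at many call sites mostly buys code size.

```cpp
// A trivial function: inlining this typically removes the whole call
// overhead and often enables further optimisation at the call site.
inline int square(int x) { return x * x; }

// A non-trivial function: inlining a body like this at hundreds of call
// sites duplicates all of it each time, growing the executable and
// pressuring the instruction cache. Usually better left out-of-line.
int checksum(const unsigned char* data, int n) {
    int sum = 0;
    for (int i = 0; i < n; ++i)
        sum = (sum * 31) ^ data[i];
    return sum;
}

int main() {
    unsigned char buf[16] = {1, 2, 3};
    return square(2) + checksum(buf, 16) > 0 ? 0 : 1;
}
```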
Please give as many possible solutions as possible :)
Well, that'd take all day.... :-P