I recently interviewed with NetApp for a C++ role (they build big data storage systems). I wrote some code to answer an interview question, and their response was “You failed”. It was very difficult to get feedback, as it usually is after failing an interview. After some very polite begging I got a little, but it still didn’t quite make sense.
Here’s the problem:
Given a bunch of files in a directory, read them all and count the words. Create a bunch of threads to read the files in parallel.
The consensus at NetApp (people who know a lot about storage) is that it should get faster with more threads. I think in most circumstances you are so I/O bound that it will get slower after one or two. I just don’t see how it can get faster unless you are under some known special circumstances (like a SAN or maybe a RAID array). Even in those cases, the number of sequential channels to the disk saturates and you are I/O bound again after only a few threads.
I think my code was great (of course). I’ve been writing C++ for many years, and I think I know a few things about what makes good code. It should have passed on style alone. Hehe. As a general rule, performance optimizations are not something you should guess at; they should be tested and measured. I only had limited time to run experiments, but now I’m curious.
The code is in my GitHub account here:
https://github.com/MenaceSan/CountTextWords
Anyone have any opinions on this? Shed some light on what they might have been thinking? Any other criticisms of the code?
I base part of my opinion on this: