Please excuse the broadness of this question; once I know more, I can ask something more specific.
I have a performance-sensitive piece of TensorFlow code. From the perspective of someone who knows little about GPU programming, I would like to know which guides or strategies would be a good place to start for optimizing my code (single GPU).
Even just a readout of how long is spent on each TensorFlow op would be nice.
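For concreteness, I've come across mentions of a Chrome-trace "timeline" mechanism. Is something like the sketch below (assuming the session-based API and the `tensorflow.python.client.timeline` module are the right tools; the matmul graph is just a stand-in for my real model) the kind of thing that gives that readout?

```python
import tensorflow as tf
from tensorflow.python.client import timeline

# Toy graph standing in for my real, performance-sensitive model.
a = tf.random_normal([1000, 1000])
b = tf.random_normal([1000, 1000])
c = tf.matmul(a, b)

# Ask the runtime to record per-op timing for this step.
run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
run_metadata = tf.RunMetadata()

with tf.Session() as sess:
    sess.run(c, options=run_options, run_metadata=run_metadata)

    # Dump a Chrome trace; loading timeline.json at chrome://tracing
    # shows each op's start time and duration per device.
    tl = timeline.Timeline(run_metadata.step_stats)
    with open('timeline.json', 'w') as f:
        f.write(tl.generate_chrome_trace_format())
```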
I have a vague understanding that
- Some operations run faster when assigned to a CPU rather than a GPU, but it's not clear to me which ones (see the placement sketch after this list).
- There is a piece of Google software called "EEG", which I read about in a paper, that may one day be open-sourced.
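On the first point, my rough understanding is that placement can be forced with `tf.device`, something like the sketch below. The embedding lookup is just a made-up example of an op people seem to pin to the CPU; I don't actually know which ops benefit from this.

```python
import tensorflow as tf

# Pin the embedding variable and lookup to the CPU while the rest of
# the graph runs on the GPU (unsure when this actually helps).
with tf.device('/cpu:0'):
    embeddings = tf.Variable(tf.random_uniform([50000, 128], -1.0, 1.0))
    lookup = tf.nn.embedding_lookup(embeddings, tf.constant([1, 7, 42]))

with tf.device('/gpu:0'):
    result = tf.reduce_sum(lookup)

# log_device_placement prints which device each op actually landed on;
# allow_soft_placement falls back to CPU if an op has no GPU kernel.
config = tf.ConfigProto(log_device_placement=True, allow_soft_placement=True)
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    print(sess.run(result))
```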
There may also be other common factors at play that I am not aware of.