The TensorFlow website clearly explains how to compile TensorFlow from source. This Stack Overflow post explains how to do so with support for the SSE4.2 and AVX instruction sets. However, I cannot find instructions for doing this inside an isolated conda environment.
So my question is: how do I build TensorFlow with these instruction sets inside an isolated conda environment, so that the resulting build is available to Keras?
Side question: how much of a speedup can I expect from enabling these instruction sets?
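For context on the side question: any speedup presupposes that the CPU actually exposes these instruction sets. Below is a minimal sketch of how that could be checked on Linux, assuming an x86 machine whose `/proc/cpuinfo` contains a `flags` line. `supported_flags` is a hypothetical helper name used only for illustration; it is not part of TensorFlow or Keras.

```python
def supported_flags(cpuinfo_text, wanted=("sse4_2", "avx")):
    """Report which of the wanted CPU flags appear in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        # Lines look like "flags\t\t: fpu vme ... sse4_2 ... avx ..."
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            flags.update(value.split())
    # Exact set membership, so "avx2" alone does not count as "avx".
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            print(supported_flags(f.read()))
    except FileNotFoundError:
        # Assumption: this file only exists on Linux-like systems.
        print("/proc/cpuinfo is not available on this platform")
```

If either flag is reported as missing, a build compiled with the corresponding option would not be expected to run on that machine.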
Update: I figured out the answer and posted it here: How to compile Tensorflow with SSE4.2 and AVX instructions?