Here I asked how to solve the overhead problem by using while_loop for training (which allows evaluating train_op several times with only one run call). After that I create 4 threads and run one while_loop per thread to optimize in parallel. Is there a native mechanism in TensorFlow for such parallel optimization?
I use the Ftrl optimizer.
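For reference, here is a minimal sketch of what I mean by "several train steps in one run call". Everything in it is a placeholder: it uses TF 1.x graph mode via tf.compat.v1, a toy quadratic loss, and a hand-written gradient-descent update instead of Ftrl, purely to keep the sketch short and self-contained.

```python
# Minimal sketch: many optimization steps inside ONE run() call.
# Assumptions: TF 1.x-style graph mode (via tf.compat.v1), a toy
# quadratic loss, and a plain gradient-descent update in place of
# Ftrl, only to keep the example short.
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

w = tf.get_variable("w", initializer=5.0)  # toy model weight
lr = 0.1
n_steps = 50  # optimization steps executed per single run() call

def body(i):
    loss = tf.square(w)                      # placeholder loss
    grad = tf.gradients(loss, [w])[0]
    update = tf.assign_sub(w, lr * grad)     # one training step
    with tf.control_dependencies([update]):  # force the step to run
        return i + 1

loop = tf.while_loop(lambda i: i < n_steps, body, [tf.constant(0)])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    sess.run(loop)          # 50 steps, one trip through the interpreter
    final_w = sess.run(w)   # close to 0 after the loop
```

The control dependency on the assign is what chains the update into each iteration, so the 50 steps run back to back inside the graph without returning to Python.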
Thanks!
EDITED:
In my situation I have a big data set, which I read gradually in the main thread and enqueue to a FIFOQueue. I use batch optimization, and one optimization step on a small batch (ideally only one element) takes little time (I use a linear model), so I want to do all optimization steps in one run call, without returning to the Python interpreter on each step (because of the overhead problem). Right now I call run as many times as there are threads.
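Roughly, the current setup looks like the sketch below. It is only illustrative: the queue capacity, shapes, thread and step counts, the toy data, and the plain gradient-descent update in place of Ftrl are all assumptions I made to keep it self-contained.

```python
# Sketch of the current setup: the main thread enqueues examples into
# a FIFOQueue, and 4 Python threads each make one run() call on a
# while_loop that dequeues and trains. Capacity, shapes, thread/step
# counts, the toy data, and the SGD update (not Ftrl) are assumptions.
import threading
import numpy as np
import tensorflow.compat.v1 as tf
tf.disable_v2_behavior()

queue = tf.FIFOQueue(capacity=100, dtypes=[tf.float32], shapes=[[2]])
example_ph = tf.placeholder(tf.float32, shape=[2])
enqueue_op = queue.enqueue(example_ph)

w = tf.get_variable("w", initializer=[0.0, 0.0])

def body(i):
    x = queue.dequeue()                      # one tiny batch per step
    loss = tf.reduce_sum(tf.square(x - w))
    grad = tf.gradients(loss, [w])[0]
    with tf.control_dependencies([tf.assign_sub(w, 0.05 * grad)]):
        return i + 1

steps_per_run = 25
loop = tf.while_loop(lambda i: i < steps_per_run, body, [0])

sess = tf.Session()
sess.run(tf.global_variables_initializer())

# Main thread: read data gradually and enqueue (toy data here).
for row in np.ones((100, 2), dtype=np.float32):
    sess.run(enqueue_op, feed_dict={example_ph: row})

# 4 worker threads, each making exactly ONE run() call (25 steps each).
threads = [threading.Thread(target=sess.run, args=(loop,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
final_w = sess.run(w)  # drifts toward the data mean [1, 1]
sess.close()
```

Note the enqueue/dequeue counts are matched (100 each) so the sketch cannot deadlock; the concurrent updates to w race without locking, which is part of what I am asking about.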