I have tried a vanilla encoder-decoder (enc-dec) architecture for English-to-French NMT.
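To give an idea of what I'm working with, here is a minimal sketch of the vanilla enc-dec (the vocab sizes, `latent_dim`, and single-LSTM layout are simplified placeholders, not my exact code):

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

# Placeholder sizes -- my real vocabularies and hidden size differ
src_vocab, tgt_vocab, latent_dim = 5000, 6000, 256

# Encoder: embed English tokens, keep the final LSTM states
enc_inputs = layers.Input(shape=(None,), name="enc_inputs")
enc_emb = layers.Embedding(src_vocab, latent_dim)(enc_inputs)
enc_seq, state_h, state_c = layers.LSTM(
    latent_dim, return_sequences=True, return_state=True)(enc_emb)

# Decoder: embed French tokens, initialise with the encoder states
dec_inputs = layers.Input(shape=(None,), name="dec_inputs")
dec_emb = layers.Embedding(tgt_vocab, latent_dim)(dec_inputs)
dec_seq = layers.LSTM(latent_dim, return_sequences=True)(
    dec_emb, initial_state=[state_h, state_c])

# Per-timestep softmax over the French vocabulary
outputs = layers.Dense(tgt_vocab, activation="softmax")(dec_seq)
model = Model([enc_inputs, dec_inputs], outputs)
model.compile(optimizer="rmsprop", loss="sparse_categorical_crossentropy")
```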
I want to know how to integrate a Keras attention layer into this model. Suggestions based on the attention layer from the Keras docs, or an attention module from a third-party repo, are both welcome. I just need to get it wired in, see how it behaves, and fine-tune from there.
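To be concrete, this is roughly where I imagine `tf.keras.layers.Attention` slotting in: the decoder sequence as the query and the encoder sequence as the value, with the context vectors concatenated onto the decoder outputs before the softmax. The sizes are placeholders, and I haven't verified this query/value wiring, which is exactly what I'm asking about:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

src_vocab, tgt_vocab, latent_dim = 5000, 6000, 256  # placeholder sizes

enc_in = layers.Input(shape=(None,), name="enc_in")
enc_seq, h, c = layers.LSTM(latent_dim, return_sequences=True, return_state=True)(
    layers.Embedding(src_vocab, latent_dim)(enc_in))

dec_in = layers.Input(shape=(None,), name="dec_in")
dec_seq = layers.LSTM(latent_dim, return_sequences=True)(
    layers.Embedding(tgt_vocab, latent_dim)(dec_in), initial_state=[h, c])

# My guess at the wiring: dot-product attention with
# query = decoder sequence, value = encoder sequence
context = layers.Attention()([dec_seq, enc_seq])

# Concatenate attention context onto decoder outputs, then project
merged = layers.Concatenate()([dec_seq, context])
out = layers.Dense(tgt_vocab, activation="softmax")(merged)

model = Model([enc_in, dec_in], out)
model.compile(optimizer="rmsprop", loss="sparse_categorical_crossentropy")
```

Is this the right way to connect it, or should the attention be applied inside the decoding loop instead?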
The full code is available here. I'm not pasting the whole model into the post because it's large and complex.