I have used Haptek in the past, but it is now defunct. To see what I want to do, look at ejTalk Cassandra.
The idea is to send a text string such as "text-to-say(with ssml):avatar-emotion:avatar-gesture"; I will adapt to any sort of markup. The ejTalk engine manages all the ASR/NLP/dialog/etc. What I want is JUST the talking head.
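For illustration, here is a minimal C++17 sketch of how the receiving side might split such a message into its three fields. `AvatarCommand` and `parseCommand` are hypothetical names, not part of ejTalk or any avatar SDK. The sketch assumes the emotion and gesture fields never contain a colon. SSML text can contain colons (for example `xml:lang`), so the parser splits on the *last* two colons rather than the first two.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical command for one utterance sent to the talking head.
struct AvatarCommand {
    std::string text;     // text-to-say; may contain SSML markup and colons
    std::string emotion;  // e.g. "happy" (assumed to contain no colons)
    std::string gesture;  // e.g. "nod"   (assumed to contain no colons)
};

// Parses "text:emotion:gesture" by splitting on the LAST two colons,
// so colons inside the SSML portion are kept in `text`.
// Returns std::nullopt if the message has fewer than two delimiters.
std::optional<AvatarCommand> parseCommand(const std::string& msg) {
    const auto second = msg.rfind(':');
    // second == 0 means only a leading colon; there is no room for a first one.
    if (second == std::string::npos || second == 0) return std::nullopt;
    const auto first = msg.rfind(':', second - 1);
    if (first == std::string::npos) return std::nullopt;
    return AvatarCommand{
        msg.substr(0, first),                       // text (with SSML)
        msg.substr(first + 1, second - first - 1),  // emotion
        msg.substr(second + 1)};                    // gesture
}
```

For example, `parseCommand("<speak xml:lang=\"en-US\">Hi</speak>:neutral:wave")` yields the full `<speak>` element as the text, `neutral` as the emotion, and `wave` as the gesture. Whatever renders the head would then send the text to TTS and map the other two fields onto its own animation controls.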
It can be browser-based, a linkable C++ library, or a standalone server, but it must run on Windows 10/11.
I have coded in C++, JavaScript, etc. for decades, so I don't scare easily.
I am looking into the Unreal and Unity engines, but they seem like heavyweight platforms and may not lend themselves to being driven by text strings from another server.