Training agent using historical data in TF-agents

Asked May 02 '22 at 15:59

Active May 02 '22 at 16:44

Viewed 165 times

I am using a contextual bandit algorithm in TF-Agents. Is there a way to train the agent on historical data (context, action, reward) stored in a table, instead of using the replay buffer?

The environment provides the context and the reward, so I can make the environment read both from the table. The action, however, is chosen by the agent. I am not sure how to override the action the agent chooses for a given context with the action recorded in the historical table.

I am using a custom environment and a prebuilt bandit agent (LinearThompsonSampling). I am not sure whether I can keep using the built-in LinearThompsonSampling agent while supplying the actions from the historical data during training. I couldn't find any examples of this in the TF-Agents documentation.
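To make the goal concrete, here is a minimal sketch in plain Python (deliberately not TF-Agents code) of the kind of update I would like to drive from the table. It assumes a per-arm linear model of the sort linear Thompson sampling maintains, as I understand it: each arm keeps a matrix A and a vector b that are updated only with the rows where that arm was the logged action. The table contents and the names `HISTORY`, `init_arm_stats` and `update_from_log` are made up for illustration.

```python
# Hypothetical logged table: each row is (context, action, reward).
HISTORY = [
    ([1.0, 0.0], 0, 1.0),
    ([0.0, 1.0], 1, 0.0),
    ([1.0, 1.0], 0, 2.0),
]


def init_arm_stats(num_actions, dim):
    """Create per-arm statistics: A starts as the identity, b as zeros."""
    stats = []
    for _ in range(num_actions):
        a = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        b = [0.0] * dim
        stats.append((a, b))
    return stats


def update_from_log(stats, history):
    """For each logged row, apply A[action] += x x^T and b[action] += r * x.

    The action comes from the table, not from a policy, so the agent
    never has to choose anything during this pass.
    """
    for context, action, reward in history:
        a, b = stats[action]
        for i, xi in enumerate(context):
            b[i] += reward * xi
            for j, xj in enumerate(context):
                a[i][j] += xi * xj
    return stats


stats = update_from_log(init_arm_stats(num_actions=2, dim=2), HISTORY)
```

The point of the sketch is that the update only needs the logged (context, action, reward) triple. What I am missing is the equivalent way to feed such triples to the prebuilt agent.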

edited May 02 '22 at 16:44

asked May 02 '22 at 15:59

tjt

@FedericoMalerba I would appreciate any thoughts you have on this. Thank you – tjt May 02 '22 at 16:00
Were you able to implement this using TF-Agents? I am also trying a similar approach. Any guidance would be helpful – Shubh Nov 24 '22 at 05:07


0 Answers
