Questions tagged [tf-agent]

43 questions
6 votes, 1 answer

Why is data from the tf-agents buffer in random order?

TL;DR version: why do the first 2 actions/observations I take not line up with the first two objects in my replay buffer? Do tf-agents replay buffers automatically shuffle data around? By adding these prints I'm able to see what my first 2 steps look…
tgm_learn
2 votes, 1 answer

Why is my DQN-agent's training so inefficient?

I am trying to train an agent to play tic-tac-toe perfectly as the second player (the first player moves randomly) with the DQN agent from tf-agents, but my training is extremely slow. For 100_000 steps, the model did not improve its results in any…
2 votes, 2 answers

TF-Agents Actor/Learner: TFUniformReplayBuffer dimensionality issue - invalid shape of Replay Buffer vs. Actor update

I am trying to adapt this tf-agents actor<->learner DQN Atari Pong example to my Windows machine, using a TFUniformReplayBuffer instead of the ReverbReplayBuffer, which only works on Linux machines, but I face a dimensionality issue. [...] ---> 67…
2 votes, 0 answers

TensorFlow, PyEnvironment: Given `time_step` does not match expected `time_step_spec`

I'm trying to set up a custom PyEnvironment and I get a "Given `time_step` does not match expected `time_step_spec`" error. I don't see where the dtype specification is missing. Here's the environment: class TicTacToe(py_environment.PyEnvironment): …
flowerboy
2 votes, 1 answer

Error while importing tf_agents in google colab

from __future__ import absolute_import from __future__ import division from __future__ import print_function import abc import tensorflow as tf import numpy as np import pandas as pd from tf_agents.environments import py_environment from…
2 votes, 1 answer

py_environment 'time_step' doesn't match 'time_step_spec' - but I can't spot the difference

I'm trying to create a custom tf-agents environment for trading. When I try to validate it by calling utils.validate_py_environment(environment, episodes=1), I get a ValueError: 'time_step' doesn't match 'time_step_spec'. I've been trying to…
Top Snek
2 votes, 1 answer

How to pass the batch size for a custom environment in TF-Agents

I am using the tf-agents library to build a contextual bandit. For this I am building a custom environment. I am creating a BanditPyEnvironment and wrapping it in a TFPyEnvironment. The TFPyEnvironment automatically adds the batch size dimension (in…
tjt
1 vote, 0 answers

How do I use all cores of my CPU in reinforcement learning with TF Agents?

I am working with an RL algorithm. I'm using TensorFlow and tf-agents to train a DQN. My problem is that only one CPU core is used when running the 10 episodes in the environment for data collection. My training function looks like…
1 vote, 1 answer

What controls the second dimension of tf observations, or what a qnet accepts in its place?

Short version: I can't find the variable(s) that control either: A) The 2nd dimension of a variable in a trajectory, e.g. the 3 in Trajectory({'action':
tgmcroc
1 vote, 1 answer

How to store a tf-agents trajectory object in BigQuery from Python and retrieve it back as a trajectory object

I want to save trajectories from tf-agents into a BigQuery table and retrieve them back into Python as needed. In the Python dataframe, the trajectories are saved as trajectory objects, but I am not sure how to save these…
tjt
1 vote, 1 answer

Error when saving model with tensorflow-agents

I am trying to save a model with tensorflow-agents. First I define the following: `collect_policy = tf_agent.collect_policy` and `saver = PolicySaver(collect_policy, batch_size=None)`, and then save the model like this: `saver.save('my_directory/')`. This…
Enrique
1 vote, 1 answer

TF Agent taking the same action for all test states after training in Reinforcement Learning

I am trying to create a custom PyEnvironment so that an agent learns the optimal hour to send notifications to users, based on the rewards received from clicks on the notifications sent in the previous 7 days. After the training is complete,…
1 vote, 0 answers

How to write a custom policy in tf_agents

I want to use the contextual bandit agents (the Linear Thompson Sampling agent) in tf_agents. I am using a custom environment and my rewards are delayed by 3 days. Hence, for training, the observations are generated from the saved historical tables…
tjt
1 vote, 0 answers

Training agent using historical data in TF-agents

I am using a contextual bandit algorithm in tf_agents. Is there a way to train the agent using historical data (context, action, reward) stored in a table, instead of using the replay buffer? The environment provides context and reward. Therefore I can make…
tjt
1 vote, 0 answers

TF-Agents _action_spec: how to define the correct shape for discrete action space?

Scenario 1: My custom environment has the following _action_spec: self._action_spec = array_spec.BoundedArraySpec(shape=(highestIndex+1,), dtype=np.int32, minimum=0, maximum=highestIndex, name='action') Therefore my actions are…
Ling