IRL Gym Environments
Stickbug
For full documentation, see the Stickbug documentation.
Grid World
This module contains the GridWorldEnv for discrete path planning.
- class irl_gym.envs.grid_world.GridWorldEnv(*, seed: Optional[int] = None, params: Optional[dict] = None)
Bases: Env
Simple Gridworld where the agent seeks to reach a goal.
For more information, see the gym.Env docs.
States (dict)
“pose”: [x, y]
Observations
Agent position is fully observable
Actions
0: move south [ 0, -1]
1: move west [-1, 0]
2: move north [ 0, 1]
3: move east [ 1, 0]
Transition Probabilities
\(p \qquad \qquad\) remain in place
\(1-p \quad \quad \:\) transition to desired state
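The action table and transition probabilities above can be sketched as a sampling function. This is a minimal illustration, not the GridWorldEnv implementation: `sample_next_pose` is a hypothetical name, and clamping moves to the map edges is an assumption, since the docs do not say what happens at the boundary.

```python
import random

# Action index -> (dx, dy), following the "Actions" list above.
ACTIONS = {0: (0, -1), 1: (-1, 0), 2: (0, 1), 3: (1, 0)}


def sample_next_pose(pose, action, p, dimensions, rng):
    """Sample the next [x, y] pose (hypothetical helper, not irl_gym API).

    With probability p the agent remains in place; otherwise it moves one
    cell in the chosen direction. Clamping to the map is an assumption.
    """
    if rng.random() < p:
        return list(pose)  # remain in place
    dx, dy = ACTIONS[action]
    x = min(max(pose[0] + dx, 0), dimensions[0] - 1)
    y = min(max(pose[1] + dy, 0), dimensions[1] - 1)
    return [x, y]


rng = random.Random(0)
print(sample_next_pose([20, 20], 2, 0.1, [40, 40], rng))
```

Setting `p` to 0 makes the sketch deterministic, which is convenient when checking a planner by hand.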
Reward
\[
R = \begin{cases}
R_{min}, & d > r_{goal} \\
R_{max} - \left(\dfrac{d}{r_{goal}}\right)^2, & d \leq r_{goal}
\end{cases}
\]
where \(d\) is the distance to the goal, \(r_{goal}\) is the reward radius of the goal (set via r_radius), and \(R_{min}\), \(R_{max}\) are the reward extrema (set via r_range).
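As a worked check of the formula with the documented defaults, assuming r_range supplies \((R_{min}, R_{max}) = (-0.01, 1)\) and \(r_{goal} = 5\): at the goal the reward is \(1\), on the edge of the reward radius it is \(1 - 1 = 0\), and anywhere farther away it is \(-0.01\). The sketch below encodes the same piecewise rule. Treating \(d\) as Euclidean distance is an assumption (the docs only say "distance to the goal"), and `gridworld_reward` is a hypothetical name.

```python
import math


def gridworld_reward(pose, goal, r_goal=5.0, r_min=-0.01, r_max=1.0):
    """Piecewise Grid World reward (sketch; Euclidean d is an assumption)."""
    d = math.dist(pose, goal)
    if d > r_goal:
        return r_min
    return r_max - (d / r_goal) ** 2


print(gridworld_reward([10, 10], [10, 10]))  # at the goal
print(gridworld_reward([13, 14], [10, 10]))  # d = 5, edge of the radius
print(gridworld_reward([20, 20], [10, 10]))  # outside the radius
```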
Input
- Parameters:
seed – (int) RNG seed, default: None
Remaining parameters are passed through the params dict. The corresponding keys are as follows:
- Parameters:
dimensions – ([x,y]) size of map, default: [40,40]
goal – ([x,y]) position of goal, default: [10,10]
state – (State) Initial state, default: {“pose”: [20,20]}
p – (float) probability of remaining in place, default: 0.1
r_radius – (float) Reward radius, default: 5.0
r_range – (tuple) min and max params of reward, default: (-0.01, 1)
render – (str) render mode (see metadata for options), default: “none”
cell_size – (int) size of cells for visualization, default: 5
prefix – (str) where to save images, default: “<cwd>/plot”
save_frames – (bool) save images for gif, default: False
log_level – (str) Level of logging to use. For more info see logging levels, default: “WARNING”
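Because only `seed` is a direct keyword, every other option travels in the `params` dict. The hypothetical helper below shows one way to fill unspecified keys from the defaults listed above; whether GridWorldEnv merges a partial dict exactly this way is an assumption, so treat it as a reading aid for the parameter list rather than the library's behaviour.

```python
import copy
import os

# Defaults transcribed from the GridWorldEnv parameter list above.
GRIDWORLD_DEFAULTS = {
    "dimensions": [40, 40],
    "goal": [10, 10],
    "state": {"pose": [20, 20]},
    "p": 0.1,
    "r_radius": 5.0,
    "r_range": (-0.01, 1),
    "render": "none",
    "cell_size": 5,
    "prefix": os.path.join(os.getcwd(), "plot"),
    "save_frames": False,
    "log_level": "WARNING",
}


def resolve_params(params=None):
    """Overlay user keys on the defaults (hypothetical helper, not irl_gym API)."""
    merged = copy.deepcopy(GRIDWORLD_DEFAULTS)  # never mutate the shared defaults
    merged.update(params or {})
    return merged


# A smaller, noise-free map with the goal moved; all other keys keep defaults.
params = resolve_params({"dimensions": [20, 20], "goal": [5, 5], "p": 0.0})
print(params["dimensions"], params["goal"], params["r_radius"])
```

With irl_gym installed, a dict like this would be passed as `GridWorldEnv(seed=0, params=params)`, matching the constructor signature above.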
- get_actions(s: dict)
Gets the list of actions for a given pose.
- Parameters:
s – (State) state from which to get actions
- Returns:
((list) actions, (list(ndarray)) subsequent states)
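The return shape of get_actions, a list of actions paired with the state each one leads to, can be illustrated with a deterministic sketch. Listing only the actions whose target cell stays on the map is an assumption (the real method may return all four), the sketch returns plain lists where the docs mention ndarrays, and `get_actions_sketch` is a hypothetical name.

```python
# Action index -> (dx, dy), following the "Actions" list above.
ACTIONS = {0: (0, -1), 1: (-1, 0), 2: (0, 1), 3: (1, 0)}


def get_actions_sketch(s, dimensions):
    """Return (actions, next_poses) for state s (hypothetical, not irl_gym API).

    Keeps an action only if its target cell lies inside the map; that
    filtering rule is an assumption of this sketch.
    """
    actions, next_states = [], []
    x, y = s["pose"]
    for a, (dx, dy) in ACTIONS.items():
        nx, ny = x + dx, y + dy
        if 0 <= nx < dimensions[0] and 0 <= ny < dimensions[1]:
            actions.append(a)
            next_states.append([nx, ny])
    return actions, next_states


print(get_actions_sketch({"pose": [0, 0]}, [40, 40]))  # corner cell
```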
- metadata: dict[str, Any] = {'render_fps': 5, 'render_modes': ['plot', 'print', 'none']}
- render()
Renders the environment.
Has two active render modes (“none” disables rendering):
plot uses PyGame visualization
print logs state at Warning level
Visualization
blue circle: agent
green diamond: goal
red diamond: goal + agent
Grey cells: The darker the shade, the higher the reward
- reset(*, seed: Optional[int] = None, options: dict = {})
Resets the environment to its initial state and sets the RNG seed.
Deviates from Gym in that it assumes the RNG seed can be reset at will.
- Parameters:
seed – (int) RNG seed, default: None
options – (dict) params for reset, see initialization, default: {}
- Returns:
(tuple) State Observation, Info
- reward(s: dict, a: Optional[int] = None, sp: Optional[dict] = None)
Gets the reward for an \((s,a,s')\) transition.
- Parameters:
s – (State) Initial state (unused in this environment)
a – (int) Action (unused in this environment), default: None
sp – (State) resultant state, default: None
- Returns:
(float) reward
- step(a: int)
Increments the environment by one timestep.
- Parameters:
a – (int) action
- Returns:
(tuple) State, reward, is_done, is_truncated, info
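The documented return tuples of reset() and step() are enough to write a generic episode loop. The sketch below is an illustration, not part of irl_gym: `rollout` and the tiny `LineEnv` stand-in are hypothetical, and LineEnv exists only so the loop can run without the library.

```python
def rollout(env, policy, max_steps=100, seed=None):
    """Run one episode using the documented reset()/step() return tuples."""
    state, info = env.reset(seed=seed)
    total = 0.0
    for _ in range(max_steps):
        state, reward, is_done, is_truncated, info = env.step(policy(state))
        total += reward
        if is_done or is_truncated:
            break
    return total


class LineEnv:
    """Tiny stand-in with the same reset/step shapes (not part of irl_gym)."""

    def reset(self, *, seed=None, options=None):
        self.x = 0
        return {"pose": [self.x, 0]}, {}

    def step(self, a):
        if a == 3:  # "move east", as in the action table
            self.x += 1
        done = self.x >= 3
        reward = 1.0 if done else -0.01
        return {"pose": [self.x, 0]}, reward, done, False, {}


print(rollout(LineEnv(), policy=lambda s: 3))
```

Since GridWorldEnv's reset() and step() return tuples of the same shape, the loop is intended to accept an instance of it in place of LineEnv.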
Grid Tunnel
This module contains the GridTunnelEnv for discrete path planning with a local maximum.
- class irl_gym.envs.grid_tunnel.GridTunnelEnv(*, seed: Optional[int] = None, params: Optional[dict] = None)
Bases: GridWorldEnv
Simple Gridworld where the agent seeks to reach a goal, with a local maximum (trap).
For more information, see the gym.Env docs.
States (dict)
“pose”: [x, y]
Observations
Agent position is fully observable
Actions
0: move south [ 0, -1]
1: move west [-1, 0]
2: move north [ 0, 1]
3: move east [ 1, 0]
Transition Probabilities
\(p \qquad \qquad\) remain in place
\(1-p \quad \quad \:\) transition to desired state
Reward
\[
R = \begin{cases}
R_{min}, & d > r_{goal} \\
\dfrac{1}{2}\left(R_{max} - \left(\dfrac{d}{r_{goal}}\right)^2\right), & d \leq r_{trap} \\
R_{max} - \left(\dfrac{d}{r_{goal}}\right)^2, & d \leq r_{goal}
\end{cases}
\]
where \(d\) is the distance to the goal, \(r_{goal}\) and \(r_{trap}\) are the reward radii of the goal and trap, respectively, and \(R_{min}\), \(R_{max}\) are the reward extrema.
Input
- Parameters:
seed – (int) RNG seed, default: None
Remaining parameters are passed through the params dict. The corresponding keys are as follows:
- Parameters:
dimensions – ([x,y]) size of map, default: [35,10]
goal – ([x,y]) position of goal, default: [10,5]
state_offset – (int) distance of the initial state from the goal in the +x direction, default: 15
trap_offset – (int) distance of the trap from the goal in the +x direction, default: 17
p – (float) probability of remaining in place, default: 0.1
r_radius – (float) Reward radius, default: 5.0
r_range – (tuple) min and max params of reward, default: (-0.01, 1)
render – (str) render mode (see metadata for options), default: “none”
cell_size – (int) size of cells for visualization, default: 5
prefix – (str) where to save images, default: “<cwd>/plot”
save_frames – (bool) save images for gif, default: False
log_level – (str) Level of logging to use. For more info see logging levels, default: “WARNING”
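With the default offsets, the start and trap positions follow directly from the goal. The sketch assumes both offsets keep the goal's y coordinate (the docs say only "in +x direction"), and `tunnel_layout` is a hypothetical helper. It makes the geometry concrete: the agent starts 15 cells from the goal but only 2 cells from the trap.

```python
def tunnel_layout(goal=(10, 5), state_offset=15, trap_offset=17):
    """Return (start, trap) cells from the documented +x offsets.

    Keeping the goal's y coordinate is an assumption of this sketch.
    """
    gx, gy = goal
    return (gx + state_offset, gy), (gx + trap_offset, gy)


start, trap = tunnel_layout()
print(start, trap)  # both lie inside the default 35 x 10 map
```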
- metadata: dict[str, Any] = {'render_fps': 5, 'render_modes': ['plot', 'print', 'none']}
- reset(*, seed: Optional[int] = None, options: dict = {})
Resets the environment to its initial state and sets the RNG seed.
Deviates from Gym in that it assumes the RNG seed can be reset at will.
- Parameters:
seed – (int) RNG seed, default: None
options – (dict) params for reset, see initialization, default: {}
- Returns:
(tuple) State Observation, Info
- reward(s: dict, a: Optional[int] = None, sp: Optional[dict] = None)
Gets the reward for an \((s,a,s')\) transition.
- Parameters:
s – (State) Initial state (unused in this environment)
a – (int) Action (unused in this environment), default: None
sp – (State) resultant state, default: None
- Returns:
(float) reward
Sailing
This module contains the SailingEnv for discrete path planning with a dynamic environment.
- class irl_gym.envs.sailing.SailingEnv(*, seed: Optional[int] = None, params: Optional[dict] = None)
Bases: Env
Sailing in a discrete world where the agent seeks to reach a goal under changing wind patterns.
This environment is based on JonAsbury’s Sailing-v0.
For more information, see the gym.Env docs.
States (dict)
“pose”: [x, y, heading]
“wind”: \(m\) x \(n\) np int array (values 0-7)
where \(m\) is the size of the x-dimension and \(n\) the size in y.
Observations
Agent position is fully observable
Actions
-1: turn left 45°
0: move straight
1: turn right 45°
Transition Probabilities
agent moves in the desired direction deterministically
\(p\) probability of wind changing at each cell
Reward
\[
R = \begin{cases}
R_{min}, & \text{hitting the boundary} \\
R_{max}, & d = 0 \\
-0.01 - \lVert h - w \rVert_2 - \lVert m - g \rVert_2 + \begin{cases} -0.1, & \text{within 5 cells of the boundary} \\ (R_{max} - 100)\left(1 - \left(\dfrac{d}{r_{goal}}\right)^2\right), & d \leq r_{goal} \end{cases} & \text{otherwise}
\end{cases}
\]
where
\(m\) is the movement direction normalized to \(\sqrt{2}\)
\(w\) is the wind direction normalized to \(\sqrt{2}\)
\(g\) is the goal direction normalized to \(\sqrt{2}\)
\(d\) is the distance to the goal
\(r_{goal}\) is the reward radius of the goal, and
\(R_i\) are the reward extrema.
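The reward cases can be transcribed into a function once the geometric quantities are known. In this sketch the direction vectors \(h\), \(w\), \(m\), \(g\), the goal distance \(d\), and the two boundary flags are passed in rather than derived, since the docs do not say how they are computed (and do not define \(h\)). Checking the boundary before the goal, and treating the near-boundary penalty and the goal term as independent additions that contribute 0 when their condition is false, are assumptions. The defaults follow the documented r_radius and r_range, assuming r_range supplies \((R_{min}, R_{max})\); `sailing_reward` is a hypothetical name.

```python
import math


def sailing_reward(h, w, m, g, d, hit_boundary, near_boundary,
                   r_goal=5.0, r_min=-400.0, r_max=1100.0):
    """Transcription of the Sailing reward cases (sketch, not irl_gym code)."""
    if hit_boundary:
        return r_min
    if d == 0:
        return r_max
    # Base cost: the two direction-mismatch norms from the formula.
    r = -0.01 - math.dist(h, w) - math.dist(m, g)
    if near_boundary:  # within 5 cells of the boundary
        r += -0.1
    if d <= r_goal:  # shaped bonus inside the goal's reward radius
        r += (r_max - 100) * (1 - (d / r_goal) ** 2)
    return r


aligned = (1.0, 1.0)
print(sailing_reward(aligned, aligned, aligned, aligned, d=2.5,
                     hit_boundary=False, near_boundary=False))
```

With the default range, the goal term peaks at \(R_{max} - 100 = 1000\) just outside the goal cell, so halfway through the radius it contributes 750.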
Input
- Parameters:
seed – (int) RNG seed, default: None
Remaining parameters are passed through the params dict. The corresponding keys are as follows:
- Parameters:
dimensions – ([x,y]) size of map, default: [40,40]
goal – ([x,y]) position of goal, default: [10,10]
state – (State) Initial state (wind not required), default: {“pose”: [20,20]}, wind undefined
p – (float) probability of wind changing at each cell, default: 0.1
r_radius – (float) Reward radius, default: 5.0
r_range – (tuple) min and max params of reward, default: (-400, 1100)
render – (str) render mode (see metadata for options), default: “none”
cell_size – (int) size of cells for visualization, default: 5
prefix – (str) where to save images, default: “<cwd>/plot”
save_frames – (bool) save images for gif, default: False
log_level – (str) Level of logging to use. For more info see logging levels, default: “WARNING”
- get_actions(s: dict)
Gets the list of actions for a given pose.
- Parameters:
s – (State) state from which to get actions
- Returns:
((list) actions, (list(ndarray)) subsequent poses without wind)
- metadata: dict[str, Any] = {'render_fps': 5, 'render_modes': ['plot', 'print', 'none']}
- render()
Renders the environment.
Has two active render modes (“none” disables rendering):
plot uses PyGame visualization
print logs pose at Warning level
Visualization
blue triangle: agent
green diamond: goal
red diamond: goal + agent
orange triangle: wind direction
Grey cells: The darker the shade, the higher the reward
- reset(*, seed: Optional[int] = None, options: dict = {})
Resets the environment to its initial state and sets the RNG seed.
Deviates from Gym in that it assumes the RNG seed can be reset at will.
- Parameters:
seed – (int) RNG seed, default: None
options – (dict) params for reset, see initialization, default: {}
- Returns:
(tuple) State Observation, Info
- reward(s: dict, a: Optional[int] = None, sp: Optional[dict] = None)
Gets the reward for an \((s,a,s')\) transition.
- Parameters:
s – (State) Initial state
a – (int) Action (unused in this environment), default: None
sp – (State) resultant state, default: None
- Returns:
(float) reward
- step(a: int)
Increments the environment by one timestep.
- Parameters:
a – (int) action
- Returns:
(tuple) State, reward, is_done, is_truncated, info
Sailing Broken Rudder
This module contains the SailingBREnv for discrete path planning with a dynamic environment.
- class irl_gym.envs.sailing_broken_rudder.SailingBREnv(*, seed: Optional[int] = None, params: Optional[dict] = None)
Bases: Env
Sailing in a discrete world where the agent seeks to reach a goal under changing wind patterns.
This environment is based on JonAsbury’s Sailing-v0.
For more information, see the gym.Env docs.
States (dict)
“pose”: [x, y, heading]
“wind”: \(m\) x \(n\) np int array (values 0-7)
where \(m\) is the size of the x-dimension and \(n\) the size in y.
Observations
Agent position is fully observable
Actions
-1: turn left 45°
0: move straight
1: turn right 45°
Transition Probabilities
agent moves in the desired direction deterministically
\(p\) probability of wind changing at each cell
Reward
\[
R = \begin{cases}
R_{min}, & \text{hitting the boundary} \\
R_{max}, & d = 0 \\
-0.01 - \lVert h - w \rVert_2 - \lVert m - g \rVert_2 + \begin{cases} -0.1, & \text{within 5 cells of the boundary} \\ (R_{max} - 100)\left(1 - \left(\dfrac{d}{r_{goal}}\right)^2\right), & d \leq r_{goal} \end{cases} & \text{otherwise}
\end{cases}
\]
where
\(m\) is the movement direction normalized to \(\sqrt{2}\)
\(w\) is the wind direction normalized to \(\sqrt{2}\)
\(g\) is the goal direction normalized to \(\sqrt{2}\)
\(d\) is the distance to the goal
\(r_{goal}\) is the reward radius of the goal, and
\(R_i\) are the reward extrema.
Input
- Parameters:
seed – (int) RNG seed, default: None
Remaining parameters are passed through the params dict. The corresponding keys are as follows:
- Parameters:
dimensions – ([x,y]) size of map, default: [40,40]
goal – ([x,y]) position of goal, default: [10,10]
state – (State) Initial state (wind not required), default: {“pose”: [20,20]}, wind undefined
p – (float) probability of wind changing at each cell, default: 0.1
r_radius – (float) Reward radius, default: 5.0
r_range – (tuple) min and max params of reward, default: (-400, 1100)
render – (str) render mode (see metadata for options), default: “none”
cell_size – (int) size of cells for visualization, default: 5
prefix – (str) where to save images, default: “<cwd>/plot”
save_frames – (bool) save images for gif, default: False
log_level – (str) Level of logging to use. For more info see logging levels, default: “WARNING”
- get_actions(s: dict)
Gets the list of actions for a given pose.
- Parameters:
s – (State) state from which to get actions
- Returns:
((list) actions, (list(ndarray)) subsequent poses without wind)
- metadata: dict[str, Any] = {'render_fps': 5, 'render_modes': ['plot', 'print', 'none']}
- render()
Renders the environment.
Has two active render modes (“none” disables rendering):
plot uses PyGame visualization
print logs pose at Warning level
Visualization
blue triangle: agent
green diamond: goal
red diamond: goal + agent
orange triangle: wind direction
Grey cells: The darker the shade, the higher the reward
- reset(*, seed: Optional[int] = None, options: dict = {})
Resets the environment to its initial state and sets the RNG seed.
Deviates from Gym in that it assumes the RNG seed can be reset at will.
- Parameters:
seed – (int) RNG seed, default: None
options – (dict) params for reset, see initialization, default: {}
- Returns:
(tuple) State Observation, Info
- reward(s: dict, a: Optional[int] = None, sp: Optional[dict] = None)
Gets the reward for an \((s,a,s')\) transition.
- Parameters:
s – (State) Initial state
a – (int) Action (unused in this environment), default: None
sp – (State) resultant state, default: None
- Returns:
(float) reward
- step(a: int)
Increments the environment by one timestep.
- Parameters:
a – (int) action
- Returns:
(tuple) State, reward, is_done, is_truncated, info