IRL Gym Environments¶

Stickbug¶

For full documentation, see the Stickbug documentation.

Grid World¶

This module contains the GridWorldEnv for discrete path planning.

class irl_gym.envs.grid_world.GridWorldEnv(*, seed: Optional[int] = None, params: Optional[dict] = None)¶

Bases: Env

Simple Gridworld where agent seeks to reach goal.

For more information see gym.Env docs

States (dict)

“pose”: [x, y]

Observations

Agent position is fully observable

Actions

0: move south [ 0, -1]

1: move west [-1, 0]

2: move north [ 0, 1]

3: move east [ 1, 0]

Transition Probabilities

\(p\): remain in place

\(1-p\): transition to desired state
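
The action table and transition rule above can be sketched in a few lines. This is a minimal illustration rather than the package's implementation: `ACTIONS` and `sample_next_pose` are hypothetical names, and boundary handling is left out because it is not described here.

```python
import random

# Action index -> (dx, dy), copied from the "Actions" list above.
ACTIONS = {0: (0, -1), 1: (-1, 0), 2: (0, 1), 3: (1, 0)}


def sample_next_pose(pose, action, p=0.1, rng=random):
    """Return the next [x, y]: stay in place with probability p, otherwise move.

    Hypothetical helper; map boundaries are not checked.
    """
    if rng.random() < p:
        return list(pose)
    dx, dy = ACTIONS[action]
    return [pose[0] + dx, pose[1] + dy]
```

With `p=0.0` the move always succeeds, so `sample_next_pose([20, 20], 0, p=0.0)` yields `[20, 19]`.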

Reward

\(R = \begin{cases} R_{min}, & d > r_{goal} \\ R_{max} - \left(\dfrac{d}{r_{goal}}\right)^2, & d \leq r_{goal} \end{cases}\)

where \(d\) is the distance to the goal, \(r_{goal}\) is the reward radius of the goal, and \(R_i\) are the reward extrema.
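
As a rough sketch of the piecewise reward, the function below reads the squared term as \((d / r_{goal})^2\) and assumes \(d\) is the Euclidean distance; neither is confirmed here. The name `grid_world_reward` is hypothetical, and the default arguments mirror the documented parameter defaults.

```python
import math


def grid_world_reward(pose, goal=(10, 10), r_radius=5.0, r_range=(-0.01, 1.0)):
    """Reward for arriving at `pose` (hypothetical helper)."""
    r_min, r_max = r_range
    d = math.dist(pose, goal)  # assumption: Euclidean distance to the goal
    if d > r_radius:
        return r_min  # outside the reward radius
    return r_max - (d / r_radius) ** 2  # falls off quadratically toward the radius
```

Under these assumptions the reward is `1.0` at the goal, `0.0` on the edge of the reward radius, and `-0.01` everywhere outside it.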

Input

Parameters: seed – (int) RNG seed, default: None

Remaining parameters are passed as arguments through the params dict. The corresponding keys are as follows:

Parameters:

dimensions – ([x,y]) size of map, default [40,40]
goal – ([x,y]) position of goal, default [10,10]
state – (State) Initial state, default: {“pose”: [20,20]}
p – (float) probability of remaining in place, default: 0.1
r_radius – (float) Reward radius, default: 5.0
r_range – (tuple) min and max params of reward, default: (-0.01, 1)
render – (str) render mode (see metadata for options), default: “none”
cell_size – (int) size of cells for visualization, default: 5
prefix – (string) where to save images, default: “<cwd>/plot”
save_frames – (bool) save images for gif, default: False
log_level – (str) Level of logging to use. For more info see logging levels, default: “WARNING”
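
One way to assemble the params dict is to start from the documented defaults and override selected keys. The helper `make_params` is hypothetical, and `prefix` is left out of the defaults table because its default depends on the working directory.

```python
import copy

# Keys and default values copied from the parameter list above.
GRID_WORLD_DEFAULTS = {
    "dimensions": [40, 40],
    "goal": [10, 10],
    "state": {"pose": [20, 20]},
    "p": 0.1,
    "r_radius": 5.0,
    "r_range": (-0.01, 1),
    "render": "none",
    "cell_size": 5,
    "save_frames": False,
    "log_level": "WARNING",
}


def make_params(**overrides):
    """Hypothetical helper: documented defaults with selected keys overridden."""
    unknown = set(overrides) - set(GRID_WORLD_DEFAULTS) - {"prefix"}
    if unknown:
        raise KeyError(f"unknown parameter keys: {sorted(unknown)}")
    params = copy.deepcopy(GRID_WORLD_DEFAULTS)
    params.update(overrides)
    return params


params = make_params(goal=[5, 5], p=0.2)
# Per the class signature, the dict would be passed as:
# GridWorldEnv(seed=0, params=params)
```

Rejecting unknown keys is a design choice of this sketch; whether the environment itself ignores or rejects them is not stated here.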

get_actions(s: dict)¶

Gets list of actions for a given pose

Parameters: s – (State) state from which to get actions
Returns: ((list) actions, (list(ndarray)) subsequent states)

metadata: dict[str, Any] = {'render_fps': 5, 'render_modes': ['plot', 'print', 'none']}¶

render()¶

Renders environment

Has two render modes (the third option, none, disables rendering):

plot uses PyGame visualization
print logs state at Warning level

Visualization

blue circle: agent
green diamond: goal
red diamond: goal + agent
Grey cells: The darker the shade, the higher the reward

reset(*, seed: Optional[int] = None, options: dict = {})¶

Resets environment to initial state and sets RNG seed.

Deviates from Gym in that the RNG seed is assumed to be resettable at will.

Parameters:

seed – (int) RNG seed, default: None
options – (dict) params for reset, see initialization, default: {}

Returns:

(tuple) State Observation, Info

reward(s: dict, a: Optional[int] = None, sp: Optional[dict] = None)¶

Gets rewards for \((s,a,s')\) transition

Parameters:

s – (State) Initial state (unused in this environment)
a – (int) Action (unused in this environment), default: None
sp – (State) resultant state, default: None

Returns:

(float) reward

step(a: int)¶

Increments environment by one timestep

Parameters: a – (int) action
Returns: (tuple) State, reward, is_done, is_truncated, info
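
A typical interaction loop built on the documented `reset` and `step` return tuples might look like the sketch below. `rollout` and `random_policy` are hypothetical names, and `env` stands for any object exposing that interface.

```python
import random


def rollout(env, policy, max_steps=100, seed=None):
    """Run one episode and return the final state and the summed reward."""
    state, info = env.reset(seed=seed)
    total = 0.0
    for _ in range(max_steps):
        # step returns (state, reward, is_done, is_truncated, info), as documented above
        state, reward, is_done, is_truncated, info = env.step(policy(state))
        total += reward
        if is_done or is_truncated:
            break
    return state, total


def random_policy(state):
    """Pick uniformly among the four documented action indices."""
    return random.choice([0, 1, 2, 3])
```

The `max_steps` cap is a safeguard added for this sketch, since episode length limits are not described here.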

Grid Tunnel¶

This module contains the GridTunnelEnv for discrete path planning with a local maximum.

class irl_gym.envs.grid_tunnel.GridTunnelEnv(*, seed: Optional[int] = None, params: Optional[dict] = None)¶

Bases: GridWorldEnv

Simple Gridworld where agent seeks to reach goal with a local maximum in the reward (the trap).

For more information see gym.Env docs

States (dict)

“pose”: [x, y]

Observations

Agent position is fully observable

Actions

0: move south [ 0, -1]

1: move west [-1, 0]

2: move north [ 0, 1]

3: move east [ 1, 0]

Transition Probabilities

\(p\): remain in place

\(1-p\): transition to desired state

Reward

\(R = \begin{cases} R_{min}, & d > r_{goal} \\ \dfrac{R_{max} - \left(\dfrac{d}{r_{goal}}\right)^2}{2}, & d \leq r_{trap} \\ R_{max} - \left(\dfrac{d}{r_{goal}}\right)^2, & d \leq r_{goal} \end{cases}\)

where \(d\) is the distance to the goal, \(r_i\) is the reward radius of the goal/trap respectively, and \(R_i\) are the reward extrema.

Input

Parameters: seed – (int) RNG seed, default: None

Remaining parameters are passed as arguments through the params dict. The corresponding keys are as follows:

Parameters:

dimensions – ([x,y]) size of map, default [35,10]
goal – ([x,y]) position of goal, default [10,5]
state_offset – (int) distance of state from goal in +x direction, default: 15
trap_offset – (int) distance of trap from goal in +x direction, default: 17
p – (float) probability of remaining in place, default: 0.1
r_radius – (float) Reward radius, default: 5.0
r_range – (tuple) min and max params of reward, default: (-0.01, 1)
render – (str) render mode (see metadata for options), default: “none”
cell_size – (int) size of cells for visualization, default: 5
prefix – (string) where to save images, default: “<cwd>/plot”
save_frames – (bool) save images for gif, default: False
log_level – (str) Level of logging to use. For more info see logging levels, default: “WARNING”
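
To make the two offsets concrete, the sketch below derives the initial pose and the trap position from the goal. It assumes both share the goal's y coordinate, which the parameter descriptions imply but do not state; `tunnel_layout` is a hypothetical name.

```python
def tunnel_layout(goal=(10, 5), state_offset=15, trap_offset=17):
    """Initial pose and trap position, each offset from the goal along +x."""
    gx, gy = goal
    start = [gx + state_offset, gy]  # assumption: same row as the goal
    trap = [gx + trap_offset, gy]    # assumption: same row as the goal
    return start, trap


start, trap = tunnel_layout()
print(start, trap)  # [25, 5] [27, 5]
```

With the default values the agent starts between the goal and the trap, two cells from the trap and fifteen from the goal, and both positions fit inside the default [35,10] map.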

metadata: dict[str, Any] = {'render_fps': 5, 'render_modes': ['plot', 'print', 'none']}¶

reset(*, seed: Optional[int] = None, options: dict = {})¶

Resets environment to initial state and sets RNG seed.

Deviates from Gym in that the RNG seed is assumed to be resettable at will.

Parameters:

seed – (int) RNG seed, default: None
options – (dict) params for reset, see initialization, default: {}

Returns:

(tuple) State Observation, Info

reward(s: dict, a: Optional[int] = None, sp: Optional[dict] = None)¶

Gets rewards for \((s,a,s')\) transition

Parameters:

s – (State) Initial state (unused in this environment)
a – (int) Action (unused in this environment), default: None
sp – (State) resultant state, default: None

Returns:

(float) reward

Sailing¶

This module contains the SailingEnv for discrete path planning with a dynamic environment.

class irl_gym.envs.sailing.SailingEnv(*, seed: Optional[int] = None, params: Optional[dict] = None)¶

Bases: Env

Sailing in a discrete world where agent seeks to reach goal with changing wind patterns.

This environment is based on that of JonAsbury’s Sailing-v0

For more information see gym.Env docs

States (dict)

“pose”: [x, y, heading]

“wind”: \(m\) x \(n\) np int array (values 0-7)

where \(m\) is the size of the x-dimension and \(n\) the size in y.

Observations

Agent position is fully observable

Actions

-1: turn left 45°

0: move straight

1: turn right 45°
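
Since each turn is 45°, there are eight possible headings. A minimal sketch of the heading update, assuming the heading is stored as an index from 0 to 7 (like the wind values) and that the action value is added to it modulo 8. Both points are assumptions, and `turn` is a hypothetical name.

```python
def turn(heading, action):
    """Apply a turn action (-1, 0, or 1) to a heading index in 0..7."""
    if action not in (-1, 0, 1):
        raise ValueError("action must be -1, 0, or 1")
    # Assumption: adding the action and wrapping modulo 8 gives the new heading.
    return (heading + action) % 8
```

The modulo keeps the index in range when turning past either end, so `turn(0, -1)` gives `7` and `turn(7, 1)` gives `0`.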

Transition Probabilities

agent moves in desired direction deterministically

\(p\) probability of wind changing at each cell
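
The per-cell wind change can be illustrated as follows. How a changing cell picks its new direction is not specified here, so this sketch assumes a uniform redraw from 0 to 7; it also uses plain nested lists rather than an np array to stay dependency-free. `update_wind` is a hypothetical name.

```python
import random


def update_wind(wind, p=0.1, rng=random):
    """Return a new wind grid in which each cell is redrawn with probability p."""
    return [
        # Assumption: a changing cell takes a uniformly random direction in 0..7.
        [rng.randrange(8) if rng.random() < p else cell for cell in row]
        for row in wind
    ]
```

The input grid is left unmodified, which keeps the previous wind available when evaluating an \((s,a,s')\) transition.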

Reward

\(R = \)

\(R_{min}\), for hitting the boundary

\(R_{max}\), when \(d = 0\)

\(-0.01 - \|h - w\|_2 - \|m - g\|_2\) otherwise, plus each of the following terms whose condition holds:

\(-0.1\), when \(\leq 5\) cells from the boundary

\(\left(R_{max}-100\right)\left(1 - \left(\dfrac{d}{r_{goal}}\right)^2\right)\), when \(d \leq r_{goal}\)

where

\(m\) is the movement direction normalized to \(\sqrt{2}\)

\(w\) is the wind direction normalized to \(\sqrt{2}\)

\(g\) is the goal direction normalized to \(\sqrt{2}\)

\(d\) is the distance to the goal

\(r_{goal}\) is the reward radius of the goal, and

\(R_i\) are the reward extrema.

Input

Parameters: seed – (int) RNG seed, default: None

Remaining parameters are passed as arguments through the params dict. The corresponding keys are as follows:

Parameters:

dimensions – ([x,y]) size of map, default [40,40]
goal – ([x,y]) position of goal, default [10,10]
state – (State) Initial state (wind not required), default: {“pose”: [20,20]}, wind undefined
p – (float) probability of wind changing at each cell, default: 0.1
r_radius – (float) Reward radius, default: 5.0
r_range – (tuple) min and max params of reward, default: (-400, 1100)
render – (str) render mode (see metadata for options), default: “none”
cell_size – (int) size of cells for visualization, default: 5
prefix – (string) where to save images, default: “<cwd>/plot”
save_frames – (bool) save images for gif, default: False
log_level – (str) Level of logging to use. For more info see logging levels, default: “WARNING”

get_actions(s: dict)¶

Gets list of actions for a given pose

Parameters: s – (State) state from which to get actions
Returns: ((list) actions, (list(ndarray)) subsequent poses without wind)

metadata: dict[str, Any] = {'render_fps': 5, 'render_modes': ['plot', 'print', 'none']}¶

render()¶

Renders environment

Has two render modes (the third option, none, disables rendering):

plot uses PyGame visualization
print logs pose at Warning level

Visualization

blue triangle: agent
green diamond: goal
red diamond: goal + agent
orange triangle: wind direction
Grey cells: The darker the shade, the higher the reward

reset(*, seed: Optional[int] = None, options: dict = {})¶

Resets environment to initial state and sets RNG seed.

Deviates from Gym in that the RNG seed is assumed to be resettable at will.

Parameters:

seed – (int) RNG seed, default: None
options – (dict) params for reset, see initialization, default: {}

Returns:

(tuple) State Observation, Info

reward(s: dict, a: Optional[int] = None, sp: Optional[dict] = None)¶

Gets rewards for \((s,a,s')\) transition

Parameters:

s – (State) Initial state
a – (int) Action (unused in this environment), default: None
sp – (State) resultant state, default: None

Returns:

(float) reward

step(a: int)¶

Increments environment by one timestep

Parameters: a – (int) action
Returns: (tuple) State, reward, is_done, is_truncated, info

Sailing Broken Rudder¶

This module contains the SailingBREnv for discrete path planning with a dynamic environment.

class irl_gym.envs.sailing_broken_rudder.SailingBREnv(*, seed: Optional[int] = None, params: Optional[dict] = None)¶

Bases: Env

Sailing in a discrete world where agent seeks to reach goal with changing wind patterns.

This environment is based on that of JonAsbury’s Sailing-v0

For more information see gym.Env docs

States (dict)

“pose”: [x, y, heading]

“wind”: \(m\) x \(n\) np int array (values 0-7)

where \(m\) is the size of the x-dimension and \(n\) the size in y.

Observations

Agent position is fully observable

Actions

-1: turn left 45°

0: move straight

1: turn right 45°

Transition Probabilities

agent moves in desired direction deterministically

\(p\) probability of wind changing at each cell

Reward

\(R = \)

\(R_{min}\), for hitting the boundary

\(R_{max}\), when \(d = 0\)

\(-0.01 - \|h - w\|_2 - \|m - g\|_2\) otherwise, plus each of the following terms whose condition holds:

\(-0.1\), when \(\leq 5\) cells from the boundary

\(\left(R_{max}-100\right)\left(1 - \left(\dfrac{d}{r_{goal}}\right)^2\right)\), when \(d \leq r_{goal}\)

where

\(m\) is the movement direction normalized to \(\sqrt{2}\)

\(w\) is the wind direction normalized to \(\sqrt{2}\)

\(g\) is the goal direction normalized to \(\sqrt{2}\)

\(d\) is the distance to the goal

\(r_{goal}\) is the reward radius of the goal, and

\(R_i\) are the reward extrema.

Input

Parameters: seed – (int) RNG seed, default: None

Remaining parameters are passed as arguments through the params dict. The corresponding keys are as follows:

Parameters:

dimensions – ([x,y]) size of map, default [40,40]
goal – ([x,y]) position of goal, default [10,10]
state – (State) Initial state (wind not required), default: {“pose”: [20,20]}, wind undefined
p – (float) probability of wind changing at each cell, default: 0.1
r_radius – (float) Reward radius, default: 5.0
r_range – (tuple) min and max params of reward, default: (-400, 1100)
render – (str) render mode (see metadata for options), default: “none”
cell_size – (int) size of cells for visualization, default: 5
prefix – (string) where to save images, default: “<cwd>/plot”
save_frames – (bool) save images for gif, default: False
log_level – (str) Level of logging to use. For more info see logging levels, default: “WARNING”

get_actions(s: dict)¶

Gets list of actions for a given pose

Parameters: s – (State) state from which to get actions
Returns: ((list) actions, (list(ndarray)) subsequent poses without wind)

metadata: dict[str, Any] = {'render_fps': 5, 'render_modes': ['plot', 'print', 'none']}¶

render()¶

Renders environment

Has two render modes (the third option, none, disables rendering):

plot uses PyGame visualization
print logs pose at Warning level

Visualization

blue triangle: agent
green diamond: goal
red diamond: goal + agent
orange triangle: wind direction
Grey cells: The darker the shade, the higher the reward

reset(*, seed: Optional[int] = None, options: dict = {})¶

Resets environment to initial state and sets RNG seed.

Deviates from Gym in that the RNG seed is assumed to be resettable at will.

Parameters:

seed – (int) RNG seed, default: None
options – (dict) params for reset, see initialization, default: {}

Returns:

(tuple) State Observation, Info

reward(s: dict, a: Optional[int] = None, sp: Optional[dict] = None)¶

Gets rewards for \((s,a,s')\) transition

Parameters:

s – (State) Initial state
a – (int) Action (unused in this environment), default: None
sp – (State) resultant state, default: None

Returns:

(float) reward

step(a: int)¶

Increments environment by one timestep

Parameters: a – (int) action
Returns: (tuple) State, reward, is_done, is_truncated, info
