IRL Gym Environments

Stickbug

For full documentation, see the Stickbug documentation.

Grid World

This module contains the GridWorldEnv for discrete path planning.

class irl_gym.envs.grid_world.GridWorldEnv(*, seed: Optional[int] = None, params: Optional[dict] = None)

Bases: Env

Simple Gridworld where agent seeks to reach goal.

For more information see gym.Env docs

States (dict)

  • “pose”: [x, y]

Observations

Agent position is fully observable

Actions

  • 0: move south [ 0, -1]

  • 1: move west [-1, 0]

  • 2: move north [ 0, 1]

  • 3: move east [ 1, 0]

Transition Probabilities

  • \(p \qquad \qquad\) remain in place

  • \(1-p \quad \quad \:\) transition to desired state
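The action offsets and the two-outcome transition model above can be sketched in a few lines. This is only an illustration, not the package's implementation: the function name `sample_next_pose`, the `rng` argument, and the clamping of poses to the map are assumptions made for the example.

```python
import random

# Action index -> (dx, dy), as listed in the Actions section.
ACTIONS = {0: (0, -1), 1: (-1, 0), 2: (0, 1), 3: (1, 0)}


def sample_next_pose(pose, action, p=0.1, dimensions=(40, 40), rng=random):
    """Sample the next [x, y] pose for a single step.

    With probability p the agent remains in place; otherwise it moves by
    the offset of the chosen action. Clamping to the map bounds is an
    assumption of this sketch.
    """
    if rng.random() < p:
        return list(pose)
    dx, dy = ACTIONS[action]
    x = min(max(pose[0] + dx, 0), dimensions[0] - 1)
    y = min(max(pose[1] + dy, 0), dimensions[1] - 1)
    return [x, y]
```

Setting `p=0.0` makes the move deterministic, which is convenient when checking the action table by hand.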

Reward

  • \(R_{min}, \qquad \qquad \quad d > r_{goal} \)

  • \(R_{max} - \left(\dfrac{d}{r_{goal}}\right)^2, \quad d \leq r_{goal}\)

where \(d\) is the distance to the goal, \(r_{goal}\) is the reward radius of the goal, and \(R_i\) are the reward extrema.
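As a rough sketch of the piecewise reward, the function below reads the squared term as \((d / r_{goal})^2\) and assumes \(d\) is the Euclidean distance; neither is confirmed by the text above. The name `grid_world_reward` is hypothetical, and the default arguments mirror the documented defaults for `goal`, `r_radius`, and `r_range`.

```python
import math


def grid_world_reward(pose, goal=(10, 10), r_radius=5.0, r_range=(-0.01, 1.0)):
    """Piecewise reward as a function of the distance to the goal."""
    r_min, r_max = r_range
    d = math.dist(pose, goal)  # Euclidean distance (assumption)
    if d > r_radius:
        return r_min  # outside the reward radius
    return r_max - (d / r_radius) ** 2  # decays quadratically from R_max
```

Under these assumptions the reward equals \(R_{max}\) at the goal and \(R_{max} - 1\) on the edge of the reward radius.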

Input

Parameters:

seed – (int) RNG seed, default: None

Remaining parameters are passed as arguments through the params dict. The corresponding keys are as follows:

Parameters:
  • dimensions – ([x,y]) size of map, default [40,40]

  • goal – ([x,y]) position of goal, default [10,10]

  • state – (State) Initial state, default: {“pose”: [20,20]}

  • p – (float) probability of remaining in place, default: 0.1

  • r_radius – (float) Reward radius, default: 5.0

  • r_range – (tuple) min and max params of reward, default: (-0.01, 1)

  • render – (str) render mode (see metadata for options), default: “none”

  • cell_size – (int) size of cells for visualization, default: 5

  • prefix – (string) where to save images, default: “<cwd>/plot”

  • save_frames – (bool) save images for gif, default: False

  • log_level – (str) Level of logging to use. For more info see logging levels, default: “WARNING”
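For illustration, a params dict that spells out the documented defaults could look like the following. The construction call is left as a comment because it requires the irl_gym package to be installed; the key names and the class path are taken from this page, and `prefix` is omitted so that its default applies.

```python
# Keys and default values as documented for GridWorldEnv.
params = {
    "dimensions": [40, 40],
    "goal": [10, 10],
    "state": {"pose": [20, 20]},
    "p": 0.1,
    "r_radius": 5.0,
    "r_range": (-0.01, 1),
    "render": "none",
    "cell_size": 5,
    "save_frames": False,
    "log_level": "WARNING",
}

# Usage sketch, assuming irl_gym is installed (not executed here):
# from irl_gym.envs.grid_world import GridWorldEnv
# env = GridWorldEnv(seed=0, params=params)
```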

get_actions(s: dict)

Gets list of actions for a given pose

Parameters:

s – (State) state from which to get actions

Returns:

((list) actions, (list(ndarray)) subsequent states)

metadata: dict[str, Any] = {'render_fps': 5, 'render_modes': ['plot', 'print', 'none']}
render()

Renders environment

Has two render modes:

  • plot uses PyGame visualization

  • print logs state at Warning level

Visualization

  • blue circle: agent

  • green diamond: goal

  • red diamond: goal + agent

  • Grey cells: The darker the shade, the higher the reward

reset(*, seed: Optional[int] = None, options: dict = {})

Resets environment to initial state and sets RNG seed.

Deviates from Gym in that the RNG seed may be reset at any time.

Parameters:
  • seed – (int) RNG seed, default: None

  • options – (dict) params for reset, see initialization, default: {}

Returns:

(tuple) State Observation, Info

reward(s: dict, a: Optional[int] = None, sp: Optional[dict] = None)

Gets rewards for \((s,a,s')\) transition

Parameters:
  • s – (State) Initial state (unused in this environment)

  • a – (int) Action (unused in this environment), default: None

  • sp – (State) resultant state, default: None

Returns:

(float) reward

step(a: int)

Increments environment by one timestep

Parameters:

a – (int) action, default: None

Returns:

(tuple) State, reward, is_done, is_truncated, info
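The return tuples of `reset` and `step` suggest the usual rollout loop, sketched below. So that the example runs without the package, it uses `StubEnv`, a hypothetical stand-in that only mimics the documented signatures and is deterministic; `rollout` and `policy` are likewise names invented for this sketch.

```python
class StubEnv:
    """Hypothetical stand-in with the documented reset/step return shapes."""

    _MOVES = {0: (0, -1), 1: (-1, 0), 2: (0, 1), 3: (1, 0)}

    def __init__(self, goal=(10, 10), start=(12, 10)):
        self._goal, self._start = list(goal), list(start)
        self._pose = list(start)

    def reset(self, *, seed=None, options=None):
        self._pose = list(self._start)
        return {"pose": list(self._pose)}, {}  # (state observation, info)

    def step(self, a):
        dx, dy = self._MOVES[a]
        self._pose = [self._pose[0] + dx, self._pose[1] + dy]
        done = self._pose == self._goal
        reward = 1.0 if done else -0.01
        # (state, reward, is_done, is_truncated, info)
        return {"pose": list(self._pose)}, reward, done, False, {}


def rollout(env, policy, max_steps=100, seed=None):
    """Run one episode and return (final state, total reward, steps taken)."""
    state, info = env.reset(seed=seed)
    total, steps = 0.0, 0
    while steps < max_steps:
        state, reward, is_done, is_truncated, info = env.step(policy(state))
        total += reward
        steps += 1
        if is_done or is_truncated:
            break
    return state, total, steps
```

With the real environment, `StubEnv()` would be replaced by a constructed `GridWorldEnv`; the loop itself relies only on the tuple shapes documented above.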

Grid Tunnel

This module contains the GridTunnelEnv for discrete path planning with a local maximum.

class irl_gym.envs.grid_tunnel.GridTunnelEnv(*, seed: Optional[int] = None, params: Optional[dict] = None)

Bases: GridWorldEnv

Simple Gridworld where agent seeks to reach goal with a local maximum.

For more information see gym.Env docs

States (dict)

  • “pose”: [x, y]

Observations

Agent position is fully observable

Actions

  • 0: move south [ 0, -1]

  • 1: move west [-1, 0]

  • 2: move north [ 0, 1]

  • 3: move east [ 1, 0]

Transition Probabilities

  • \(p \qquad \qquad\) remain in place

  • \(1-p \quad \quad \:\) transition to desired state

Reward

  • \(R_{min}, \qquad \qquad \quad \; d > r_{goal} \)

  • \(\dfrac{R_{max} - \left(\dfrac{d}{r_{goal}}\right)^2}{2}, \quad d \leq r_{trap}\)

  • \(R_{max} - \left(\dfrac{d}{r_{goal}}\right)^2, \quad \; d \leq r_{goal}\)

where \(d\) is the distance to the goal, \(r_{goal}\) and \(r_{trap}\) are the reward radii of the goal and trap respectively, and \(R_i\) are the reward extrema.
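One possible reading of the three cases is sketched below. Several points are assumptions rather than documented behavior: the trap case is evaluated with the distance to the trap (passed separately as `d_trap`), the goal case takes priority if both apply, the squared term is read as \((d / r_{goal})^2\), and both radii default to the documented `r_radius`. The function name is hypothetical.

```python
def grid_tunnel_reward(d_goal, d_trap, r_goal=5.0, r_trap=5.0,
                       r_range=(-0.01, 1.0)):
    """Piecewise reward with a half-height bump around the trap."""
    r_min, r_max = r_range
    if d_goal <= r_goal:
        return r_max - (d_goal / r_goal) ** 2  # full reward near the goal
    if d_trap <= r_trap:
        # Same shape as the goal term, scaled by one half (the local maximum).
        return (r_max - (d_trap / r_goal) ** 2) / 2
    return r_min  # everywhere else
```

Under this reading the trap peaks at \(R_{max}/2\), so a greedy agent that starts near it can settle there instead of at the goal.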

Input

Parameters:

seed – (int) RNG seed, default: None

Remaining parameters are passed as arguments through the params dict. The corresponding keys are as follows:

Parameters:
  • dimensions – ([x,y]) size of map, default [35,10]

  • goal – ([x,y]) position of goal, default [10,5]

  • state_offset – (int) distance of state from goal in +x direction, default: 15

  • trap_offset – (int) distance of trap from goal in +x direction, default: 17

  • p – (float) probability of remaining in place, default: 0.1

  • r_radius – (float) Reward radius, default: 5.0

  • r_range – (tuple) min and max params of reward, default: (-0.01, 1)

  • render – (str) render mode (see metadata for options), default: “none”

  • cell_size – (int) size of cells for visualization, default: 5

  • prefix – (string) where to save images, default: “<cwd>/plot”

  • save_frames – (bool) save images for gif, default: False

  • log_level – (str) Level of logging to use. For more info see logging levels, default: “WARNING”
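The two offset parameters place the initial state and the trap relative to the goal. A small sketch of that arithmetic follows, assuming both share the goal's y coordinate; `tunnel_layout` is a hypothetical helper, not part of the package.

```python
def tunnel_layout(goal=(10, 5), state_offset=15, trap_offset=17):
    """Derive start and trap positions from the goal and the +x offsets."""
    gx, gy = goal
    return {
        "goal": [gx, gy],
        "start": [gx + state_offset, gy],  # initial agent pose
        "trap": [gx + trap_offset, gy],    # center of the local maximum
    }
```

With the documented defaults this gives a start at [25, 5] and a trap at [27, 5], both inside the default [35, 10] map, so the agent begins closer to the trap than to the goal.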

metadata: dict[str, Any] = {'render_fps': 5, 'render_modes': ['plot', 'print', 'none']}
reset(*, seed: Optional[int] = None, options: dict = {})

Resets environment to initial state and sets RNG seed.

Deviates from Gym in that the RNG seed may be reset at any time.

Parameters:
  • seed – (int) RNG seed, default: None

  • options – (dict) params for reset, see initialization, default: {}

Returns:

(tuple) State Observation, Info

reward(s: dict, a: Optional[int] = None, sp: Optional[dict] = None)

Gets rewards for \((s,a,s')\) transition

Parameters:
  • s – (State) Initial state (unused in this environment)

  • a – (int) Action (unused in this environment), default: None

  • sp – (State) resultant state, default: None

Returns:

(float) reward

Sailing

This module contains the SailingEnv for discrete path planning with a dynamic environment.

class irl_gym.envs.sailing.SailingEnv(*, seed: Optional[int] = None, params: Optional[dict] = None)

Bases: Env

Sailing in a discrete world where agent seeks to reach goal with changing wind patterns.

This environment is based on that of JonAsbury’s Sailing-v0

For more information see gym.Env docs

States (dict)

  • “pose”: [x, y, heading]

  • “wind”: \(m\) x \(n\) np int array (values 0-7)

where \(m\) is the size of the x-dimension and \(n\) the size in y.

Observations

Agent position is fully observable

Actions

  • -1: turn left 45°

  • 0: move straight

  • 1: turn right 45°

Transition Probabilities

  • agent moves in desired direction deterministically

  • \(p\) probability of wind changing at each cell
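The two transition rules can be illustrated as follows. This sketch assumes the heading is an index from 0 to 7 in 45° steps (like the wind values), that a turn adds the action to that index, and that a changing wind cell is resampled uniformly from 0 to 7; the page does not state any of these. Nested lists stand in for the NumPy array to keep the example dependency-free, and both function names are hypothetical.

```python
import random


def turn(heading, action):
    """Apply an action in {-1, 0, 1} to a heading index in 0..7."""
    return (heading + action) % 8  # wrap around after a full circle


def update_wind(wind, p=0.1, rng=random):
    """Return a new wind grid; each cell changes with probability p."""
    return [
        [rng.randrange(8) if rng.random() < p else w for w in row]
        for row in wind
    ]
```

Passing a seeded `random.Random` instance as `rng` makes the wind updates reproducible.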

Reward

\(R = \)

  • \(R_{min}, \qquad \qquad \qquad \qquad \quad\) for hitting boundary

  • \(R_{max}, \qquad \qquad \qquad \qquad \quad d = 0\),

  • \(-0.01 - ||h - w||_2 - ||m - g||_2 + \)

    • \(-0.1, \qquad \qquad \qquad \qquad \quad\) when \(\leq 5\) cells from boundary

    • \((R_{max}-100)\left(1 - \left(\dfrac{d}{r_{goal}}\right)^2\right), \; d \leq r_{goal}\)

where

  • \(m\) is the movement direction normalized to \(\sqrt{2}\)

  • \(w\) is the wind direction normalized to \(\sqrt{2}\)

  • \(g\) is the goal direction normalized to \(\sqrt{2}\)

  • \(d\) is the distance to the goal

  • \(r_{goal}\) is the reward radius of the goal, and

  • \(R_i\) are the reward extrema.
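A hedged sketch of how these cases might combine is given below. The direction vectors are passed in already normalized, so the sketch does not guess at the normalization. The symbol \(h\), which the formula uses but the list does not define, is taken here to be the heading direction; that reading, the squared term \((d / r_{goal})^2\), and the function and argument names are all assumptions.

```python
import math


def sailing_reward(h, w, m, g, d, hit_boundary, cells_to_boundary,
                   r_radius=5.0, r_range=(-400, 1100)):
    """Combine the boundary, goal, and shaping terms of the reward."""
    r_min, r_max = r_range
    if hit_boundary:
        return r_min
    if d == 0:
        return r_max
    # Base cost: step penalty plus misalignment of heading/wind and
    # movement/goal directions (2-norm of the differences).
    reward = -0.01 - math.dist(h, w) - math.dist(m, g)
    if cells_to_boundary <= 5:
        reward += -0.1  # penalty for sailing near the boundary
    if d <= r_radius:
        reward += (r_max - 100) * (1 - (d / r_radius) ** 2)  # goal bonus
    return reward
```

With all four directions aligned and the agent far from both the goal and the boundary, only the \(-0.01\) step penalty remains.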

Input

Parameters:

seed – (int) RNG seed, default: None

Remaining parameters are passed as arguments through the params dict. The corresponding keys are as follows:

Parameters:
  • dimensions – ([x,y]) size of map, default [40,40]

  • goal – ([x,y]) position of goal, default [10,10]

  • state – (State) Initial state (wind not required), default: {“pose”: [20,20]}, wind undefined

  • p – (float) probability of wind changing at each cell, default: 0.1

  • r_radius – (float) Reward radius, default: 5.0

  • r_range – (tuple) min and max params of reward, default: (-400, 1100)

  • render – (str) render mode (see metadata for options), default: “none”

  • cell_size – (int) size of cells for visualization, default: 5

  • prefix – (string) where to save images, default: “<cwd>/plot”

  • save_frames – (bool) save images for gif, default: False

  • log_level – (str) Level of logging to use. For more info see logging levels, default: “WARNING”

get_actions(s: dict)

Gets list of actions for a given pose

Parameters:

s – (State) state from which to get actions

Returns:

((list) actions, (list(ndarray)) subsequent poses without wind)

metadata: dict[str, Any] = {'render_fps': 5, 'render_modes': ['plot', 'print', 'none']}
render()

Renders environment

Has two render modes:

  • plot uses PyGame visualization

  • print logs pose at Warning level

Visualization

  • blue triangle: agent

  • green diamond: goal

  • red diamond: goal + agent

  • orange triangle: wind direction

  • Grey cells: The darker the shade, the higher the reward

reset(*, seed: Optional[int] = None, options: dict = {})

Resets environment to initial state and sets RNG seed.

Deviates from Gym in that the RNG seed may be reset at any time.

Parameters:
  • seed – (int) RNG seed, default: None

  • options – (dict) params for reset, see initialization, default: {}

Returns:

(tuple) State Observation, Info

reward(s: dict, a: Optional[int] = None, sp: Optional[dict] = None)

Gets rewards for \((s,a,s')\) transition

Parameters:
  • s – (State) Initial state

  • a – (int) Action (unused in this environment), default: None

  • sp – (State) resultant state, default: None

Returns:

(float) reward

step(a: int)

Increments environment by one timestep

Parameters:

a – (int) action, default: None

Returns:

(tuple) State, reward, is_done, is_truncated, info

Sailing Broken Rudder

This module contains the SailingBREnv for discrete path planning with a dynamic environment.

class irl_gym.envs.sailing_broken_rudder.SailingBREnv(*, seed: Optional[int] = None, params: Optional[dict] = None)

Bases: Env

Sailing in a discrete world where agent seeks to reach goal with changing wind patterns.

This environment is based on that of JonAsbury’s Sailing-v0

For more information see gym.Env docs

States (dict)

  • “pose”: [x, y, heading]

  • “wind”: \(m\) x \(n\) np int array (values 0-7)

where \(m\) is the size of the x-dimension and \(n\) the size in y.

Observations

Agent position is fully observable

Actions

  • -1: turn left 45°

  • 0: move straight

  • 1: turn right 45°

Transition Probabilities

  • agent moves in desired direction deterministically

  • \(p\) probability of wind changing at each cell

Reward

\(R = \)

  • \(R_{min}, \qquad \qquad \qquad \qquad \quad\) for hitting boundary

  • \(R_{max}, \qquad \qquad \qquad \qquad \quad d = 0\),

  • \(-0.01 - ||h - w||_2 - ||m - g||_2 + \)

    • \(-0.1, \qquad \qquad \qquad \qquad \quad\) when \(\leq 5\) cells from boundary

    • \((R_{max}-100)\left(1 - \left(\dfrac{d}{r_{goal}}\right)^2\right), \; d \leq r_{goal}\)

where

  • \(m\) is the movement direction normalized to \(\sqrt{2}\)

  • \(w\) is the wind direction normalized to \(\sqrt{2}\)

  • \(g\) is the goal direction normalized to \(\sqrt{2}\)

  • \(d\) is the distance to the goal

  • \(r_{goal}\) is the reward radius of the goal, and

  • \(R_i\) are the reward extrema.

Input

Parameters:

seed – (int) RNG seed, default: None

Remaining parameters are passed as arguments through the params dict. The corresponding keys are as follows:

Parameters:
  • dimensions – ([x,y]) size of map, default [40,40]

  • goal – ([x,y]) position of goal, default [10,10]

  • state – (State) Initial state (wind not required), default: {“pose”: [20,20]}, wind undefined

  • p – (float) probability of wind changing at each cell, default: 0.1

  • r_radius – (float) Reward radius, default: 5.0

  • r_range – (tuple) min and max params of reward, default: (-400, 1100)

  • render – (str) render mode (see metadata for options), default: “none”

  • cell_size – (int) size of cells for visualization, default: 5

  • prefix – (string) where to save images, default: “<cwd>/plot”

  • save_frames – (bool) save images for gif, default: False

  • log_level – (str) Level of logging to use. For more info see logging levels, default: “WARNING”

get_actions(s: dict)

Gets list of actions for a given pose

Parameters:

s – (State) state from which to get actions

Returns:

((list) actions, (list(ndarray)) subsequent poses without wind)

metadata: dict[str, Any] = {'render_fps': 5, 'render_modes': ['plot', 'print', 'none']}
render()

Renders environment

Has two render modes:

  • plot uses PyGame visualization

  • print logs pose at Warning level

Visualization

  • blue triangle: agent

  • green diamond: goal

  • red diamond: goal + agent

  • orange triangle: wind direction

  • Grey cells: The darker the shade, the higher the reward

reset(*, seed: Optional[int] = None, options: dict = {})

Resets environment to initial state and sets RNG seed.

Deviates from Gym in that the RNG seed may be reset at any time.

Parameters:
  • seed – (int) RNG seed, default: None

  • options – (dict) params for reset, see initialization, default: {}

Returns:

(tuple) State Observation, Info

reward(s: dict, a: Optional[int] = None, sp: Optional[dict] = None)

Gets rewards for \((s,a,s')\) transition

Parameters:
  • s – (State) Initial state

  • a – (int) Action (unused in this environment), default: None

  • sp – (State) resultant state, default: None

Returns:

(float) reward

step(a: int)

Increments environment by one timestep

Parameters:

a – (int) action, default: None

Returns:

(tuple) State, reward, is_done, is_truncated, info
