DecPOMDP Usage

Decentralized POMDP

The DecPOMDP struct contains the following fields:

  • γ: discount factor
  • ℐ: agents
  • 𝒮: state space
  • 𝒜: joint action space
  • 𝒪: joint observation space
  • T: transition function
  • O: joint observation function
  • R: joint reward function
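To make the shape of these fields concrete, here is a minimal Python sketch of a container holding them. It is a hypothetical analogue for illustration only, not the package's actual struct: the class name mirrors DecPOMDP, but the field names (gamma, agents, states, joint_actions, joint_observations) and the choice to represent T, O and R as plain callables are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Sequence, Tuple

State = Hashable
JointAction = Tuple[Hashable, ...]       # assumed: one action per agent
JointObservation = Tuple[Hashable, ...]  # assumed: one observation per agent


@dataclass(frozen=True)
class DecPOMDP:
    """Hypothetical Python mirror of the fields listed above."""
    gamma: float                                    # γ: discount factor
    agents: Sequence[Hashable]                      # ℐ: agents
    states: Sequence[State]                         # 𝒮: state space
    joint_actions: Sequence[JointAction]            # 𝒜: joint action space
    joint_observations: Sequence[JointObservation]  # 𝒪: joint observation space
    T: Callable[[State, JointAction, State], float]             # T(s, a, s')
    O: Callable[[State, JointAction, JointObservation], float]  # O(s, a, o)
    R: Callable[[State, JointAction], float]                    # R(s, a)

    def __post_init__(self) -> None:
        # A discount factor is conventionally restricted to [0, 1].
        if not 0.0 <= self.gamma <= 1.0:
            raise ValueError("gamma must lie in [0, 1]")
```

Keeping T, O and R as functions rather than tables matches the description of them as functions taking states, joint actions and joint observations, and avoids materializing arrays whose size grows with the number of joint actions.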

The agents โ„ are the players of the game. The joint action space ๐’œ is the set of all possible ordered pairs of actions amongst all of the agents. The joint observation space ๐’ช is the set of all possible joint observations. The transition function takes in a state s in ๐’ฎ, a joint action a and a new state s'and returns the transition probability of going from s to s' by taking action a. The joint observation function takes in a state, s, a joint action, a, and a joint observation o in ๐’ช and returns a probability of observing o by taking action a from state s. The joint reward function R takes a state and a joint action in ๐’œ and returns a reward value.