# Decision theory – Markov Decision Process Combines probability and utility theories for generating maximum expected utility (MEU). In decision theory, we have an agent that needs to choose what action to perform for maximizing his utility. An agent is rational if he chooses his actions to maximize his utility. When trying to have a sequence of decisions, we usually use the Markov Decision Process (MDP). The Markov models usually present to the agent a policy for which action to perform in each state. To solve an MDP problem, we need to use an algorithm or a solver.

Types of decision theory modeling:

### MDP

allows us to model the agent’s action when there is uncertainty regarding the agent action outcome, but the environment is still completely observable.

There are two types of MDP:

• Goal state MDP – When reaching the state, the agent stops
• Discount reward MDP – No goal, just trying to have the best utility

### POMDP

Partial observable MDP – When the environment is not completely observable

In the automated pen-testing example, we can use decision theory to solve a more realistic model than the search, a model that contains fewer assumptions than the search and by that has a more realistic solution to the problem. By using a goal-based MDP we can calculate for each action what the probability to succeed is and then calculate MEU each time. But we can still assume that the network is fully observable to the hacker.

To have an even more realistic model we can use POMDP, where the hacker can see only the computers that are connected to the computers he has already hacked. In the graph below you can see that the attacker can observe only computers that are connected to the compromised computer. In each iteration, the attacker needs to choose which computer to attack next without knowing how the entire network looks.