To be honest, I would have liked to have seen some stuff on modeling Markov Decision Processes with Dynamic Bayesian Networks, but the book already covers so much material that I'm willing to overlook that detail (she does mention some good pointers on DBN's for MDP's at the end of chapter 23).
There may be gentler introductions into the field out there (Learning Bayesian Networks, for example), but if you want to go beyond the introduction and build a good, solid foundation, PGM is the way to go.
The content is structured within 4 main sections:
- Representation
- Inference
- Learning
- Actions and Decisions
- Likelihood: "Tim is more likely to fly than to walk"
- Conditioning: "If Tim is sick, he can't fly"
- Relevance: "Whether Tim flies depends on whether he is sick"
- Causation: "Being sick caused Tim's inability to fly"
Pearl wrote PRIS at a time when most people were trying to build knowledge systems based on logic, but that approach turned out to be riddled with problems, a big one being the logic systems' inability to handle uncertainty. So, building on the work of those who came before him, he successfully made the case for and launched us into this new era of structured probabilistic reasoning systems.
The reason for adding the qualifier "structured" and not simply "probabilistic reasoning systems," is that naively manipulating a joint probability distribution can be a very computationally expensive proposition. So, we exploit the structure inherent in the problem domain. More specifically, we exploit conditional independences in our joint probability distribution to drastically reduce the number of computations we need to perform (basically, as you head out the door for a short evening walk, you don't need to know what the traffic is like in Japan in order to decide whether or not to take an umbrella with you in New York). It turns out graphs (networks) are a very natural data structure with which to exploit these conditional independences and write computer algorithms for.
PRIS is a very interesting read, even though it may be slightly dated by now. Highlights include several discussions on the merits of probabilistic systems over logic systems, for example, a discussion on Polya, who "argued that the process of discovery, even in as formal a field as mathematics, is guided by nondeductive inference mechanisms, entailing a lot of guesswork" (as opposed to a system that would have all the basic facts and would reach all other conclusions through deductive inference). The discussion is about Polya's so-called "patterns of plausible inference," which was his term for the patterns he identified as representing these nondeductive inference rules that could potentially serve as reasoning mechanisms in some reasoning system. Interestingly, after he had identified a bunch of these patterns and attempted to formalize them into a coherent set of reasoning rules, he [wisely] basically said "screw it," and settled upon the calculus of probabilities as a meta-pattern from which all his patterns could be derived.
So, we have PGM's as an efficient way to represent joint probability distributions, and for a mental image, you can just think of a graph whose nodes are strategically linked to each other so as to maximally exploit conditional independences; the nodes representing the random variables of interest in our domain. Each node has attached to it a conditional probability distribution (CPD), as is the case in a bayesian network, or more generally, a factor (as in the case of a markov network), and we say that the joint probability distribution factors into the CPD's and over the graph. Pictorially, this:
joint(:,:,1,1) =
0.2000 0.0200
0.0900 0.0010
joint(:,:,2,1) =
0.0050 0.0005
0.0360 0.0004
joint(:,:,1,2) =
0 0.1800
0 0.0090
joint(:,:,2,2) =
0.0450 0.0495
0.3240 0.0396
Splits into each of the CPD's (depicted below as tables next to each node), and each CPD goes to its node as specified by the structure of the graph:

The joint distribution can be easily generated with this script, using BNT:
N = 4;
dag = zeros(N,N);
C = 1; S = 2; R = 3; W = 4;
dag(C,[R S]) = 1;
dag(R,W) = 1;
dag(S,W)=1;
false = 1; true = 2;
ns = 2*ones(1,N); % binary nodes
%bnet = mk_bnet(dag, ns);
bnet = mk_bnet(dag, ns, 'names', {'cloudy','S','R','W'}, 'discrete', 1:4);
names = bnet.names;
%C = names{'cloudy'};
bnet.CPD{C} = tabular_CPD(bnet, C, [0.5 0.5]);
bnet.CPD{R} = tabular_CPD(bnet, R, [0.8 0.2 0.2 0.8]);
bnet.CPD{S} = tabular_CPD(bnet, S, [0.5 0.9 0.5 0.1]);
bnet.CPD{W} = tabular_CPD(bnet, W, [1 0.1 0.1 0.01 0 0.9 0.9 0.99]);
CPD{C} = reshape([0.5 0.5], 2, 1);
CPD{R} = reshape([0.8 0.2 0.2 0.8], 2, 2);
CPD{S} = reshape([0.5 0.9 0.5 0.1], 2, 2);
CPD{W} = reshape([1 0.1 0.1 0.01 0 0.9 0.9 0.99], 2, 2, 2);
joint = zeros(2,2,2,2);
for c=1:2
for r=1:2
for s=1:2
for w=1:2
joint(c,s,r,w) = CPD{C}(c) * CPD{S}(c,s) * CPD{R}(c,r) * ...
CPD{W}(s,r,w);
end
end
end
end
joint
The 'representation' section of the book goes into all the gory details, not just on bayesian networks, but markov networks as well, and deeper into how to represent the CPD's, template-based representations, etc, etc.
OK, so we got this great way to represent knowledge, now how do we use it? Well, we use it to answer probabilistic queries, and the algorithms for doing so are the topic of the 'inference' section of the book, where we infer/reason with our knowledge base.
Now, it can be a pain to create this models by hand, which would entail:
- Create the graph structure being careful to maximize our ability to exploit conditional independences
- Parameterize the graph with CPD's
Finally, the last section, 'actions and decisions', brings it all together and provides insights into issues one might have to deal with in order to build an intelligent agent based on these principles. Topics discussed include causality, utilities and decisions, and structured decision problems.
0 comments:
Post a Comment