2. Huang Example

Let’s go over the Huang Bayesian Belief Network (BBN) [HD99].

2.1. Model

The Huang BBN is stored in a JSON file. For completeness, we show the content of the JSON file.

[1]:
import json

with open('../../_support/bbn/huang-bbn.json', 'r') as fp:
    data = json.load(fp)

data
[1]:
{'d': {'nodes': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'],
  'edges': [['A', 'B'],
   ['A', 'C'],
   ['B', 'D'],
   ['C', 'E'],
   ['C', 'G'],
   ['D', 'F'],
   ['E', 'F'],
   ['E', 'H'],
   ['G', 'H']]},
 'p': {'A': {'columns': ['A', '__p__'], 'data': [['on', 0.5], ['off', 0.5]]},
  'B': {'columns': ['A', 'B', '__p__'],
   'data': [['on', 'on', 0.5],
    ['on', 'off', 0.5],
    ['off', 'on', 0.4],
    ['off', 'off', 0.6]]},
  'C': {'columns': ['A', 'C', '__p__'],
   'data': [['on', 'on', 0.7],
    ['on', 'off', 0.3],
    ['off', 'on', 0.2],
    ['off', 'off', 0.8]]},
  'D': {'columns': ['B', 'D', '__p__'],
   'data': [['on', 'on', 0.9],
    ['on', 'off', 0.1],
    ['off', 'on', 0.5],
    ['off', 'off', 0.5]]},
  'E': {'columns': ['C', 'E', '__p__'],
   'data': [['on', 'on', 0.3],
    ['on', 'off', 0.7],
    ['off', 'on', 0.6],
    ['off', 'off', 0.4]]},
  'F': {'columns': ['D', 'E', 'F', '__p__'],
   'data': [['on', 'on', 'on', 0.01],
    ['on', 'on', 'off', 0.99],
    ['on', 'off', 'on', 0.01],
    ['on', 'off', 'off', 0.99],
    ['off', 'on', 'on', 0.01],
    ['off', 'on', 'off', 0.99],
    ['off', 'off', 'on', 0.99],
    ['off', 'off', 'off', 0.01]]},
  'G': {'columns': ['C', 'G', '__p__'],
   'data': [['on', 'on', 0.8],
    ['on', 'off', 0.2],
    ['off', 'on', 0.1],
    ['off', 'off', 0.9]]},
  'H': {'columns': ['E', 'G', 'H', '__p__'],
   'data': [['on', 'on', 'on', 0.05],
    ['on', 'on', 'off', 0.95],
    ['on', 'off', 'on', 0.95],
    ['on', 'off', 'off', 0.05],
    ['off', 'on', 'on', 0.95],
    ['off', 'on', 'off', 0.05],
    ['off', 'off', 'on', 0.95],
    ['off', 'off', 'off', 0.05]]}}}
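
Each table under `p` appears to list the parent columns first, then the node itself, then `__p__`. Under that assumption (inferred from the output above, not from a documented schema), the probabilities should sum to 1 for every parent configuration. The sketch below checks this for two of the tables; `cpt_row_sums` is a hypothetical helper, not part of pybbn.

```python
from collections import defaultdict
from math import isclose

# Two of the conditional probability tables from the JSON above, copied verbatim.
p = {
    'A': {'columns': ['A', '__p__'], 'data': [['on', 0.5], ['off', 0.5]]},
    'F': {'columns': ['D', 'E', 'F', '__p__'],
          'data': [['on', 'on', 'on', 0.01], ['on', 'on', 'off', 0.99],
                   ['on', 'off', 'on', 0.01], ['on', 'off', 'off', 0.99],
                   ['off', 'on', 'on', 0.01], ['off', 'on', 'off', 0.99],
                   ['off', 'off', 'on', 0.99], ['off', 'off', 'off', 0.01]]},
}

def cpt_row_sums(table):
    """Hypothetical helper: total probability per parent configuration.

    Assumes the column order is parents..., node, '__p__'.
    """
    n_parents = len(table['columns']) - 2
    sums = defaultdict(float)
    for row in table['data']:
        sums[tuple(row[:n_parents])] += row[-1]
    return dict(sums)

for name, table in p.items():
    sums = cpt_row_sums(table)
    # Every parent configuration must define a proper distribution.
    assert all(isclose(s, 1.0) for s in sums.values()), name
    print(name, len(sums), 'parent configuration(s) OK')
```

A root node such as `A` has a single, empty parent configuration, while `F` has four (one per combination of `D` and `E`).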

From this JSON specification of the BBN, we create the reasoning model.

[2]:
from pybbn.factory import create_reasoning_model

model = create_reasoning_model(data['d'], data['p'])

2.1.1. Graph (DAG)

We can visualize the directed acyclic graph (DAG) of the Huang BBN.

[3]:
from help.viz import draw_dag, draw_intermediate_graphs, draw_junction_tree

draw_dag(model)
_images/huang_6_0.png

2.1.2. Parameters

Here are the parameters of the Huang BBN.

[4]:
model.node_potentials['A']
[4]:
     A  __p__
0   on    0.5
1  off    0.5
[5]:
model.node_potentials['B']
[5]:
     A    B  __p__
0   on   on    0.5
1   on  off    0.5
2  off   on    0.4
3  off  off    0.6
[6]:
model.node_potentials['C']
[6]:
     A    C  __p__
0   on   on    0.7
1   on  off    0.3
2  off   on    0.2
3  off  off    0.8
[7]:
model.node_potentials['D']
[7]:
     B    D  __p__
0   on   on    0.9
1   on  off    0.1
2  off   on    0.5
3  off  off    0.5
[8]:
model.node_potentials['E']
[8]:
     C    E  __p__
0   on   on    0.3
1   on  off    0.7
2  off   on    0.6
3  off  off    0.4
[9]:
model.node_potentials['F']
[9]:
     D    E    F  __p__
0   on   on   on   0.01
1   on   on  off   0.99
2   on  off   on   0.01
3   on  off  off   0.99
4  off   on   on   0.01
5  off   on  off   0.99
6  off  off   on   0.99
7  off  off  off   0.01
[10]:
model.node_potentials['G']
[10]:
     C    G  __p__
0   on   on    0.8
1   on  off    0.2
2  off   on    0.1
3  off  off    0.9
[11]:
model.node_potentials['H']
[11]:
     E    G    H  __p__
0   on   on   on   0.05
1   on   on  off   0.95
2   on  off   on   0.95
3   on  off  off   0.05
4  off   on   on   0.95
5  off   on  off   0.05
6  off  off   on   0.95
7  off  off  off   0.05

2.2. Graphs

2.2.1. Intermediate graphs

It might be interesting to see the intermediate graphs created. Here’s what the graphs below represent.

  • DAG: the Huang DAG associated with the Huang BBN

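  • Undirected: the undirected graph of the Huang DAG (edge directions are removed; the edges themselves are kept)

  • Moralized: the moralized graph of the Huang DAG (parents that share a child are connected, then edge directions are removed)

  • Triangulated: the triangulated graph, obtained from the moralized graph by adding edges until every cycle of four or more nodes has a chord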

[12]:
draw_intermediate_graphs(model)
_images/huang_18_0.png
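
To make the moralization step concrete, here is a minimal sketch in plain Python. It is not the code pybbn runs internally; `moralize` is a hypothetical helper, and the edge list is copied from the JSON file above.

```python
from itertools import combinations

# Edges of the Huang DAG as (parent, child) pairs, copied from the JSON file.
edges = [('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'E'), ('C', 'G'),
         ('D', 'F'), ('E', 'F'), ('E', 'H'), ('G', 'H')]

def moralize(edges):
    """Hypothetical helper: return the moral graph as a set of undirected edges."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    # Step 1: drop the edge directions.
    moral = {frozenset(e) for e in edges}
    # Step 2: connect ("marry") every pair of parents that share a child.
    for pa in parents.values():
        moral |= {frozenset(pair) for pair in combinations(sorted(pa), 2)}
    return moral

undirected = {frozenset(e) for e in edges}
added = moralize(edges) - undirected
print(sorted(tuple(sorted(e)) for e in added))  # [('D', 'E'), ('E', 'G')]
```

Only \(F\) (parents \(D\), \(E\)) and \(H\) (parents \(E\), \(G\)) have more than one parent, so moralization adds exactly two edges to the undirected graph.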

2.2.2. Join tree

The join tree (also called a junction tree) is the main inference engine of the reasoning model. We can visualize the join tree too.

  • red nodes: these are the cliques/clusters of the join tree

  • black nodes: these are the separation sets of the join tree

[13]:
draw_junction_tree(model)
_images/huang_20_0.png

2.3. Queries

Queries can be classified as follows.

  • Marginal query (univariate probabilities)

  • Associational query (conditional probabilities)

  • Interventional query (causal effect probabilities)

  • Counterfactual query (counterfactual probabilities)

The above categories of queries follow Pearl’s Causal Hierarchy (PCH) or the Causal Ladder [GDH22].

2.3.1. Marginal query

Let’s just follow one variable, \(H\). The marginal probability of \(H\), \(P(H)\), is easy to query.

[14]:
q = model.pquery()
q['H']
[14]:
     H   __p__
0  off  0.1769
1   on  0.8231
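
As a sanity check, the same marginal can be reproduced by brute-force enumeration of the joint distribution. This is only a sketch for a network this small; it is not how the join tree computes the answer. The names `PARENTS`, `P_ON` and `joint` are illustrative, and the numbers are copied from the parameter tables above.

```python
from itertools import product

STATES = ('on', 'off')

# Parents of each node, taken from the edges of the Huang DAG.
PARENTS = {
    'A': (), 'B': ('A',), 'C': ('A',), 'D': ('B',), 'E': ('C',),
    'F': ('D', 'E'), 'G': ('C',), 'H': ('E', 'G'),
}

# P(node = 'on' | parent states); P(node = 'off' | ...) is the complement.
P_ON = {
    'A': {(): 0.5},
    'B': {('on',): 0.5, ('off',): 0.4},
    'C': {('on',): 0.7, ('off',): 0.2},
    'D': {('on',): 0.9, ('off',): 0.5},
    'E': {('on',): 0.3, ('off',): 0.6},
    'F': {('on', 'on'): 0.01, ('on', 'off'): 0.01,
          ('off', 'on'): 0.01, ('off', 'off'): 0.99},
    'G': {('on',): 0.8, ('off',): 0.1},
    'H': {('on', 'on'): 0.05, ('on', 'off'): 0.95,
          ('off', 'on'): 0.95, ('off', 'off'): 0.95},
}

def joint(world):
    """Probability of one full assignment: the product of the CPT entries."""
    p = 1.0
    for node, parents in PARENTS.items():
        p_on = P_ON[node][tuple(world[pa] for pa in parents)]
        p *= p_on if world[node] == 'on' else 1.0 - p_on
    return p

# Sum the joint over every assignment, grouped by the state of H.
nodes = list(PARENTS)
p_h = {'on': 0.0, 'off': 0.0}
for states in product(STATES, repeat=len(nodes)):
    world = dict(zip(nodes, states))
    p_h[world['H']] += joint(world)

print({h: round(p, 4) for h, p in p_h.items()})  # {'on': 0.8231, 'off': 0.1769}
```

Enumeration visits all \(2^8 = 256\) assignments, which is fine here but grows exponentially with the number of nodes.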

2.3.2. Associational query

In an associational query, we want to compute the conditional probabilities

  • \(P(H|E=\mathrm{on})\), and

  • \(P(H|E=\mathrm{off})\).

[15]:
q = model.pquery(evidences=model.e({'E': 'on'}))
q['H']
[15]:
     H     __p__
0  off  0.322903
1   on  0.677097
[16]:
q = model.pquery(evidences=model.e({'E': 'off'}))
q['H']
[16]:
     H  __p__
0  off   0.05
1   on   0.95
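
The first result can be checked by hand from the parameter tables. The derivation below is a sketch that uses the fact that \(H\) depends only on \(E\) and \(G\), and that \(E\) and \(G\) depend only on \(C\).

```latex
\begin{aligned}
P(C=\mathrm{on}) &= 0.7 \cdot 0.5 + 0.2 \cdot 0.5 = 0.45 \\
P(E=\mathrm{on}) &= 0.3 \cdot 0.45 + 0.6 \cdot 0.55 = 0.465 \\
P(E=\mathrm{on}, G=\mathrm{on}) &= 0.3 \cdot 0.8 \cdot 0.45 + 0.6 \cdot 0.1 \cdot 0.55 = 0.141 \\
P(G=\mathrm{on} \mid E=\mathrm{on}) &= 0.141 / 0.465 \approx 0.3032 \\
P(H=\mathrm{on} \mid E=\mathrm{on}) &= 0.05 \cdot 0.3032 + 0.95 \cdot (1 - 0.3032) \approx 0.6771
\end{aligned}
```

For \(E=\mathrm{off}\), the table for \(H\) gives \(P(H=\mathrm{on}|E=\mathrm{off}, G=g) = 0.95\) for both values of \(g\), so \(P(H=\mathrm{on}|E=\mathrm{off}) = 0.95\) no matter how \(G\) is distributed.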

2.3.3. Interventional query

An interventional query estimates the causal effect of one variable on another. In this case, we want to estimate the causal effect of \(E\) on \(H\). However, \(E\) and \(H\) are confounded through \(\{C, G\}\) along the backdoor path \(E \leftarrow C \rightarrow G \rightarrow H\). We need to apply the do-operator on \(E\) to estimate the causal effect of \(E\) on \(H\). (The causal effect of \(E\) on \(H\) is not simply the associational query we performed above!)

\(P(H=h|\mathrm{do}(E=e))\)

We do not have to control for both confounders \(\{C, G\}\); controlling for \(C\) alone is sufficient to block the backdoor paths from \(E\) to \(H\). Thus, using the backdoor adjustment formula, we can compute the causal effect as follows.

\(P(H=h|\mathrm{do}(E=e)) = \sum_C P(H=h|E=e, C)P(C)\)

Let’s compute two causal effects.

  • \(P(H=\mathrm{on}|\mathrm{do}(E=\mathrm{on})) = \sum_C P(H=\mathrm{on}|E=\mathrm{on}, C)P(C)\)

  • \(P(H=\mathrm{on}|\mathrm{do}(E=\mathrm{off})) = \sum_C P(H=\mathrm{on}|E=\mathrm{off}, C)P(C)\)

[17]:
p_on = model.iquery(Y=['H'], y=['on'], X=['E'], x=['on'])
p_on
[17]:
H    0.5765
dtype: float64
[18]:
p_off = model.iquery(Y=['H'], y=['on'], X=['E'], x=['off'])
p_off
[18]:
H    0.95
dtype: float64

The average causal effect (ACE) is defined as follows.

\(\mathrm{ACE} = P(H=\mathrm{on}|\mathrm{do}(E=\mathrm{on})) - P(H=\mathrm{on}|\mathrm{do}(E=\mathrm{off}))\)

[19]:
p_on['H'] - p_off['H']
[19]:
-0.37349999999999994
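
The backdoor adjustment formula can also be evaluated directly from the parameter tables. The sketch below is an independent check and does not reproduce how `iquery` is implemented; `prob` and `do_E` are hypothetical helpers. Only \(A\), \(C\), \(E\), \(G\) and \(H\) are included, since the remaining nodes are not ancestors of \(H\) and sum out to 1.

```python
from itertools import product

STATES = ('on', 'off')

# H and its ancestors, with parents taken from the Huang DAG.
PARENTS = {'A': (), 'C': ('A',), 'E': ('C',), 'G': ('C',), 'H': ('E', 'G')}

# P(node = 'on' | parent states), copied from the parameter tables.
P_ON = {
    'A': {(): 0.5},
    'C': {('on',): 0.7, ('off',): 0.2},
    'E': {('on',): 0.3, ('off',): 0.6},
    'G': {('on',): 0.8, ('off',): 0.1},
    'H': {('on', 'on'): 0.05, ('on', 'off'): 0.95,
          ('off', 'on'): 0.95, ('off', 'off'): 0.95},
}

def joint(world):
    """Probability of one full assignment over A, C, E, G, H."""
    p = 1.0
    for node, parents in PARENTS.items():
        p_on = P_ON[node][tuple(world[pa] for pa in parents)]
        p *= p_on if world[node] == 'on' else 1.0 - p_on
    return p

def prob(**fixed):
    """Hypothetical helper: P(fixed), summing the joint over consistent worlds."""
    nodes = list(PARENTS)
    total = 0.0
    for states in product(STATES, repeat=len(nodes)):
        world = dict(zip(nodes, states))
        if all(world[k] == v for k, v in fixed.items()):
            total += joint(world)
    return total

def do_E(h, e):
    """Backdoor adjustment over C: sum_c P(H=h | E=e, C=c) * P(C=c)."""
    return sum(prob(H=h, E=e, C=c) / prob(E=e, C=c) * prob(C=c) for c in STATES)

p_do_on = do_E('on', 'on')
p_do_off = do_E('on', 'off')
ace = p_do_on - p_do_off
print(round(p_do_on, 4), round(p_do_off, 4), round(ace, 4))  # 0.5765 0.95 -0.3735
```

Note how \(P(H=\mathrm{on}|\mathrm{do}(E=\mathrm{on})) = 0.5765\) differs from the associational \(P(H=\mathrm{on}|E=\mathrm{on}) \approx 0.6771\): conditioning on \(E\) also shifts the distribution of \(C\), while intervening on \(E\) does not.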

2.3.4. Counterfactual query

In this counterfactual query, we have the following.

  • query: \(H\)

  • factual: \(E=\mathrm{on}, G=\mathrm{on}, H=\mathrm{on}\)

  • counterfactual: \(E=\mathrm{off}\)

The query is against \(H\); it is the variable we are interested in. The factual is what actually happened. The counterfactual is the hypothetical. Let’s try to word this as a counterfactual statement (not so easy to do).

  • Given that \(E\) was on, \(G\) was on, \(H\) was on, what is the probability of \(H\) had we set \(E\) to off?

[20]:
Y = 'H'
e = {
    'E': 'on',
    'G': 'on',
    'H': 'on'
}
h = {
    'E': 'off'
}
model.cquery(Y, e, h)
[20]:
     H     __p__
0  off  0.043749
1   on  0.956251

Let’s try another counterfactual query. We have the following.

  • query: \(H\)

  • factual: \(E=\mathrm{off}, G=\mathrm{off}, H=\mathrm{off}\)

  • counterfactual: \(E=\mathrm{on}\)

As a counterfactual statement, we might say the following.

  • Given that \(E\) was off, \(G\) was off, \(H\) was off, what is the probability of \(H\) had we set \(E\) to on?

[21]:
Y = 'H'
e = {
    'E': 'off',
    'G': 'off',
    'H': 'off'
}
h = {
    'E': 'on'
}
model.cquery(Y, e, h)
[21]:
     H     __p__
0  off  0.439756
1   on  0.560244

Here’s the last counterfactual query. We have the following.

  • query: \(H\)

  • factual: \(E=\mathrm{on}, G=\mathrm{on}, H=\mathrm{on}\)

  • counterfactual: \(E=\mathrm{off}, G=\mathrm{off}\)

As a counterfactual statement, we might say the following.

  • Given that \(E\) was on, \(G\) was on, \(H\) was on, what is the probability of \(H\) had we set \(E\) and \(G\) to off?

[22]:
Y = 'H'
e = {
    'E': 'on',
    'G': 'on',
    'H': 'on'
}
h = {
    'E': 'off',
    'G': 'off'
}
model.cquery(Y, e, h)
[22]:
     H    __p__
0  off  0.04526
1   on  0.95474

2.3.5. Graphical query

Graphical queries revolve around the DAG. Typical graphical queries are about d-separation and discovering confounders and mediators.

From the DAG, we ask if \(F\) and \(H\) are d-separated, \(I(F, H)\). The answer is false, since there are unblocked backdoor paths between \(F\) and \(H\). Here are the backdoor paths between \(F\) and \(H\).

  • \(F, D, B, A, C, E, H\)

  • \(F, D, B, A, C, G, H\)

  • \(F, E, C, G, H\)

  • \(F, E, H\)

[23]:
model.is_d_separated('F', 'H')
[23]:
False

Perhaps we can block all backdoor paths between \(F\) and \(H\) with \(E\), \(I(F,H|E)\)? The answer is false.

[24]:
model.is_d_separated('F', 'H', {'E'})
[24]:
False

Can we block all backdoor paths between \(F\) and \(H\) with \(E\) and \(C\), \(I(F,H|E, C)\)? The answer is true.

[25]:
model.is_d_separated('F', 'H', {'E', 'C'})
[25]:
True

We can query the graph to get us the “minimal confounders” that will block all backdoor paths. There are multiple minimal sets, but with the particular approach we use, the answer is \(\{D, E\}\). The answer to \(I(F,H|D,E)\) is true.

[26]:
model.get_minimal_confounders('F', 'H')
[26]:
['D', 'E']

We can use the model itself to 1) get the minimal confounders and then 2) test for d-separation as follows.

[27]:
model.is_d_separated('F', 'H', model.get_minimal_confounders('F', 'H'))
[27]:
True
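
The four d-separation answers above can be reproduced with a standard graph criterion: restrict the DAG to the variables of interest and their ancestors, moralize that subgraph, delete the conditioning set, and test whether the two nodes are still connected. The sketch below is a plain-Python illustration of that criterion; `d_separated` is a hypothetical helper, and pybbn's own `is_d_separated` may be implemented differently.

```python
from itertools import combinations

# Edges of the Huang DAG as (parent, child) pairs, copied from the JSON file.
edges = [('A', 'B'), ('A', 'C'), ('B', 'D'), ('C', 'E'), ('C', 'G'),
         ('D', 'F'), ('E', 'F'), ('E', 'H'), ('G', 'H')]

def d_separated(edges, x, y, z=()):
    """Hypothetical helper: True if x and y are d-separated given the set z."""
    z = set(z)
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)

    # 1. Keep only x, y, z and all of their ancestors.
    keep, stack = set(), [x, y, *z]
    while stack:
        n = stack.pop()
        if n not in keep:
            keep.add(n)
            stack.extend(parents.get(n, ()))

    # 2. Moralize the ancestral subgraph: link each child to its parents,
    #    link co-parents to each other, and ignore edge directions.
    nbrs = {n: set() for n in keep}
    for child in keep:
        pa = parents.get(child, set())
        for u in pa:
            nbrs[u].add(child)
            nbrs[child].add(u)
        for u, v in combinations(pa, 2):
            nbrs[u].add(v)
            nbrs[v].add(u)

    # 3. Remove z and search for any remaining path from x to y.
    seen, stack = set(), [x]
    while stack:
        n = stack.pop()
        if n == y:
            return False
        if n in seen:
            continue
        seen.add(n)
        stack.extend(nbrs[n] - z)
    return True

for z in [set(), {'E'}, {'E', 'C'}, {'D', 'E'}]:
    print(sorted(z), d_separated(edges, 'F', 'H', z))
```

With \(\{E\}\) alone, the path \(F, D, B, A, C, G, H\) stays open, which is why either \(C\) or \(D\) must be added to the conditioning set.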

If we want a list of all confounders that can block backdoor paths between \(F\) and \(H\), then we can also get that from the DAG.

[28]:
model.get_all_confounders('F', 'H')
[28]:
{'A', 'B', 'C', 'D', 'E', 'G'}

2.4. Save

Finally, we can save the reasoning model that we built from the Huang BBN. Note that we are persisting the reasoning model, not the BBN specification that we loaded at the beginning.

[29]:
from pybbn.serde import model_to_dict

with open('../../_support/bbn/huang-reasoning.json', 'w') as fp:
    json.dump(model_to_dict(model), fp, indent=1)