Computational Learning and Probabilistic Reasoning: Contents

CONTENTS
Preface, xiii
List of Figures, xvii
List of Tables, xxi
List of Contributors, xxiii
I Generalisation Principles and Learning 1
1 Structure of Statistical Learning Theory (V. Vapnik), 3
1.1 Function Estimation Model, 3
1.2 Problem of Risk Minimization, 4
1.3 Three Main Learning Problems, 4
1.3.1 The Problem of Pattern Recognition, 4
1.3.2 The Problem of Regression Estimation, 5
1.3.3 The Problem of Density Estimation, 5
1.3.4 The General Setting of the Learning Problem, 5
1.4 Empirical Risk Minimization Induction Principle, 6
1.5 Four Parts of Learning Theory, 6
1.6 Theory of Consistency of the Learning Processes, 7
1.6.1 The Key Assertion of the Learning Theory, 7
1.6.2 The Necessary and Sufficient Conditions for Uniform Convergence, 8
1.6.3 Three Milestones of Learning Theory, 9
1.7 Bounds for the Rate of Convergence of the Learning Processes, 11
1.7.1 The Structure of the Growth-function, 12
1.7.2 VC-dimension of the Set of Functions, 12
1.7.3 Distribution Independent Bounds for the Rate of Convergence of the Learning Processes, 13
1.7.4 Problem of Constructing Rigorous (distribution dependent) Bounds, 15
1.8 Theory for Controlling Generalization Ability of Learning Processes, 16
1.8.1 Structural Risk Minimization Induction Principle, 16
1.8.2 Examples of the Structure for Neural Nets, 18
1.8.3 The SRM Principle in the Problem of Polynomial Regression, 19
1.8.4 Problem of Local Function Estimation, 19
1.9 Theory of Constructing Learning Algorithms for the Pattern Recognition Problem, 20
1.9.1 Sigmoid Approximation of Indicator Functions and Neural Nets, 21
1.9.2 Method of Optimal Separating Hyperplane and Support Vectors Networks, 22
1.9.3 Why Can Neural Networks and Support Vectors Networks Generalize? 26
References, 30
2 Stochastic Complexity — an Introduction (J. Rissanen), 33
2.1 General, 33
2.2 Stochastic Complexity, 36
2.3 Predictive Coding, 39
References, 41
3 MML Inference of Predictive Trees, Graphs and Nets (C.S. Wallace), 43
3.1 A Critique of Computational Learning Theory, 43
3.1.1 Computational Learning Theory, 44
3.1.2 Gold's Model, 44
3.1.3 Gold's Theory, 44
3.1.4 Valiant's Model, 45
3.1.5 The Deductive Fallacy, 46
3.2 Learning from Data, 47
3.2.1 Learning is NOT Deduction, 47
3.2.2 Learning as a Competition, 47
3.2.3 Explanations, 48
3.3 The Message Length Criterion, 49
3.3.1 Parameter Coding, 50
3.3.2 Bayesian Parallel, 50
3.4 Theoretical Results, 51
3.4.1 Sufficiency, 51
3.4.2 Avoidance of Overfitting, 51
3.4.3 Consistency, 52
3.4.4 Oracular Imitation, 52
3.4.5 Invariance, 53
3.4.6 Kullback-Leibler Distance, 53
3.5 The Kolmogorov-Chaitin Formulation, 54
3.5.1 Theoretical Results from Kolmogorov-Chaitin Form, 54
3.6 Prediction vs. Induction, 55
3.6.1 Prediction in the Kolmogorov-Chaitin Formulation, 55
3.7 Applications, 56
3.7.1 Learning the Class Structure of Populations ("unsupervised learning"), 56
3.7.2 Learning the Shape Patterns of Megalithic Stone Circles, 56
3.7.3 Learning Simple Grammars, 57
3.7.4 Learning Rules in the Form of Decision Trees, 57
3.7.5 Learning Decision Graphs, 57
3.7.6 Inferring Relationships among DNA Strings, 58
3.7.7 Learning Existence and Effects of Hidden Variables in Linear Models, 58
3.7.8 Inference of Prolog Programs, 58
3.8 Coding Techniques, 58
3.8.1 Encoding Real-Valued Data, 59
3.8.2 Coding Real Parameters, 61
3.8.3 Coding Discrete Parameters, 62
3.8.4 Prior and Progressive Codes, 63
References, 64
4 Learning and Reasoning as Information Compression by Multiple Alignment, Unification and Search (J.G. Wolff), 67
4.1 Introduction, 67
4.2 Multiple Alignment Problems, 68
4.2.1 What is an "Optimal" Alignment? 68
4.2.2 Searching for Optimal Alignments, 69
4.3 Unifications and Tags, 69
4.4 Learning as Multiple Alignment with Unification and Search, 70
4.4.1 Learning Segmental Structure, 71
4.4.2 Learning Disjunctive Classes, 72
4.5 Reasoning as Multiple Alignment with Unification and Search, 74
4.5.1 Recognition and Reasoning, 74
4.5.2 Modus Ponens, 74
4.5.3 Modus Tollens, 76
4.5.4 Abduction, 77
4.5.5 Chains of Reasoning, 78
4.5.6 Inductive Reasoning, 78
4.5.7 Analogical Reasoning, 81
4.6 Conclusion, 81
References, 83
5 Probabilistic Association and Denotation in Machine Learning of Natural Language (P. Suppes and L. Liang), 87
5.1 Our Approach to Machine Learning, 88
5.2 Problem of Denotation, 89
5.3 Background Cognitive and Perceptual Assumptions, 90
5.4 The General Axioms of Association and Denotation, 93
5.5 Denotational Learning Models, 94
5.6 Some Empirical Results, 99
5.7 Conclusion, 100
References, 100
II Causation and Model Selection, 101
6 Causation, Action, and Counterfactuals (J. Pearl), 103
6.1 Introduction, 103
6.2 Action as a Local Surgery, 106
6.2.1 Mechanisms and Surgeries, 106
6.2.2 Laws vs. Facts, 107
6.2.3 Mechanisms and Causal Relationships, 108
6.2.4 Causal Ordering, 108
6.2.5 Imaging vs. Conditioning, 109
6.3 Formal Underpinning, 111
6.3.1 Causal Theories, Actions, Causal Effect, and Identifiability, 111
6.3.2 Action Calculus, 115
6.3.3 Historical Background, 118
6.4 Counterfactuals, 118
References, 120
7 Another Semantics for Pearl's Action Calculus (V.G. Vovk), 125
7.1 Introduction, 125
7.2 Main Result, 128
7.3 Formal Action Calculus, 132
7.4 Proof of Theorem 7.1, 137
7.5 Discussion, 141
References, 143
8 Efficient Estimation and Model Selection in Large Graphical Models (D. Wedelin), 145
8.1 Introduction, 145
8.2 An Orthogonal Interaction Model, 146
8.2.1 Independence Properties in the Orthogonal Model, 148
8.3 Efficient Parameter Estimation in a Given Undirected Model, 149
8.3.1 The Reverse Algorithm, 150
8.3.2 An Improved Reverse Algorithm, 151
8.4 Efficient Model Selection, 152
8.5 Directed Models, 154
8.6 Computational Results, 155
8.7 Conclusion, 158
References, 158
9 T-Normal Distribution on the Bayesian Belief Networks (Yu.N. Blagoveschensky), 161
9.1 Introduction, 161
9.2 Necessary Definitions and Denotations, 162
9.3 Results and Some New Problems, 163
9.4 Conclusions, 166
References, 166
III Bayesian Belief Networks and Hybrid Systems, 167
10 Bayesian Belief Networks with an Application in Specific Case Analysis (C.G.G. Aitken et al.), 169
10.1 Introduction, 169
10.2 The CATCHEM Project, 170
10.2.1 Offender Variables, 170
10.2.2 Victim Variables, 171
10.2.3 Combination of Offender and Victim Variables, 171
10.3 PRESS — a Bayesian Belief Network System, 171
10.3.1 Network Construction, 173
10.3.2 System Testing and Performance, 173
10.4 Examples of Networks, 174
10.4.1 Seven Node Network, 174
10.4.2 Ten Node Network, 178
10.4.3 Networks with a Large Number of Nodes, 180
10.5 Comments, 180
10.6 Discussion, 182
10.6.1 Measures of Performance, 182
10.6.2 Models in Real Time, 182
10.6.3 Interpretation of Probabilities, 182
10.6.4 Incorporation of Detective Expertise, 182
10.7 Conclusion, 183
References, 184
11 Bayesian Belief Networks and Patient Treatment (L.D. Meshalkin and E.K. Tsybulkin), 185
11.1 Introduction, 185
11.2 Scale of Severity of a Patient's State and Mathematical Tools, 186
11.3 Mathematical Tools Adjustment, 188
11.4 Notation, Time Structure and Leading Graph, 189
11.5 The Choice Between Different Trajectories of Disease Development as a New Problem, 191
11.6 Example, 191
11.7 Conclusion, 195
References, 196
12 A Higher Order Bayesian Neural Network for Classification and Diagnosis (A. Holst and A. Lansner), 199
12.1 The Classification Task, 199
12.2 The Bayesian Confidence Propagation Neural Network, 202
12.3 The Diagnosis Application, 205
12.4 Discussion and Conclusions, 207
References, 208
13 Genetic Algorithms Applied to Bayesian Networks (P. Larranaga et al.), 211
13.1 Introduction, 211
13.2 Genetic Algorithms, 212
13.3 Genetic Operators in Relation to the TSP, 213
13.3.1 Crossover Operators, 213
13.3.2 Mutation Operators, 216
13.4 Decomposition of Bayesian Networks, 216
13.4.1 Introduction, 216
13.4.2 Description of the Experiments, 217
13.4.3 Results and Conclusions, 217
13.5 Structure Learning, 219
13.5.1 Introduction and Related Work, 219
13.5.2 Searching in the Space of Network Structures, 219
13.5.3 Searching for the Best Ordering, 223
13.5.4 Remarks and Conclusions, 224
13.6 Fusion, 225
13.6.1 Introduction, 225
13.6.2 Proposed Approach, 225
13.6.3 An Example, 225
13.6.4 Remarks and Conclusions, 227
13.7 Conclusions, 227
References, 227
IV Decision-Making, Optimization and Classification, 235
14 Rationality, Conditional Independence and Statistical Models of Competition (J.Q. Smith and C.T.J. Allard), 237
14.1 Introduction, 237
14.2 Making Deductions in Statistical and Decision Problems, 240
14.3 Common-Knowledge and Rationality, 244
14.4 Equilibral Rationality, 248
14.5 Statistical Models Structured from Evoking Rationality, 251
14.6 Conclusions, 256
References, 256
15 Axioms for Dynamic Programming (P.P. Shenoy), 259
15.1 Introduction, 259
15.2 Valuation Networks and Optimization, 261
15.3 The Axioms, 264
15.4 A Fusion Algorithm, 265
15.5 Mitten's Axioms for Dynamic Programming, 270
15.6 Conclusion, 271
15.7 Proofs, 271
References, 273
16 Mixture-Model Cluster Analysis Using the Projection Pursuit Method (S. Aivazian), 277
16.1 Mixture-Model and Cluster Analysis Problem, 277
16.2 Estimating the Unknown Parameters of the Mixture-Model, 278
16.3 Problem of Estimating the Unknown Number of Classes, 279
16.4 Detecting the Number of Clusters by Means of Projection Pursuit, 281
16.4.1 Projection Pursuit Technique, 281
16.4.2 Discriminant Subspace, 282
16.4.3 Projection Index Suitable for Detecting the Number of Clusters, 283
16.5 Mixture-Model Cluster Analysis and Bayesian Belief Networks, 284
References, 285
17 A Parallel kn-Nearest Neighbour Classifier for Estimation of Non-linear Decision Regions (A. Kovalenko), 287
17.1 Introduction, 287
17.2 Parallel Classifier, 288
17.3 Signal Recognition Against a Noise Background: A Numerical Example, 291
References, 293
18 Extreme Values of Functionals Characterizing Stability of Statistical Decisions (A.V. Nagaev), 295
18.1 Introduction, 295
18.2 Stability within a Belief Network, 296
18.2.1 Properties of Elliptically Contoured Distributions, 296
18.2.2 Extreme Problems for the Variation Distance, 298
18.2.3 On the Stability of Predictions, 301
18.3 Extreme Problems for Quantiles, 303
18.3.1 Lower Extreme Values for Quantiles of the Greater Level, 304
18.3.2 Lower Extreme Values for Quantiles of the Smallest Level, 305
18.3.3 Upper Extreme Values, 306
References, 307
Index, 309