Computational Learning and Probabilistic Reasoning: Contents

CONTENTS
Preface, xiii
List of Figures, xvii
List of Tables, xxi
List of Contributors, xxiii
I Generalisation Principles and Learning 1
1 Structure of Statistical Learning Theory (V. Vapnik), 3
1.1 Function Estimation Model, 3
1.2 Problem of Risk Minimization, 4
1.3 Three Main Learning Problems, 4
1.3.1 The Problem of Pattern Recognition, 4
1.3.2 The Problem of Regression Estimation, 5
1.3.3 The Problem of Density Estimation, 5
1.3.4 The General Setting of the Learning Problem, 5
1.4 Empirical Risk Minimization Induction Principle, 6
1.5 Four Parts of Learning Theory, 6
1.6 Theory of Consistency of the Learning Processes, 7
1.6.1 The Key Assertion of the Learning Theory, 7
1.6.2 The Necessary and Sufficient Conditions for Uniform Convergence, 8
1.6.3 Three Milestones of Learning Theory, 9
1.7 Bounds for the Rate of Convergence of the Learning Processes, 11
1.7.1 The Structure of the Growth-function, 12
1.7.2 VC-dimension of the Set of Functions, 12
1.7.3 Distribution Independent Bounds for the Rate of Convergence of the Learning Processes, 13
1.7.4 Problem of Constructing Rigorous (distribution dependent) Bounds, 15
1.8 Theory for Controlling Generalization Ability of Learning Processes, 16
1.8.1 Structural Risk Minimization Induction Principle, 16
1.8.2 Examples of the Structure for Neural Nets, 18
1.8.3 The SRM Principle in the Problem of Polynomial Regression, 19
1.8.4 Problem of Local Function Estimation, 19
1.9 Theory of Constructing Learning Algorithms for the Pattern Recognition Problem, 20
1.9.1 Sigmoid Approximation of Indicator Functions and Neural Nets, 21
1.9.2 Method of Optimal Separating Hyperplane and Support Vectors Networks, 22
1.9.3 Why Can Neural Networks and Support Vectors Networks Generalize? 26
References, 30
2 Stochastic Complexity — an Introduction (J. Rissanen), 33
2.1 General, 33
2.2 Stochastic Complexity, 36
2.3 Predictive Coding, 39
References, 41
3 MML Inference of Predictive Trees, Graphs and Nets (C.S. Wallace), 43
3.1 A Critique of Computational Learning Theory, 43
3.1.1 Computational Learning Theory, 44
3.1.2 Gold's Model, 44
3.1.3 Gold's Theory, 44
3.1.4 Valiant's Model, 45
3.1.5 The Deductive Fallacy, 46
3.2 Learning from Data, 47
3.2.1 Learning is NOT Deduction, 47
3.2.2 Learning as a Competition, 47
3.2.3 Explanations, 48
3.3 The Message Length Criterion, 49
3.3.1 Parameter Coding, 50
3.3.2 Bayesian Parallel, 50
3.4 Theoretical Results, 51
3.4.1 Sufficiency, 51
3.4.2 Avoidance of Overfitting, 51
3.4.3 Consistency, 52
3.4.4 Oracular Imitation, 52
3.4.5 Invariance, 53
3.4.6 Kullback-Leibler Distance, 53
3.5 The Kolmogorov-Chaitin Formulation, 54
3.5.1 Theoretical Results from Kolmogorov-Chaitin Form, 54
3.6 Prediction vs. Induction, 55
3.6.1 Prediction in the Kolmogorov-Chaitin Formulation, 55
3.7 Applications, 56
3.7.1 Learning the Class Structure of Populations ("unsupervised learning"), 56
3.7.2 Learning the Shape Patterns of Megalithic Stone Circles, 56
3.7.3 Learning Simple Grammars, 57
3.7.4 Learning Rules in the Form of Decision Trees, 57
3.7.5 Learning Decision Graphs, 57
3.7.6 Inferring Relationships among DNA Strings, 58
3.7.7 Learning Existence and Effects of Hidden Variables in Linear Models, 58
3.7.8 Inference of Prolog Programs, 58
3.8 Coding Techniques, 58
3.8.1 Encoding Real-Valued Data, 59
3.8.2 Coding Real Parameters, 61
3.8.3 Coding Discrete Parameters, 62
3.8.4 Prior and Progressive Codes, 63
References, 64
4 Learning and Reasoning as Information Compression by Multiple Alignment, Unification and Search (J.G. Wolff), 67
4.1 Introduction, 67
4.2 Multiple Alignment Problems, 68
4.2.1 What is an "Optimal" Alignment? 68
4.2.2 Searching for Optimal Alignments, 69
4.3 Unifications and Tags, 69
4.4 Learning as Multiple Alignment with Unification and Search, 70
4.4.1 Learning Segmental Structure, 71
4.4.2 Learning Disjunctive Classes, 72
4.5 Reasoning as Multiple Alignment with Unification and Search, 74
4.5.1 Recognition and Reasoning, 74
4.5.2 Modus Ponens, 74
4.5.3 Modus Tollens, 76
4.5.4 Abduction, 77
4.5.5 Chains of Reasoning, 78
4.5.6 Inductive Reasoning, 78
4.5.7 Analogical Reasoning, 81
4.6 Conclusion, 81
References, 83
5 Probabilistic Association and Denotation in Machine Learning of Natural Language (P. Suppes and L. Liang), 87
5.1 Our Approach to Machine Learning, 88
5.2 Problem of Denotation, 89
5.3 Background Cognitive and Perceptual Assumptions, 90
5.4 The General Axioms of Association and Denotation, 93
5.5 Denotational Learning Models, 94
5.6 Some Empirical Results, 99
5.7 Conclusion, 100
References, 100
II Causation and Model Selection, 101
6 Causation, Action, and Counterfactuals (J. Pearl), 103
6.1 Introduction, 103
6.2 Action as a Local Surgery, 106
6.2.1 Mechanisms and Surgeries, 106
6.2.2 Laws vs. Facts, 107
6.2.3 Mechanisms and Causal Relationships, 108
6.2.4 Causal Ordering, 108
6.2.5 Imaging vs. Conditioning, 109
6.3 Formal Underpinning, 111
6.3.1 Causal Theories, Actions, Causal Effect, and Identifiability, 111
6.3.2 Action Calculus, 115
6.3.3 Historical Background, 118
6.4 Counterfactuals, 118
References, 120
7 Another Semantics for Pearl's Action Calculus (V.G. Vovk), 125
7.1 Introduction, 125
7.2 Main Result, 128
7.3 Formal Action Calculus, 132
7.4 Proof of Theorem 7.1, 137
7.5 Discussion, 141
References, 143
8 Efficient Estimation and Model Selection in Large Graphical Models (D. Wedelin), 145
8.1 Introduction, 145
8.2 An Orthogonal Interaction Model, 146
8.2.1 Independence Properties in the Orthogonal Model, 148
8.3 Efficient Parameter Estimation in a Given Undirected Model, 149
8.3.1 The Reverse Algorithm, 150
8.3.2 An Improved Reverse Algorithm, 151
8.4 Efficient Model Selection, 152
8.5 Directed Models, 154
8.6 Computational Results, 155
8.7 Conclusion, 158
References, 158
9 T-Normal Distribution on the Bayesian Belief Networks (Yu.N. Blagoveschensky), 161
9.1 Introduction, 161
9.2 Necessary Definitions and Denotations, 162
9.3 Results and Some New Problems, 163
9.4 Conclusions, 166
References, 166
III Bayesian Belief Networks and Hybrid Systems, 167
10 Bayesian Belief Networks with an Application in Specific Case Analysis (C.G.G. Aitken et al.), 169
10.1 Introduction, 169
10.2 The CATCHEM Project, 170
10.2.1 Offender Variables, 170
10.2.2 Victim Variables, 171
10.2.3 Combination of Offender and Victim Variables, 171
10.3 PRESS — a Bayesian Belief Network System, 171
10.3.1 Network Construction, 173
10.3.2 System Testing and Performance, 173
10.4 Examples of Networks, 174
10.4.1 Seven Node Network, 174
10.4.2 Ten Node Network, 178
10.4.3 Networks with a Large Number of Nodes, 180
10.5 Comments, 180
10.6 Discussion, 182
10.6.1 Measures of Performance, 182
10.6.2 Models in Real Time, 182
10.6.3 Interpretation of Probabilities, 182
10.6.4 Incorporation of Detective Expertise, 182
10.7 Conclusion, 183
References, 184
11 Bayesian Belief Networks and Patient Treatment (L.D. Meshalkin and E.K. Tsybulkin), 185
11.1 Introduction, 185
11.2 Scale of Severity of a Patient's State and Mathematical Tools, 186
11.3 Mathematical Tools Adjustment, 188
11.4 Notation, Time Structure and Leading Graph, 189
11.5 The Choice Between Different Trajectories of Disease Development as a New Problem, 191
11.6 Example, 191
11.7 Conclusion, 195
References, 196
12 A Higher Order Bayesian Neural Network for Classification and Diagnosis (A. Holst and A. Lansner), 199
12.1 The Classification Task, 199
12.2 The Bayesian Confidence Propagation Neural Network, 202
12.3 The Diagnosis Application, 205
12.4 Discussion and Conclusions, 207
References, 208
13 Genetic Algorithms Applied to Bayesian Networks (P. Larranaga et al.), 211
13.1 Introduction, 211
13.2 Genetic Algorithms, 212
13.3 Genetic Operators in Relation to the TSP, 213
13.3.1 Crossover Operators, 213
13.3.2 Mutation Operators, 216
13.4 Decomposition of Bayesian Networks, 216
13.4.1 Introduction, 216
13.4.2 Description of the Experiments, 217
13.4.3 Results and Conclusions, 217
13.5 Structure Learning, 219
13.5.1 Introduction and Related Work, 219
13.5.2 Searching in the Space of Network Structures, 219
13.5.3 Searching for the Best Ordering, 223
13.5.4 Remarks and Conclusions, 224
13.6 Fusion, 225
13.6.1 Introduction, 225
13.6.2 Proposed Approach, 225
13.6.3 An Example, 225
13.6.4 Remarks and Conclusions, 227
13.7 Conclusions, 227
References, 227
IV Decision-Making, Optimization and Classification, 235
14 Rationality, Conditional Independence and Statistical Models of Competition (J.Q. Smith and C.T.J. Allard), 237
14.1 Introduction, 237
14.2 Making Deductions in Statistical and Decision Problems, 240
14.3 Common-Knowledge and Rationality, 244
14.4 Equilibral Rationality, 248
14.5 Statistical Models Structured from Evoking Rationality, 251
14.6 Conclusions, 256
References, 256
15 Axioms for Dynamic Programming (P.P. Shenoy), 259
15.1 Introduction, 259
15.2 Valuation Networks and Optimization, 261
15.3 The Axioms, 264
15.4 A Fusion Algorithm, 265
15.5 Mitten's Axioms for Dynamic Programming, 270
15.6 Conclusion, 271
15.7 Proofs, 271
References, 273
16 Mixture-Model Cluster Analysis Using the Projection Pursuit Method (S. Aivazian), 277
16.1 Mixture-Model and Cluster Analysis Problem, 277
16.2 Estimating the Unknown Parameters of the Mixture-Model, 278
16.3 Problem of Estimating the Unknown Number of Classes, 279
16.4 Detecting the Number of Clusters by Means of Projection Pursuit, 281
16.4.1 Projection Pursuit Technique, 281
16.4.2 Discriminant Subspace, 282
16.4.3 Projection Index Suitable for Detecting the Number of Clusters, 283
16.5 Mixture-Model Cluster Analysis and Bayesian Belief Networks, 284
References, 285
17 A Parallel kn-Nearest Neighbour Classifier for Estimation of Non-linear Decision Regions (A. Kovalenko), 287
17.1 Introduction, 287
17.2 Parallel Classifier, 288
17.3 Signal Recognition Against a Noise Background: A Numerical Example, 291
References, 293
18 Extreme Values of Functionals Characterizing Stability of Statistical Decisions (A.V. Nagaev), 295
18.1 Introduction, 295
18.2 Stability within a Belief Network, 296
18.2.1 Properties of Elliptically Contoured Distributions, 296
18.2.2 Extreme Problems for the Variation Distance, 298
18.2.3 On the Stability of Predictions, 301
18.3 Extreme Problems for Quantiles, 303
18.3.1 Lower Extreme Values for Quantiles of the Greater Level, 304
18.3.2 Lower Extreme Values for Quantiles of the Smallest Level, 305
18.3.3 Upper Extreme Values, 306
References, 307
Index, 309