Accession Number : ADA148833


Title :   Principal Curves and Surfaces


Descriptive Note : Technical rept.


Corporate Author : STANFORD UNIV CA LAB FOR COMPUTATIONAL STATISTICS


Personal Author(s) : Hastie, Trevor


Full Text : http://www.dtic.mil/dtic/tr/fulltext/u2/a148833.pdf


Report Date : Nov 1984


Pagination or Media Count : 107


Abstract : Principal curves are smooth one-dimensional curves that pass through the middle of a p-dimensional data set. They minimize the distance from the points, and provide a non-linear summary of the data. The curves are non-parametric and their shape is suggested by the data. Similarly, principal surfaces are two-dimensional surfaces that pass through the middle of the data. The curves and surfaces are found using an iterative procedure which starts with a linear summary such as the usual principal component line or plane. Each successive iteration is a smooth or local average of the p-dimensional points, where local is based on the projections of the points onto the curve or surface of the previous iteration. A number of linear techniques, such as factor analysis and errors-in-variables regression, end up using the principal components as their estimates (after a suitable scaling of the co-ordinates). Principal curves and surfaces can be viewed as the estimates of non-linear generalizations of these procedures. Principal curves (or surfaces) have a theoretical definition for distributions: they are the self-consistent curves. A curve is self-consistent if each point on the curve is the conditional mean of the points that project there. The main theorem proves that principal curves are critical values of the expected squared distance between the points and the curve. Linear principal components have this property as well; in fact, we prove that if a principal curve is straight, then it is a principal component. These results generalize the usual duality between conditional expectation and distance minimization. We also examine two sources of bias in the procedures, which have the satisfactory property of partially cancelling each other.
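
The iterative procedure described in the abstract can be illustrated with a minimal sketch. It is not the report's implementation. It assumes a running-mean smoother over a fixed fraction of the points (the hypothetical `span` parameter), a polyline representation of the curve, and power iteration for the starting principal component line. All function names are illustrative.

```python
import math

def mean(points):
    """Coordinate-wise mean of a list of p-dimensional points."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]

def first_pc(points, iters=200):
    """Leading principal direction via power iteration (an assumed starting method)."""
    mu = mean(points)
    centered = [[x - m for x, m in zip(p, mu)] for p in points]
    d = len(mu)
    cov = [[sum(c[i] * c[j] for c in centered) / len(points) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mu, v

def project_to_polyline(p, curve):
    """Return (arc length, squared distance) of the closest point on the polyline."""
    best_lam, best_d2, start = 0.0, float("inf"), 0.0
    for a, b in zip(curve, curve[1:]):
        ab = [bj - aj for aj, bj in zip(a, b)]
        seg2 = sum(x * x for x in ab)
        t = 0.0 if seg2 == 0 else sum((pj - aj) * x for pj, aj, x in zip(p, a, ab)) / seg2
        t = max(0.0, min(1.0, t))  # stay on the segment
        d2 = sum((pj - aj - t * x) ** 2 for pj, aj, x in zip(p, a, ab))
        if d2 < best_d2:
            best_lam, best_d2 = start + t * math.sqrt(seg2), d2
        start += math.sqrt(seg2)
    return best_lam, best_d2

def principal_curve(points, span=0.2, n_iter=10):
    """Alternate a local-average step and a projection step, starting from the PC line."""
    mu, v = first_pc(points)
    # Starting projections: coordinates along the first principal component line.
    lam = [sum((pj - mj) * vj for pj, mj, vj in zip(p, mu, v)) for p in points]
    k = max(2, int(span * len(points)))  # window size of the running mean
    history = []
    for _ in range(n_iter):
        # Smoothing step: average the points whose projections are neighbours.
        order = sorted(range(len(points)), key=lambda i: lam[i])
        curve = []
        for r in range(len(order)):
            lo = max(0, min(r - k // 2, len(order) - k))
            curve.append(mean([points[order[i]] for i in range(lo, lo + k)]))
        # Projection step: re-project every point onto the updated curve.
        proj = [project_to_polyline(p, curve) for p in points]
        lam = [pr[0] for pr in proj]
        history.append(sum(pr[1] for pr in proj) / len(points))
    return curve, history
```

Calling `principal_curve(points)` on a list of coordinate lists returns the fitted polyline and the mean squared distance after each iteration. The fixed-size window is a simplification chosen for brevity; a different smoother could be substituted in the smoothing step without changing the overall alternation.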


Descriptors :   *NONPARAMETRIC STATISTICS , *INFORMATION THEORY , *DISTRIBUTION CURVES , LINEAR SYSTEMS , METHODOLOGY , GRAPHS , ESTIMATES , SURFACES , CURVATURE , NONLINEAR SYSTEMS , VALUE , BIAS , ITERATIONS , FACTOR ANALYSIS


Subject Categories : Statistics and Probability
      Cybernetics


Distribution Statement : APPROVED FOR PUBLIC RELEASE