Estimation methods in the errors-in-variables context

Summary. Constructing a computer model from a large set of data, typically contaminated with noise, is a central problem in fields such as computer vision, pattern recognition, data mining, system identification and time series analysis. In these areas our objective is often to capture the internal laws that govern a system with a succinct parametric representation. Despite the amount and high dimensionality of the data, the equation that relates data points can usually be expressed in a compact manner. Unfortunately, nonlinearity in the system under study and the presence of noise mean that conventional tools in statistics cannot be directly applied to estimate unknown system parameters.

The dissertation explores estimation methods focused on three related areas of errors-in-variables systems: fitting a nonlinear function to data where the fit is subject to constraints; fitting a union of several elementary nonlinear functions to a data set; and estimating the parameters of discrete-time dynamic systems.

Curve and surface fitting is a well-studied problem, but noise and nonlinearity in the function that relates data points introduce bias and increase estimation variance, a difficulty that is typically addressed with costly iterative methods. The thesis introduces non-iterative direct methods that fit data subject to constraints, with emphasis on fitting quadratic curves and surfaces. The estimates these direct methods produce are nevertheless close to those obtained by maximum likelihood methods.
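
As a rough illustration of what a non-iterative direct fit looks like, the sketch below fits a general quadratic curve a1 x^2 + a2 xy + a3 y^2 + a4 x + a5 y + a6 = 0 by minimizing the algebraic residual subject to a unit-norm constraint on the parameter vector. This is a textbook algebraic fit and not the method of the thesis: the function name fit_conic_direct and the choice of constraint are assumptions made for the example, and a plain fit of this kind is known to be biased when the data are noisy.

```python
import numpy as np

def fit_conic_direct(x, y):
    """Hypothetical helper: algebraic conic fit, min ||D a||^2 s.t. ||a|| = 1.

    Illustrative only; the unit-norm constraint is an assumption and
    no noise compensation is applied.
    """
    # Each row holds the quadratic monomials of one data point.
    D = np.column_stack([x * x, x * y, y * y, x, y, np.ones_like(x)])
    # Scatter matrix of the lifted data (symmetric, positive semidefinite).
    S = D.T @ D
    # eigh returns eigenvalues in ascending order, so column 0 is the
    # eigenvector of the smallest eigenvalue, i.e. the constrained minimizer.
    _, eigvecs = np.linalg.eigh(S)
    return eigvecs[:, 0]

# Noise-free points on the ellipse ((x - 1) / 3)^2 + ((y + 1) / 2)^2 = 1.
t = np.linspace(0.0, 2.0 * np.pi, 50, endpoint=False)
x = 3.0 * np.cos(t) + 1.0
y = 2.0 * np.sin(t) - 1.0
a = fit_conic_direct(x, y)
print("conic parameters (up to sign):", a)
```

The whole estimate comes from a single symmetric eigenvalue problem, which is what makes the approach direct; in this sketch, constraints such as forcing the result to be an ellipse would have to replace the unit-norm constraint.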

Partitioning a data set, in an unsupervised manner, into groups whose members are captured by the same relationship is a common task in machine learning, referred to as clustering. While most approaches use a single point as a cluster representative, or cluster data into subspaces, less attention has been paid to clustering data along nonlinear functions, known as manifold clustering. The thesis applies constrained and unconstrained fitting and projection methods in the errors-in-variables context to construct an iterative and a non-iterative algorithm for manifold clustering at modest computational cost.

Identification of discrete-time dynamic systems is a well-understood problem, but its errors-in-variables formulation, in which both input and output are polluted by noise, introduces interesting challenges. Several papers discuss system identification of linear errors-in-variables systems, but the estimation problem is more difficult in the nonlinear setting. The thesis combines two approaches. The first is the generalization of the Koopmans–Levin method, which estimates the parameters of a linear dynamic system with a scalable balance between accuracy and computational cost. The second is the nonlinear extension of the original Koopmans method, which gives a non-iterative approach to estimate the parameters of a static system described by a polynomial function. The result is an effective system identification method for dynamic errors-in-variables systems with polynomial nonlinearities.
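
To make the starting point concrete, the sketch below shows the Koopmans estimator for a linear static system as it is commonly described: the noise-free observations z0 satisfy z0' theta = 0, the noise covariance C is assumed known up to a scale factor, and the estimate is the generalized eigenvector of the sample covariance D and C that belongs to the smallest eigenvalue. The function name koopmans_linear, the synthetic data and the chosen C are assumptions made for the example; the dynamic and polynomial extensions developed in the thesis are not reproduced here.

```python
import numpy as np

def koopmans_linear(Z, C):
    """Hypothetical helper: solve D theta = lambda C theta for the smallest lambda.

    Z is an N-by-n matrix of observations, C an n-by-n positive definite
    noise covariance (known up to scale). Illustrative sketch only.
    """
    D = Z.T @ Z / Z.shape[0]          # sample covariance of the observations
    L = np.linalg.cholesky(C)         # C = L L^T
    L_inv = np.linalg.inv(L)
    # Reduce to an ordinary symmetric problem: (L^-1 D L^-T) v = lambda v.
    M = L_inv @ D @ L_inv.T
    M = (M + M.T) / 2.0               # remove round-off asymmetry
    _, V = np.linalg.eigh(M)          # eigenvalues ascending
    theta = L_inv.T @ V[:, 0]         # map back: theta = L^-T v
    return theta / np.linalg.norm(theta)

# Noise-free data obeying y = 2*x1 - 3*x2, i.e. theta0 is proportional to [2, -3, -1].
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
Z = np.column_stack([X, 2.0 * X[:, 0] - 3.0 * X[:, 1]])
C = np.diag([1.0, 1.0, 4.0])          # assumed noise covariance structure
theta = koopmans_linear(Z, C)
print("theta scaled so that the output coefficient is -1:", theta / -theta[2])
```

With noisy data the same call applies unchanged; the quality of the estimate then depends on how well C reflects the true noise structure.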