http://en.wikipedia.org/wiki/Linear_least_squares

Another drawback of the least squares estimator is the fact that it seeks to minimize the norm of the measurement error, ‖Ax − b‖. In many cases, one is truly interested in obtaining a small error in the parameter x, i.e., a small value of ‖x̂ − x‖, where x̂ denotes the estimate. However, since x is unknown, this quantity cannot be directly minimized. If a prior probability on x is known, then a Bayes estimator can be used to minimize the mean squared error, E[‖x̂ − x‖²]. The least squares method is often applied when no prior is known. Surprisingly, however, better estimators can be constructed, an effect known as Stein's phenomenon. For example, if the measurement error is Gaussian, several estimators are known which dominate, or outperform, the least squares technique; the most common of these is the James–Stein estimator.
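As a rough illustration of Stein's phenomenon (a sketch of my own, not taken from the page above), the Python snippet below compares the least squares estimate with the James–Stein shrinkage estimate in the simplest setting A = I, where the least squares estimate of x is just the noisy measurement b. The dimension d, noise level sigma, and trial count are arbitrary choices for this demo.

import numpy as np

rng = np.random.default_rng(0)

d = 10              # dimension of the unknown parameter x (d >= 3 is needed for James-Stein to dominate)
sigma = 1.0         # known standard deviation of the Gaussian measurement error
n_trials = 20000    # Monte Carlo repetitions

x_true = rng.normal(size=d)   # arbitrary "true" parameter vector for the simulation
mse_ls = 0.0
mse_js = 0.0

for _ in range(n_trials):
    b = x_true + sigma * rng.normal(size=d)        # noisy measurement, A = I
    x_ls = b                                       # least squares (= maximum likelihood) estimate
    shrink = 1.0 - (d - 2) * sigma**2 / np.dot(b, b)   # James-Stein shrinkage factor
    x_js = shrink * b                              # shrink the measurement toward the origin
    mse_ls += np.sum((x_ls - x_true) ** 2)
    mse_js += np.sum((x_js - x_true) ** 2)

print("mean squared error, least squares :", mse_ls / n_trials)
print("mean squared error, James-Stein   :", mse_js / n_trials)

For d ≥ 3 the James–Stein estimate has strictly smaller total mean squared error than the least squares estimate, even though it is biased.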
Nonlinear Least Squares Fitting
http://mathworld.wolfram.com/NonlinearLeastSquaresFitting.html
http://statpages.org/nonlin.html
http://www.itl.nist.gov/div898/strd/general/bkground.html
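As a minimal sketch of nonlinear least squares fitting (independent of the linked pages, using SciPy's curve_fit with an exponential model and parameter names chosen only for illustration):

import numpy as np
from scipy.optimize import curve_fit

# Model with parameters a, b: y = a * exp(-b * t)
def model(t, a, b):
    return a * np.exp(-b * t)

rng = np.random.default_rng(1)
t = np.linspace(0.0, 4.0, 50)
y = model(t, 2.5, 1.3) + 0.05 * rng.normal(size=t.size)   # synthetic noisy data

# curve_fit minimizes the sum of squared residuals; for this unconstrained problem
# it uses the Levenberg-Marquardt algorithm. p0 is the initial parameter guess.
popt, pcov = curve_fit(model, t, y, p0=[1.0, 1.0])
print("fitted parameters:", popt)
print("parameter standard errors:", np.sqrt(np.diag(pcov)))

Unlike the linear case, the result can depend on the initial guess p0, since the residual norm is generally non-convex in the parameters.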