Principal component analysis (PCA) is a data reduction method. Technically, we take a vector of random variables and transform it into another vector, Z, by a linear transformation represented by a square matrix. The equations of this transformation should not be confused with regression equations. The transformed Zi variables are not observed and used in… Continue reading Principal component analysis
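As a rough illustration of such a linear transformation, here is a minimal pure-Python sketch for the special case of two variables. The names `pca_2d`, `sample_cov`, `A`, and the sample data are illustrative assumptions, not taken from the post. The sketch uses the closed-form rotation angle for a 2x2 covariance matrix instead of a general eigendecomposition, so it shows the idea rather than a general implementation.

```python
import math

def sample_cov(u, v):
    """Sample covariance of two equal-length sequences (divisor n - 1)."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)

def pca_2d(xs, ys):
    """Rotate centered (x, y) data onto its principal axes.

    Returns the 2x2 matrix A and the transformed coordinates z1, z2.
    """
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cx = [x - mx for x in xs]  # centered first variable
    cy = [y - my for y in ys]  # centered second variable
    sxx, syy, sxy = sample_cov(xs, xs), sample_cov(ys, ys), sample_cov(xs, ys)
    # Angle of the first principal axis for a 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    c, s = math.cos(theta), math.sin(theta)
    A = [[c, s], [-s, c]]  # square (orthogonal) transformation matrix
    z1 = [A[0][0] * u + A[0][1] * v for u, v in zip(cx, cy)]
    z2 = [A[1][0] * u + A[1][1] * v for u, v in zip(cx, cy)]
    return A, z1, z2

# Hypothetical observations of two correlated variables.
xs = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
ys = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]

A, z1, z2 = pca_2d(xs, ys)
print("var(Z1) =", sample_cov(z1, z1))
print("var(Z2) =", sample_cov(z2, z2))
print("cov(Z1, Z2) =", sample_cov(z1, z2))
```

In this sketch the transformed variables are uncorrelated up to rounding error, the first one carries at least as much variance as the second, and the total variance of the original variables is preserved.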
Month: February 2023
Multiple regression models
In regression models there is a clear separation between the regressed variable and the regressors (explanatory variables). This does not necessarily mean that there is a causal relationship, but it is enough to classify regression models as dependence models. Regression models arise naturally for dealing with metric variables, but we may use binary variables to… Continue reading Multiple regression models
AN OVERVIEW OF MULTIVARIATE METHODS
Multivariate methods can be classified along different features. In the following sections we outline some multivariate methods, suggesting a classification along these dimensions. We do not aim to be comprehensive; the idea is to appreciate the richness of this field of statistics, and to see the classification in concrete terms.
Missing data and outliers
Outliers and wrong data are quite common in data analysis. If data are collected automatically, and they are engineering measurements, this may not be a tough issue; however, when people are involved, either because we are collecting data using questionnaires, or because we are investigating a social system, things may turn out to be a… Continue reading Missing data and outliers
Adapting statistical inference procedures
The core topics in statistical inference are point and interval parameter estimation, hypothesis testing, and analysis of variance. Some of the related procedures are conceptually easy to adapt to a multivariate case. For instance, maximum likelihood estimation is not very different, even though it is going to prove computationally more challenging, thus requiring numerical optimization… Continue reading Adapting statistical inference procedures
Different types of variables
In standard inferential statistics one typically assumes that data consist of real or integer numbers. However, data may be qualitative as well, and the more dimensions we have, the more likely the joint presence of quantitative and qualitative variables will be. In some cases, dealing with qualitative variables is not that difficult. For instance, if… Continue reading Different types of variables
Complexity and redundancy
Visualization is not the only reason why we need data reduction methods. Quite often, multivariate data stem from the administration of a questionnaire to a sample of respondents; each question corresponds to a single variable, and a set of answers by a single respondent is a multivariate observation. It is customary to ask respondents many… Continue reading Complexity and redundancy
Visualization
The first and most obvious difficulty we face with multivariate data is visualization. If we want to explore the association between variables, one possibility is to draw scatterplots for each pair of them; for instance, if we have 4 variables, we may draw a matrix of scatterplots, like the one illustrated in Fig. 15.1. The matrix… Continue reading Visualization
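To see how quickly pairwise scatterplots multiply, the following small sketch lists the distinct pairs for 4 variables. The variable names are hypothetical placeholders, and the remark about the full matrix assumes the usual square layout with one panel per ordered pair.

```python
from itertools import combinations

variables = ["X1", "X2", "X3", "X4"]  # hypothetical variable names

# Each unordered pair of variables gives one distinct scatterplot.
pairs = list(combinations(variables, 2))
for a, b in pairs:
    print(f"scatterplot of {a} vs {b}")

n = len(variables)
print(len(pairs), "distinct pairs;", n * n, "panels in a full square matrix")
```

The number of distinct pairs grows as n(n - 1)/2, which is one reason this approach becomes unwieldy as the number of variables increases.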
ISSUES IN MULTIVARIATE ANALYSIS
In the next sections we briefly outline the main complication factors that arise when dealing with multidimensional data. Some of them are to be expected, but some are a bit surprising. Becoming aware of these difficulties provides the motivation for studying the wide array of sometimes quite complex methods that have been developed. Fig. 15.1 A matrix… Continue reading ISSUES IN MULTIVARIATE ANALYSIS
Introduction
Multivariate analysis is the more-or-less natural extension of elementary inferential statistics to the case of multidimensional data. The first difficulty we encounter is the representation of data. How can we visualize data in multiple dimensions, on the basis of our limited ability to plot bidimensional and tridimensional diagrams? In Section 15.1 we show that this is just… Continue reading Introduction