The Concept of Heteroscedasticity
In econometrics, a linear regression model is said to exhibit heteroscedasticity when the variance of the disturbances is not constant across observations. This violates one of the basic assumptions on which the linear regression model is based.
Recall that one of the basic assumptions of linear regression is that the errors have constant variance. When this assumption fails, the data are heterogeneous in the sense that the disturbances come from probability distributions with different variances.
This sequence reviews the regression model assumptions and introduces heteroscedasticity, which concerns the distribution of the disturbance term in a regression model.
We will discuss it in the context of the regression model Y = b1 + b2X + u. To keep the diagram uncluttered, we will suppose that we have a sample of only five observations, the X values of which are as shown.
If there were no disturbance terms in the model, the observations would lie on the line as shown.
Now we take account of the effect of the disturbance term. It will displace each observation in the vertical dimension since it modifies the value of Y without affecting X.
The disturbance term in each observation is hypothesized to be drawn randomly from a given distribution. In the diagram, three assumptions are being made.
One is that the expected value of u in each observation is 0. The second is that the distribution in each observation is normal. We are not concerned with either of these and we will assume them to be true.
The third is that the variance of the distribution of the disturbance term is the same for each observation. In the present case, that means that the normal distributions shown all have the same variance.
If it is satisfied, the disturbance term is said to be homoscedastic (from the Greek for "same scattering").
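In symbols, the three assumptions can be written as below. The observation index i and the symbol \sigma_u^2 for the common variance are notation introduced here for illustration; they do not appear in the text above.

```latex
\begin{align*}
\text{(1)}\quad & E(u_i) = 0 \\
\text{(2)}\quad & u_i \text{ is normally distributed} \\
\text{(3)}\quad & \operatorname{Var}(u_i) = \sigma_u^2 \quad \text{for all } i
\end{align*}
```

Homoscedasticity is assumption (3): the variance carries no subscript i, so it does not change from one observation to the next.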
Each observation is then potentially (before the sample is drawn) an equally reliable guide to the location of the line Y = b1 + b2X.
Once the sample has been drawn, some observations will lie closer to the line than others, but we have no way of anticipating in advance which ones these will be.
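The homoscedastic case can be illustrated with a small simulation. This is only a sketch: the parameter values, the five X values, and the common standard deviation sigma are all assumptions chosen for illustration, not values taken from the diagram.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values, chosen only for illustration.
b1, b2 = 2.0, 0.5
sigma = 1.0                                # same standard deviation for every observation
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])    # five assumed X values

# Draw the disturbance term for many hypothetical samples of five observations.
n_rep = 20000
U = rng.normal(loc=0.0, scale=sigma, size=(n_rep, X.size))
Y = b1 + b2 * X + U                        # each row is one sample of Y values

# Spread of the disturbance term at each X value, measured across samples.
sd_by_obs = U.std(axis=0)
print(np.round(sd_by_obs, 2))
```

Across repeated samples the spread is roughly the same at every X value, which is the sense in which each observation is an equally reliable guide before the sample is drawn.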
Now consider the situation illustrated by the diagram above. The distribution of u associated with each observation still has an expected value of 0 and is normal. However, the third assumption is violated and the variance is no longer constant.
Obviously, observations where u has low variance, like that for X1, will tend to be better guides to the underlying relationship than those like that for X5, where it has a relatively high variance.
When the variance of the distribution is not the same for each observation, the disturbance term is said to be subject to heteroscedasticity.
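A simulation can make the contrast between low-variance and high-variance observations concrete. The sketch below assumes, purely for illustration, that the standard deviation of u is proportional to X; the text does not specify any particular form, and the parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical values, chosen only for illustration.
b1, b2 = 2.0, 0.5
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
sigma_i = 0.3 * X          # assumed form: standard deviation grows in proportion to X

# One row per hypothetical sample; column j uses standard deviation sigma_i[j].
n_rep = 20000
U = rng.normal(loc=0.0, scale=sigma_i, size=(n_rep, X.size))
Y = b1 + b2 * X + U

# Average vertical distance of each observation from the true line.
mean_abs_dev = np.abs(U).mean(axis=0)
print(np.round(mean_abs_dev, 2))
```

The expected value of u is still 0 at every X value, but the typical distance from the line rises from the first observation to the fifth.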
There are two major consequences of heteroscedasticity. One is that the standard errors of the regression coefficients are estimated incorrectly, so the t tests (and the F test) are invalid.
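The first consequence can be checked with a Monte Carlo sketch that compares the conventional OLS standard error of the slope with the actual variability of the slope estimates across samples. The sample size, the parameter values, and the variance pattern (standard deviation proportional to the square of X, a deliberately strong form) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical values, chosen only for illustration.
b1, b2 = 2.0, 0.5
n = 50
X = np.linspace(1.0, 10.0, n)
sigma_i = 0.1 * X**2                  # assumed heteroscedastic pattern

x_dev = X - X.mean()
Sxx = np.sum(x_dev**2)

n_rep = 5000
Y = b1 + b2 * X + rng.normal(0.0, sigma_i, size=(n_rep, n))

# OLS estimates for every simulated sample at once.
b2_hat = (Y @ x_dev) / Sxx
b1_hat = Y.mean(axis=1) - b2_hat * X.mean()

# Conventional standard error of the slope, which assumes constant variance.
resid = Y - b1_hat[:, None] - b2_hat[:, None] * X
s2 = np.sum(resid**2, axis=1) / (n - 2)
se_conventional = np.sqrt(s2 / Sxx)

actual_sd = b2_hat.std(ddof=1)               # how much the slope really varies
mean_reported_se = se_conventional.mean()    # what the usual formula reports
true_sd = np.sqrt(np.sum(x_dev**2 * sigma_i**2)) / Sxx   # from the known variances

print(round(actual_sd, 3), round(mean_reported_se, 3), round(true_sd, 3))
```

Under these assumptions the conventional formula reports a standard error noticeably smaller than the true variability of the slope estimate, so t statistics built from it would be too large. With other variance patterns the bias can go in the opposite direction.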
The other is that OLS is an inefficient estimation technique. An alternative technique that gives relatively high weight to the relatively low-variance observations should tend to yield more accurate estimates.
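One such alternative is weighted least squares, in which each observation is divided by the standard deviation of its disturbance term before fitting. The sketch below compares it with OLS under the assumption that those standard deviations are known and proportional to X; in practice they would have to be estimated or modelled, and all values here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical values, chosen only for illustration.
b1, b2 = 2.0, 0.5
n = 50
X = np.linspace(1.0, 10.0, n)
sigma_i = 0.3 * X                     # assumed known standard deviations

n_rep = 5000
Y = b1 + b2 * X + rng.normal(0.0, sigma_i, size=(n_rep, n))

A = np.column_stack([np.ones(n), X])  # design matrix: intercept and X

# OLS: every observation receives the same weight.
coef_ols, *_ = np.linalg.lstsq(A, Y.T, rcond=None)

# Weighted least squares: scale each row by 1 / sigma_i so that the
# transformed disturbances have constant variance, then apply least squares.
A_w = A / sigma_i[:, None]
Y_w = (Y / sigma_i).T
coef_wls, *_ = np.linalg.lstsq(A_w, Y_w, rcond=None)

sd_ols = coef_ols[1].std(ddof=1)      # spread of the OLS slope estimates
sd_wls = coef_wls[1].std(ddof=1)      # spread of the weighted slope estimates
print(round(sd_ols, 3), round(sd_wls, 3))
```

Both estimators are centred on the true slope, but the weighted estimates are more tightly clustered around it, which is what is meant by OLS being inefficient in this setting.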