Meaning and Effect
- In statistics, a collection of random variables is heteroscedastic if there are sub-populations that have different variabilities from others. That is, if the variance of dependent values in a dataset is (significantly) different depending on the related independent values, we can say that the dataset is heteroscedastic.
- Example 1: A classic example of heteroscedasticity is that of income versus expenditure on meals. As one’s income increases, the variability of food consumption will increase. A poorer person will spend a rather constant amount by always eating inexpensive food; a wealthier person may occasionally buy inexpensive food and at other times eat expensive meals. Those with higher incomes display a greater variability of food consumption. (higher income – independent variable – cause higher variance in expenditure on meals – dependent variable.)
- Wikipedia link
Heteroscedasticity does not cause ordinary least squares coefficient estimates to be biased, although it can cause ordinary least squares estimates of the variance (and, thus, standard errors) of the coefficients to be biased, possibly above or below the true or population variance. Thus, regression analysis using heteroscedastic data will still provide an unbiased estimate for the relationship between the predictor variable and the outcome, but standard errors and therefore inferences obtained from data analysis are suspect. Biased standard errors lead to biased inference, so results of hypothesis tests are possibly wrong. For example, if OLS is performed on a heteroscedastic data set, yielding biased standard error estimation, a researcher might fail to reject a null hypothesis at a given significance level, when that null hypothesis was actually uncharacteristic of the actual population (making a type II error).