
In probability theory and statistics, skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable about its mean. The skewness can be positive, zero, or negative, and for some distributions it is undefined.


The skewness of a random variable $X$ is the third standardized moment $\gamma_1$, defined as: \begin{equation} \tag{1} \gamma_1 = \operatorname{E}\left[\left(\frac{X-\mu}{\sigma}\right)^3 \right] = \frac{\mu_3}{\sigma^3} = \frac{\operatorname{E}\left[(X-\mu)^3\right]}{\left( \operatorname{E}\left[ (X-\mu)^2 \right] \right)^{3/2}} = \frac{\kappa_3}{\kappa_2^{3/2}} \end{equation} where $\mu$ is the mean, $\sigma$ is the standard deviation, $\operatorname{E}$ is the expectation operator, $\mu_3$ is the third central moment, and $\kappa_t$ is the $t$th cumulant.
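
As a small illustration of Eq. (1), the sketch below evaluates the third standardized moment of a discrete distribution directly from its probability mass function. The function name `population_skewness` is a hypothetical helper chosen for this example and is not part of stats++.

```python
def population_skewness(values, probs):
    """Third standardized moment (Eq. 1) of a discrete distribution.

    `values` are the support points and `probs` their probabilities.
    Hypothetical helper for illustration; not a stats++ function.
    """
    mu = sum(p * x for x, p in zip(values, probs))               # mean
    var = sum(p * (x - mu) ** 2 for x, p in zip(values, probs))  # sigma^2
    mu3 = sum(p * (x - mu) ** 3 for x, p in zip(values, probs))  # third central moment
    return mu3 / var ** 1.5


# Bernoulli distribution with success probability p = 0.2.
# Its skewness has the closed form (1 - 2p) / sqrt(p (1 - p)).
p = 0.2
gamma1_bernoulli = population_skewness([0, 1], [1 - p, p])

# A fair six-sided die is symmetric about its mean, so its skewness is zero.
gamma1_die = population_skewness([1, 2, 3, 4, 5, 6], [1 / 6] * 6)

print(gamma1_bernoulli, gamma1_die)
```

The Bernoulli case is right-skewed (positive $\gamma_1$) because most of the probability sits at 0 with a long tail toward 1, while the symmetric die gives a value of zero up to rounding.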

Sample Skewness

There are several ways to define sample skewness.

In many older texts, (sample) skewness is given by: \begin{equation} \tag{2} g_1 = \frac{m_3}{m_2^{3/2}} \end{equation} where $m_t$ is the $t$th (sample) central moment: \begin{equation} \tag{3} m_t = \frac{1}{n} \sum_{i=1}^{n} \left( x_i - \bar{x} \right)^t \end{equation} with $n$ the sample size and $\bar{x}$ the sample mean. Note that $m_t$ is a biased estimate of the population central moment $\mu_t$.
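
Equations (2) and (3) translate almost line for line into code. The following is a minimal sketch in plain Python; the names `sample_moment` and `skew_g1` are hypothetical and do not reflect the stats++ interface.

```python
def sample_moment(xs, t):
    """t-th sample central moment m_t (Eq. 3), using the 1/n divisor."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** t for x in xs) / n


def skew_g1(xs):
    """Moment-based sample skewness g_1 = m_3 / m_2^(3/2) (Eq. 2)."""
    return sample_moment(xs, 3) / sample_moment(xs, 2) ** 1.5


# Illustrative data with one large value on the right.
data = [1, 2, 3, 4, 10]
print(skew_g1(data))
```

For this data set the mean is 4, so $m_2 = 10$ and $m_3 = 36$, and the positive result reflects the long right tail.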

One common correction to $g_1$, built from the unbiased estimators of the second and third cumulants, is: \begin{equation} \tag{4} G_1 = \frac{\sqrt{n(n-1)}}{n-2} g_1 \end{equation} which requires $n \ge 3$.

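Note that this is not the only possibility. Another is given by: \begin{equation} \tag{5} b_1 = \left( \frac{n-1}{n} \right)^{3/2} \frac{m_3}{m_2^{3/2}} \end{equation} which is equivalent to $m_3 / s^3$, where $s$ is the sample standard deviation computed with the $n-1$ divisor.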
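
Because Eqs. (4) and (5) are both rescalings of $g_1$, all three measures can be computed from the same two sample moments. The sketch below does this in plain Python; the function names are hypothetical, and this is not the stats++ implementation.

```python
from math import sqrt


def skew_g1(xs):
    """Moment-based sample skewness g_1 = m_3 / m_2^(3/2) (Eq. 2)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def skew_G1(xs):
    """Adjusted skewness G_1 (Eq. 4); needs at least three observations."""
    n = len(xs)
    if n < 3:
        raise ValueError("G_1 requires n >= 3")
    return sqrt(n * (n - 1)) / (n - 2) * skew_g1(xs)


def skew_b1(xs):
    """Alternative measure b_1 (Eq. 5)."""
    n = len(xs)
    return ((n - 1) / n) ** 1.5 * skew_g1(xs)


data = [1, 2, 3, 4, 10]
print(skew_g1(data), skew_G1(data), skew_b1(data))
```

For any sample with $n \ge 3$ the scale factor in Eq. (4) is greater than one and the factor in Eq. (5) is less than one, so the three values always share the same sign and satisfy $|b_1| \le |g_1| \le |G_1|$.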

These measures ($g_1$, $G_1$, and $b_1$) are compared in [1]. For samples from a normal distribution, all three are unbiased. For large sample sizes, there is little difference between the measures; for small sample sizes, however, they can differ noticeably. For samples from a normal distribution, $b_1$ has the lowest mean-squared error (MSE); for samples from an asymmetric distribution, $G_1$ has the lowest MSE. Regardless of the distribution being sampled, $G_1$ has the greatest variance of the three.
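
The dependence on sample size can be seen without any simulation, since $G_1$ and $b_1$ differ from $g_1$ only by factors that depend on $n$. The sketch below tabulates those factors for a few sample sizes; the function names are hypothetical, and the sample sizes are chosen only for illustration.

```python
from math import sqrt


def G1_factor(n):
    """Ratio G_1 / g_1 from Eq. (4)."""
    return sqrt(n * (n - 1)) / (n - 2)


def b1_factor(n):
    """Ratio b_1 / g_1 from Eq. (5)."""
    return ((n - 1) / n) ** 1.5


# Both factors approach 1 as n grows, so the measures agree for large samples.
for n in (5, 10, 100, 1000):
    print(f"n={n:5d}  G1/g1={G1_factor(n):.4f}  b1/g1={b1_factor(n):.4f}")
```

Since $G_1$ is the largest multiple of $g_1$ among the three, its variance is also the largest for any fixed $n$, which is consistent with the comparison above.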


In stats++, $G_1$ is implemented.

Notes and references

  1. D. N. Joanes and C. A. Gill, "Comparing measures of sample skewness and kurtosis," The Statistician 47, 183--189 (1998).