Restricted Boltzmann machine

  • Maybe start with Hopfield

Energy-based model:

\begin{equation} \tag{1} e^{-F(\mathbf{v})} = \sum_\mathbf{h} e^{-E(\mathbf{v},\mathbf{h})} \end{equation}

\begin{equation} \tag{2} F(\mathbf{v}) = - \ln \sum_\mathbf{h} e^{-E(\mathbf{v},\mathbf{h})} \end{equation}

To model a distribution on $\{\pm 1\}^n$ we use a machine with $n + m$ units. There are $n$ visible units, each of which represents a single bit of a random vector, and $m$ hidden units that create correlations between the values of the visible units.

  • Maybe put pic of RBM here

The energy of an RBM with parameters ${\boldsymbol \theta} = \{\mathbf{W},\mathbf{a},\mathbf{b}\}$ can be written:

\begin{equation} \tag{3} E(\mathbf{v},\mathbf{h}) = -\mathbf{v}^\mathrm{T} \mathbf{W} \mathbf{h} - \mathbf{a}^\mathrm{T} \mathbf{v} - \mathbf{b}^\mathrm{T} \mathbf{h} \end{equation}

where the matrix $\mathbf{W}$ determines the symmetric interaction between pairs of hidden and visible units, and $\mathbf{a}$ and $\mathbf{b}$ are bias terms that set the unary energy of the units.
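As a minimal sketch of Eq. (3), the energy can be evaluated directly from its three terms. The function name `rbm_energy` and the toy parameter values are illustrative assumptions, not part of the original text, and plain Python lists are used to keep the example self-contained.

```python
def rbm_energy(v, h, W, a, b):
    """Energy of Eq. (3): E(v, h) = -v^T W h - a^T v - b^T h.

    v: list of n visible values, h: list of m hidden values,
    W: n x m nested list, a: n visible biases, b: m hidden biases.
    """
    interaction = sum(v[i] * W[i][j] * h[j]
                      for i in range(len(v)) for j in range(len(h)))
    visible_bias = sum(a_i * v_i for a_i, v_i in zip(a, v))
    hidden_bias = sum(b_j * h_j for b_j, h_j in zip(b, h))
    return -interaction - visible_bias - hidden_bias


# Hypothetical toy parameters: n = 2 visible units, m = 3 hidden units.
W = [[0.5, -1.0, 0.0],
     [1.5, 0.25, -0.5]]
a = [0.1, -0.2]
b = [0.0, 0.3, -0.1]

energy = rbm_energy([1, 0], [1, 1, 0], W, a, b)
```

Because $\mathbf{W}$ only couples visible units to hidden units, the double sum runs over visible--hidden pairs and contains no visible--visible or hidden--hidden terms.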

An extension of Eq. (3) to continuous units can be obtained by adding one or more terms to the energy that ensure the distribution (JMM: maybe discuss distribution above) can be normalized [1]:

\begin{equation} \tag{4} E_c(\mathbf{v},\mathbf{h}) = E(\mathbf{v},\mathbf{h}) + \int_a^b d\mathbf{v} ~ f(\mathbf{v}) +\int_c^d d\mathbf{h} ~ g(\mathbf{h}) \end{equation}

where $f(\mathbf{v})$ and $g(\mathbf{h})$ are the activations of the visible and hidden units, respectively.

Additionally, if the visible and/or hidden units are assumed to have independent Gaussian noise, the following replacements are made:

\begin{eqnarray} \tag{5} v_i \rightarrow v_i / \sigma_{v,i} \\ h_j \rightarrow h_j / \sigma_{h,j} \end{eqnarray}

Using these definitions, the energy functions and free energies can be calculated for any type of network.

Examples

Binary--binary

\begin{equation} E(\mathbf{v},\mathbf{h}) = -\mathbf{v}^\mathrm{T} \mathbf{W} \mathbf{h} - \mathbf{a}^\mathrm{T} \mathbf{v} - \mathbf{b}^\mathrm{T} \mathbf{h} \end{equation}

Binary--continuous

For continuous hidden units, the activation function $g(\mathbf{h})$ is:

\begin{equation} g(\mathbf{h}) = \mathbf{h} \end{equation}

with antiderivative:

\begin{equation} G(\mathbf{h}) = \frac{1}{2} \mathbf{h}^\mathrm{T}\mathbf{h} \end{equation}

\begin{equation} E(\mathbf{v},\mathbf{h}) = -\mathbf{v}^\mathrm{T} \mathbf{W} \mathbf{h} - \mathbf{a}^\mathrm{T} \mathbf{v} - \mathbf{b}^\mathrm{T} \mathbf{h} + \frac{1}{2} \mathbf{h}^\mathrm{T}\mathbf{h} \end{equation}

Binary--ReLU

For rectified linear hidden units, the activation function $g(\mathbf{h})$ is:

\begin{equation} g(\mathbf{h}) = \max(0, \mathbf{h}) \end{equation}

with an antiderivative that can be found by piecewise integration:

\begin{equation} G(\mathbf{h}) = [ \mathrm{sgn}(\mathbf{h}) + 1 ] \frac{1}{2} \mathbf{h}^\mathrm{T}\mathbf{h} \end{equation}

\begin{equation} E(\mathbf{v},\mathbf{h}) = -\mathbf{v}^\mathrm{T} \mathbf{W} \mathbf{h} - \mathbf{a}^\mathrm{T} \mathbf{v} - \mathbf{b}^\mathrm{T} \mathbf{h} + [ \mathrm{sgn}(\mathbf{h}) + 1 ] \frac{1}{2} \mathbf{h}^\mathrm{T}\mathbf{h} \end{equation}

Continuous--binary

The quadratic term follows from the antiderivative calculation in the Binary--continuous example above, applied to the visible units.

\begin{equation} E(\mathbf{v},\mathbf{h}) = -\mathbf{v}^\mathrm{T} \mathbf{W} \mathbf{h} - \mathbf{a}^\mathrm{T} \mathbf{v} - \mathbf{b}^\mathrm{T} \mathbf{h} + \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} \end{equation}

Continuous--continuous

Both quadratic terms follow from the antiderivative calculation in the Binary--continuous example above, applied to the visible and the hidden units.

\begin{equation} E(\mathbf{v},\mathbf{h}) = -\mathbf{v}^\mathrm{T} \mathbf{W} \mathbf{h} - \mathbf{a}^\mathrm{T} \mathbf{v} - \mathbf{b}^\mathrm{T} \mathbf{h} + \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} + \frac{1}{2} \mathbf{h}^\mathrm{T}\mathbf{h} \end{equation}

Continuous--ReLU

See the antiderivative calculations in the Binary--continuous example (for the visible units) and the Binary--ReLU example (for the hidden units) above.

\begin{equation} E(\mathbf{v},\mathbf{h}) = -\mathbf{v}^\mathrm{T} \mathbf{W} \mathbf{h} - \mathbf{a}^\mathrm{T} \mathbf{v} - \mathbf{b}^\mathrm{T} \mathbf{h} + \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} + [ \mathrm{sgn}(\mathbf{h}) + 1 ] \frac{1}{2} \mathbf{h}^\mathrm{T}\mathbf{h} \end{equation}

ReLU--binary

See the antiderivative calculation in the Binary--ReLU example above, applied here to the visible units.

\begin{equation} E(\mathbf{v},\mathbf{h}) = -\mathbf{v}^\mathrm{T} \mathbf{W} \mathbf{h} - \mathbf{a}^\mathrm{T} \mathbf{v} - \mathbf{b}^\mathrm{T} \mathbf{h} + [ \mathrm{sgn}(\mathbf{v}) + 1 ] \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} \end{equation}

ReLU--continuous

See the antiderivative calculations in the Binary--ReLU example (for the visible units) and the Binary--continuous example (for the hidden units) above.

\begin{equation} E(\mathbf{v},\mathbf{h}) = -\mathbf{v}^\mathrm{T} \mathbf{W} \mathbf{h} - \mathbf{a}^\mathrm{T} \mathbf{v} - \mathbf{b}^\mathrm{T} \mathbf{h} + [ \mathrm{sgn}(\mathbf{v}) + 1 ] \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} + \frac{1}{2} \mathbf{h}^\mathrm{T}\mathbf{h} \end{equation}

ReLU--ReLU

See the antiderivative calculation in the Binary--ReLU example above, applied here to both the visible and the hidden units.

\begin{equation} E(\mathbf{v},\mathbf{h}) = -\mathbf{v}^\mathrm{T} \mathbf{W} \mathbf{h} - \mathbf{a}^\mathrm{T} \mathbf{v} - \mathbf{b}^\mathrm{T} \mathbf{h} + [ \mathrm{sgn}(\mathbf{v}) + 1 ] \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} + [ \mathrm{sgn}(\mathbf{h}) + 1 ] \frac{1}{2} \mathbf{h}^\mathrm{T}\mathbf{h} \end{equation}

Examples (free energy)

The following free energies can be obtained by inserting the energy functions from the examples above into Eq. (2).

Binary--binary

\begin{equation} F(\mathbf{v}) = -\mathbf{a}^\mathrm{T} \mathbf{v} - \sum_j \ln(1 + e^{x_j}) \end{equation}

where $x_j = b_j + (\mathbf{v}^\mathrm{T} \mathbf{W})_j$.
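The closed form above can be checked numerically against Eq. (2) by enumerating every hidden configuration. The sketch below assumes hidden units taking values in $\{0, 1\}$, which is the convention under which the $\ln(1 + e^{x_j})$ term arises; for units in $\{\pm 1\}$ the per-unit sum would instead give $\ln(2 \cosh x_j)$. The function names and toy parameters are hypothetical.

```python
import itertools
import math


def free_energy_brute_force(v, W, a, b):
    """Eq. (2) evaluated by summing over all 2^m hidden states in {0, 1}^m."""
    n, m = len(v), len(b)
    total = 0.0
    for h in itertools.product([0, 1], repeat=m):
        # Energy of Eq. (3) for this particular hidden configuration.
        energy = -sum(v[i] * W[i][j] * h[j] for i in range(n) for j in range(m))
        energy -= sum(a[i] * v[i] for i in range(n))
        energy -= sum(b[j] * h[j] for j in range(m))
        total += math.exp(-energy)
    return -math.log(total)


def free_energy_closed_form(v, W, a, b):
    """Closed form: F(v) = -a^T v - sum_j ln(1 + exp(x_j))."""
    n, m = len(v), len(b)
    x = [b[j] + sum(v[i] * W[i][j] for i in range(n)) for j in range(m)]
    visible_bias = sum(a[i] * v[i] for i in range(n))
    return -visible_bias - sum(math.log1p(math.exp(x_j)) for x_j in x)


# Hypothetical toy parameters: n = 2 visible units, m = 3 hidden units.
W_toy = [[0.5, -1.0, 0.0],
         [1.5, 0.25, -0.5]]
a_toy = [0.1, -0.2]
b_toy = [0.0, 0.3, -0.1]
v_toy = [1, 0]

difference = (free_energy_brute_force(v_toy, W_toy, a_toy, b_toy)
              - free_energy_closed_form(v_toy, W_toy, a_toy, b_toy))
```

The brute-force sum costs $2^m$ energy evaluations, whereas the closed form is linear in $m$ because the sum over $\mathbf{h}$ factorizes over hidden units.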

Binary--continuous

\begin{equation} F(\mathbf{v}) = -\mathbf{a}^\mathrm{T} \mathbf{v} - \frac{1}{2} \mathbf{x}^\mathrm{T} \mathbf{x} - m \ln( 2 \sqrt{\pi / 2} ) \end{equation}

where $\mathbf{x} = \mathbf{v}^\mathrm{T} \mathbf{W} + \mathbf{b}$.
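As a brief sketch of where this expression comes from, assume each hidden unit is integrated over the whole real line, so that the sum over $\mathbf{h}$ in Eq. (2) becomes an integral. The integral factorizes over the hidden units, and completing the square in each factor gives:

\begin{equation} \int_{-\infty}^{\infty} dh_j ~ e^{x_j h_j - \frac{1}{2} h_j^2} = e^{\frac{1}{2} x_j^2} \int_{-\infty}^{\infty} dh_j ~ e^{-\frac{1}{2} (h_j - x_j)^2} = \sqrt{2 \pi} ~ e^{\frac{1}{2} x_j^2} \end{equation}

Multiplying the $m$ factors together with $e^{\mathbf{a}^\mathrm{T} \mathbf{v}}$ and taking the negative logarithm recovers the free energy above, since $2 \sqrt{\pi / 2} = \sqrt{2 \pi}$.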

Binary--ReLU

\begin{equation} F(\mathbf{v}) = -\mathbf{a}^\mathrm{T} \mathbf{v} - \frac{1}{4} \mathbf{x}^\mathrm{T} \mathbf{x} - m \ln( \pi / 2 ) \end{equation}

where $\mathbf{x} = \mathbf{v}^\mathrm{T} \mathbf{W} + \mathbf{b}$.

Continuous--binary

\begin{equation} F(\mathbf{v}) = -\mathbf{a}^\mathrm{T} \mathbf{v} + \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} - \sum_j \ln(1 + e^{x_j}) \end{equation}

where $x_j = b_j + (\mathbf{v}^\mathrm{T} \mathbf{W})_j$.

Continuous--continuous

\begin{equation} F(\mathbf{v}) = -\mathbf{a}^\mathrm{T} \mathbf{v} + \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} - \frac{1}{2} \mathbf{x}^\mathrm{T} \mathbf{x} - m \ln( 2 \sqrt{\pi / 2} ) \end{equation}

where $\mathbf{x} = \mathbf{v}^\mathrm{T} \mathbf{W} + \mathbf{b}$.
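The continuous--continuous free energy above can be sanity-checked by numerical integration. The sketch below is restricted to a single hidden unit ($m = 1$) integrated over a wide finite interval with the trapezoidal rule; the function names, parameter values, and integration settings are all illustrative assumptions.

```python
import math


def energy_cc(v, h, W, a, b):
    """Continuous--continuous energy for one hidden unit (W is a length-n list)."""
    n = len(v)
    interaction = sum(v[i] * W[i] for i in range(n)) * h
    visible_bias = sum(a[i] * v[i] for i in range(n))
    visible_quadratic = 0.5 * sum(v_i * v_i for v_i in v)
    return -interaction - visible_bias - b * h + visible_quadratic + 0.5 * h * h


def free_energy_numeric(v, W, a, b, half_width=30.0, steps=60000):
    """-ln of the integral of exp(-E) over h, using the trapezoidal rule."""
    dh = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        h = -half_width + k * dh
        weight = 0.5 if k in (0, steps) else 1.0  # half weight at the end points
        total += weight * math.exp(-energy_cc(v, h, W, a, b))
    return -math.log(total * dh)


def free_energy_closed_form_cc(v, W, a, b):
    """Closed form with m = 1: -a^T v + v^T v / 2 - x^2 / 2 - ln(2 sqrt(pi / 2))."""
    n = len(v)
    x = sum(v[i] * W[i] for i in range(n)) + b
    visible_bias = sum(a[i] * v[i] for i in range(n))
    visible_quadratic = 0.5 * sum(v_i * v_i for v_i in v)
    constant = math.log(2.0 * math.sqrt(math.pi / 2.0))
    return -visible_bias + visible_quadratic - 0.5 * x * x - constant


# Hypothetical toy parameters: n = 2 visible units, one hidden unit.
v_cc = [0.5, -1.0]
W_cc = [0.8, 0.3]
a_cc = [0.1, -0.2]
b_cc = 0.4

numeric = free_energy_numeric(v_cc, W_cc, a_cc, b_cc)
closed = free_energy_closed_form_cc(v_cc, W_cc, a_cc, b_cc)
```

For these values the two results should agree to well within the quadrature error, because the integrand is a Gaussian in $h$ that decays rapidly inside the chosen interval.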

Continuous--ReLU

\begin{equation} F(\mathbf{v}) = -\mathbf{a}^\mathrm{T} \mathbf{v} + \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} - \frac{1}{4} \mathbf{x}^\mathrm{T} \mathbf{x} - m \ln( \pi / 2 ) \end{equation}

where $\mathbf{x} = \mathbf{v}^\mathrm{T} \mathbf{W} + \mathbf{b}$.

ReLU--binary

\begin{equation} F(\mathbf{v}) = -\mathbf{a}^\mathrm{T} \mathbf{v} + [ \mathrm{sgn}(\mathbf{v}) + 1 ] \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} - \sum_j \ln(1 + e^{x_j}) \end{equation}

where $x_j = b_j + (\mathbf{v}^\mathrm{T} \mathbf{W})_j$.

ReLU--continuous

\begin{equation} F(\mathbf{v}) = -\mathbf{a}^\mathrm{T} \mathbf{v} + [ \mathrm{sgn}(\mathbf{v}) + 1 ] \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} - \frac{1}{2} \mathbf{x}^\mathrm{T} \mathbf{x} - m \ln( 2 \sqrt{\pi / 2} ) \end{equation}

where $\mathbf{x} = \mathbf{v}^\mathrm{T} \mathbf{W} + \mathbf{b}$.

ReLU--ReLU

\begin{equation} F(\mathbf{v}) = -\mathbf{a}^\mathrm{T} \mathbf{v} + [ \mathrm{sgn}(\mathbf{v}) + 1 ] \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} - \frac{1}{4} \mathbf{x}^\mathrm{T} \mathbf{x} - m \ln( \pi / 2 ) \end{equation}

where $\mathbf{x} = \mathbf{v}^\mathrm{T} \mathbf{W} + \mathbf{b}$.

Notes and references

  1. Y. Freund and D. Haussler, "Unsupervised learning of distributions on binary vectors using two layer networks," Technical report (1994).