# Restricted Boltzmann machine

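An RBM is an energy-based model. Given an energy function $E(\mathbf{v},\mathbf{h})$ over visible units $\mathbf{v}$ and hidden units $\mathbf{h}$, the free energy $F(\mathbf{v})$ is defined by summing over the hidden units: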

$$\tag{1} e^{-F(\mathbf{v})} = \sum_\mathbf{h} e^{-E(\mathbf{v},\mathbf{h})}$$

$$\tag{2} F(\mathbf{v}) = - \ln \sum_\mathbf{h} e^{-E(\mathbf{v},\mathbf{h})}$$

To model a distribution on $\{0, 1\}^n$ we use a machine with $n + m$ units. There are $n$ visible units, each of which represents a single bit in a random vector, and $m$ hidden units that create correlations between the values of the visible units.

• Maybe put pic of RBM here

The energy of an RBM with parameters ${\boldsymbol \theta} = \{\mathbf{W},\mathbf{a},\mathbf{b}\}$ can be written:

$$\tag{3} E(\mathbf{v},\mathbf{h}) = -\mathbf{v}^\mathrm{T} \mathbf{W} \mathbf{h} - \mathbf{a}^\mathrm{T} \mathbf{v} - \mathbf{b}^\mathrm{T} \mathbf{h}$$

where the matrix $\mathbf{W}$ determines the symmetric interaction between pairs of hidden and visible units, and $\mathbf{a}$ and $\mathbf{b}$ are bias terms that set the unary energy of the units.
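As a minimal sketch of Eq. (3), the energy can be evaluated directly from the parameters. The function name `rbm_energy`, the nested-list layout for $\mathbf{W}$ (rows indexed by visible units, columns by hidden units), and the parameter values are assumptions made for illustration only.

```python
def rbm_energy(v, h, W, a, b):
    """Energy of Eq. (3): E(v, h) = -v^T W h - a^T v - b^T h.

    W is an n-by-m nested list (assumed layout: W[i][j] couples
    visible unit i to hidden unit j); a and b are the bias lists.
    """
    interaction = sum(
        v[i] * W[i][j] * h[j]
        for i in range(len(v))
        for j in range(len(h))
    )
    visible_bias = sum(a_i * v_i for a_i, v_i in zip(a, v))
    hidden_bias = sum(b_j * h_j for b_j, h_j in zip(b, h))
    return -interaction - visible_bias - hidden_bias


# Illustrative parameters for n = 2 visible and m = 2 hidden units.
W = [[1.0, -1.0],
     [0.5, 2.0]]
a = [0.1, -0.2]
b = [0.3, 0.4]

energy = rbm_energy([1, 0], [0, 1], W, a, b)
```

Only the active units contribute: with $\mathbf{v} = (1, 0)$ and $\mathbf{h} = (0, 1)$ the energy collects the single weight $W_{12}$ and the two biases $a_1$ and $b_2$.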

Eq. (3) can be extended to continuous units by adding terms to the energy that ensure the distribution can be normalized [1]:

$$\tag{4} E_c(\mathbf{v},\mathbf{h}) = E(\mathbf{v},\mathbf{h}) + \int_a^b d\mathbf{v} ~ f(\mathbf{v}) +\int_c^d d\mathbf{h} ~ g(\mathbf{h})$$

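where $f(\mathbf{v})$ and $g(\mathbf{h})$ are the activation functions of the visible and hidden units, respectively, and the integration limits $a$, $b$, $c$, $d$ are distinct from the bias vectors $\mathbf{a}$ and $\mathbf{b}$.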

Additionally, if the visible and/or hidden units are assumed to have independent Gaussian noise, the energy is modified by the replacements:

\begin{eqnarray} \tag{5} v_i \rightarrow v_i / \sigma_{v,i} \\ h_j \rightarrow h_j / \sigma_{h,j} \end{eqnarray}

Using these definitions, the energy functions and free energies can be calculated for each combination of visible and hidden unit types listed below.

## Examples

#### Binary--binary

$$E(\mathbf{v},\mathbf{h}) = -\mathbf{v}^\mathrm{T} \mathbf{W} \mathbf{h} - \mathbf{a}^\mathrm{T} \mathbf{v} - \mathbf{b}^\mathrm{T} \mathbf{h}$$

#### Binary--continuous

For continuous hidden units, the activation function $g(\mathbf{h})$ is:

$$g(\mathbf{h}) = \mathbf{h}$$

with antiderivative:

$$G(\mathbf{h}) = \frac{1}{2} \mathbf{h}^\mathrm{T}\mathbf{h}$$

so the energy is:

$$E(\mathbf{v},\mathbf{h}) = -\mathbf{v}^\mathrm{T} \mathbf{W} \mathbf{h} - \mathbf{a}^\mathrm{T} \mathbf{v} - \mathbf{b}^\mathrm{T} \mathbf{h} + \frac{1}{2} \mathbf{h}^\mathrm{T}\mathbf{h}$$

#### Binary--ReLU

For rectified linear hidden units, the activation function $g(\mathbf{h})$ is:

$$g(\mathbf{h}) = \max(0, \mathbf{h})$$

with an antiderivative that can be found by piecewise integration:

$$G(\mathbf{h}) = [ \mathrm{sgn}(\mathbf{h}) + 1 ] \frac{1}{2} \mathbf{h}^\mathrm{T}\mathbf{h}$$

so the energy is:

$$E(\mathbf{v},\mathbf{h}) = -\mathbf{v}^\mathrm{T} \mathbf{W} \mathbf{h} - \mathbf{a}^\mathrm{T} \mathbf{v} - \mathbf{b}^\mathrm{T} \mathbf{h} + [ \mathrm{sgn}(\mathbf{h}) + 1 ] \frac{1}{2} \mathbf{h}^\mathrm{T}\mathbf{h}$$

#### Continuous--binary

The antiderivative is the one computed in the Binary--continuous example above, applied to the visible units:

$$E(\mathbf{v},\mathbf{h}) = -\mathbf{v}^\mathrm{T} \mathbf{W} \mathbf{h} - \mathbf{a}^\mathrm{T} \mathbf{v} - \mathbf{b}^\mathrm{T} \mathbf{h} + \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v}$$

#### Continuous--continuous

Both the visible and the hidden units use the antiderivative computed in the Binary--continuous example above:

$$E(\mathbf{v},\mathbf{h}) = -\mathbf{v}^\mathrm{T} \mathbf{W} \mathbf{h} - \mathbf{a}^\mathrm{T} \mathbf{v} - \mathbf{b}^\mathrm{T} \mathbf{h} + \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} + \frac{1}{2} \mathbf{h}^\mathrm{T}\mathbf{h}$$

#### Continuous--ReLU

The visible units use the antiderivative from the Binary--continuous example above, and the hidden units use the one from the Binary--ReLU example:

$$E(\mathbf{v},\mathbf{h}) = -\mathbf{v}^\mathrm{T} \mathbf{W} \mathbf{h} - \mathbf{a}^\mathrm{T} \mathbf{v} - \mathbf{b}^\mathrm{T} \mathbf{h} + \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} + [ \mathrm{sgn}(\mathbf{h}) + 1 ] \frac{1}{2} \mathbf{h}^\mathrm{T}\mathbf{h}$$

#### ReLU--binary

The antiderivative is the one computed in the Binary--ReLU example above, applied to the visible units:

$$E(\mathbf{v},\mathbf{h}) = -\mathbf{v}^\mathrm{T} \mathbf{W} \mathbf{h} - \mathbf{a}^\mathrm{T} \mathbf{v} - \mathbf{b}^\mathrm{T} \mathbf{h} + [ \mathrm{sgn}(\mathbf{v}) + 1 ] \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v}$$

#### ReLU--continuous

The visible units use the antiderivative from the Binary--ReLU example above, and the hidden units use the one from the Binary--continuous example:

$$E(\mathbf{v},\mathbf{h}) = -\mathbf{v}^\mathrm{T} \mathbf{W} \mathbf{h} - \mathbf{a}^\mathrm{T} \mathbf{v} - \mathbf{b}^\mathrm{T} \mathbf{h} + [ \mathrm{sgn}(\mathbf{v}) + 1 ] \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} + \frac{1}{2} \mathbf{h}^\mathrm{T}\mathbf{h}$$

#### ReLU--ReLU

Both the visible and the hidden units use the antiderivative computed in the Binary--ReLU example above:

$$E(\mathbf{v},\mathbf{h}) = -\mathbf{v}^\mathrm{T} \mathbf{W} \mathbf{h} - \mathbf{a}^\mathrm{T} \mathbf{v} - \mathbf{b}^\mathrm{T} \mathbf{h} + [ \mathrm{sgn}(\mathbf{v}) + 1 ] \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} + [ \mathrm{sgn}(\mathbf{h}) + 1 ] \frac{1}{2} \mathbf{h}^\mathrm{T}\mathbf{h}$$

## Examples (free energy)

The following free energies can be obtained by inserting the energy functions from the examples above into Eq. (2).

#### Binary--binary

$$F(\mathbf{v}) = -\mathbf{a}^\mathrm{T} \mathbf{v} - \sum_j \ln(1 + e^{x_j})$$

where $x_j = b_j + (\mathbf{v}^\mathrm{T} \mathbf{W})_j$.
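The closed form above can be checked against the definition in Eq. (2) by summing over every hidden configuration of a small machine. The sketch below assumes hidden units taking values in $\{0, 1\}$, which is the convention under which the $\ln(1 + e^{x_j})$ form holds; the function names and parameter values are illustrative and not part of the original notes.

```python
import itertools
import math


def free_energy_closed_form(v, W, a, b):
    """F(v) = -a^T v - sum_j ln(1 + exp(x_j)), with x_j = b_j + (v^T W)_j."""
    n, m = len(v), len(b)
    x = [b[j] + sum(v[i] * W[i][j] for i in range(n)) for j in range(m)]
    return -sum(a[i] * v[i] for i in range(n)) - sum(
        math.log1p(math.exp(x_j)) for x_j in x
    )


def free_energy_brute_force(v, W, a, b):
    """F(v) = -ln sum_h exp(-E(v, h)), summing over all h in {0, 1}^m."""
    n, m = len(v), len(b)
    total = 0.0
    for h in itertools.product([0, 1], repeat=m):
        energy = (
            -sum(v[i] * W[i][j] * h[j] for i in range(n) for j in range(m))
            - sum(a[i] * v[i] for i in range(n))
            - sum(b[j] * h[j] for j in range(m))
        )
        total += math.exp(-energy)
    return -math.log(total)


# Illustrative parameters for n = 2 visible and m = 3 hidden units.
W = [[1.0, -1.0, 0.5],
     [0.5, 2.0, -0.3]]
a = [0.1, -0.2]
b = [0.3, 0.4, -0.6]
v = [1, 0]

closed = free_energy_closed_form(v, W, a, b)
brute = free_energy_brute_force(v, W, a, b)
```

The two values agree to floating-point precision because the sum over $\mathbf{h}$ factorizes into a product of $m$ independent two-term sums, one per hidden unit. The brute-force version costs $2^m$ terms and is only practical for small $m$.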

#### Binary--continuous

$$F(\mathbf{v}) = -\mathbf{a}^\mathrm{T} \mathbf{v} - \frac{1}{2} \mathbf{x}^\mathrm{T} \mathbf{x} - m \ln( 2 \sqrt{\pi / 2} )$$

where $\mathbf{x} = \mathbf{v}^\mathrm{T} \mathbf{W} + \mathbf{b}$.
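This expression follows from a Gaussian integral over each hidden unit. A brief sketch of the step, assuming each $h_j$ ranges over the whole real line and that the sum in Eq. (2) is replaced by an integral, is to complete the square in the exponent:

$$\int_{-\infty}^{\infty} dh_j ~ e^{x_j h_j - \frac{1}{2} h_j^2} = e^{\frac{1}{2} x_j^2} \int_{-\infty}^{\infty} dh_j ~ e^{-\frac{1}{2} (h_j - x_j)^2} = \sqrt{2\pi} ~ e^{\frac{1}{2} x_j^2}$$

Taking $-\ln$ of the product over the $m$ hidden units gives $-\frac{1}{2} \mathbf{x}^\mathrm{T} \mathbf{x} - m \ln \sqrt{2\pi}$, and $\sqrt{2\pi} = 2 \sqrt{\pi / 2}$ is the constant that appears in the free energy.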

#### Binary--ReLU

$$F(\mathbf{v}) = -\mathbf{a}^\mathrm{T} \mathbf{v} - \frac{1}{4} \mathbf{x}^\mathrm{T} \mathbf{x} - m \ln( \pi / 2 )$$

where $\mathbf{x} = \mathbf{v}^\mathrm{T} \mathbf{W} + \mathbf{b}$.

#### Continuous--binary

$$F(\mathbf{v}) = -\mathbf{a}^\mathrm{T} \mathbf{v} + \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} - \sum_j \ln(1 + e^{x_j})$$

where $x_j = b_j + (\mathbf{v}^\mathrm{T} \mathbf{W})_j$.

#### Continuous--continuous

$$F(\mathbf{v}) = -\mathbf{a}^\mathrm{T} \mathbf{v} + \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} - \frac{1}{2} \mathbf{x}^\mathrm{T} \mathbf{x} - m \ln( 2 \sqrt{\pi / 2} )$$

where $\mathbf{x} = \mathbf{v}^\mathrm{T} \mathbf{W} + \mathbf{b}$.
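The contribution of a single continuous hidden unit to this free energy, $-\frac{1}{2} x_j^2 - \ln(2\sqrt{\pi/2})$, can be checked numerically. The sketch below integrates $e^{x h - h^2/2}$ with the trapezoid rule; the function names, the truncation of the real line to $[-40, 40]$, and the step count are assumptions chosen for illustration.

```python
import math


def hidden_term_numeric(x, lo=-40.0, hi=40.0, steps=20_000):
    """-ln of the integral of exp(x*h - h^2/2) over h, by the trapezoid rule.

    The finite limits lo and hi are an assumed truncation of the real line;
    the integrand is negligible outside them for moderate |x|.
    """
    dh = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        h = lo + k * dh
        weight = 0.5 if k in (0, steps) else 1.0  # endpoint weights
        total += weight * math.exp(x * h - 0.5 * h * h)
    return -math.log(total * dh)


def hidden_term_closed_form(x):
    """Per-unit contribution -x^2/2 - ln(2*sqrt(pi/2)) from the formula above."""
    return -0.5 * x * x - math.log(2.0 * math.sqrt(math.pi / 2.0))


x = 1.3
numeric = hidden_term_numeric(x)
closed = hidden_term_closed_form(x)
```

Summing this per-unit term over the $m$ hidden units and adding the visible terms $-\mathbf{a}^\mathrm{T} \mathbf{v} + \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v}$ reproduces the expression above.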

#### Continuous--ReLU

$$F(\mathbf{v}) = -\mathbf{a}^\mathrm{T} \mathbf{v} + \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} - \frac{1}{4} \mathbf{x}^\mathrm{T} \mathbf{x} - m \ln( \pi / 2 )$$

where $\mathbf{x} = \mathbf{v}^\mathrm{T} \mathbf{W} + \mathbf{b}$.

#### ReLU--binary

$$F(\mathbf{v}) = -\mathbf{a}^\mathrm{T} \mathbf{v} + [ \mathrm{sgn}(\mathbf{v}) + 1 ] \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} - \sum_j \ln(1 + e^{x_j})$$

where $x_j = b_j + (\mathbf{v}^\mathrm{T} \mathbf{W})_j$.

#### ReLU--continuous

$$F(\mathbf{v}) = -\mathbf{a}^\mathrm{T} \mathbf{v} + [ \mathrm{sgn}(\mathbf{v}) + 1 ] \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} - \frac{1}{2} \mathbf{x}^\mathrm{T} \mathbf{x} - m \ln( 2 \sqrt{\pi / 2} )$$

where $\mathbf{x} = \mathbf{v}^\mathrm{T} \mathbf{W} + \mathbf{b}$.

#### ReLU--ReLU

$$F(\mathbf{v}) = -\mathbf{a}^\mathrm{T} \mathbf{v} + [ \mathrm{sgn}(\mathbf{v}) + 1 ] \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} - \frac{1}{4} \mathbf{x}^\mathrm{T} \mathbf{x} - m \ln( \pi / 2 )$$

where $\mathbf{x} = \mathbf{v}^\mathrm{T} \mathbf{W} + \mathbf{b}$.

## Notes and references

1. Y. Freund and D. Haussler, "Unsupervised learning of distributions on binary vectors using two layer networks," Technical report (1994).