Restricted Boltzmann machine
- Maybe start with Hopfield
Energy-based model:
\begin{equation} \tag{1} e^{-F(\mathbf{v})} = \sum_\mathbf{h} e^{-E(\mathbf{v},\mathbf{h})} \end{equation}
\begin{equation} \tag{2} F(\mathbf{v}) = - \ln \sum_\mathbf{h} e^{-E(\mathbf{v},\mathbf{h})} \end{equation}
To model a distribution on $\{\pm 1\}^n$ we use a machine with $n + m$ units. There are $n$ visible units, each of which represents a single bit of a random vector, and $m$ hidden units that create correlations between the values of the visible units.
- Maybe put pic of RBM here
The energy of an RBM with parameters ${\boldsymbol \theta} = \{\mathbf{W},\mathbf{a},\mathbf{b}\}$ can be written:
\begin{equation} \tag{3} E(\mathbf{v},\mathbf{h}) = -\mathbf{v}^\mathrm{T} \mathbf{W} \mathbf{h} - \mathbf{a}^\mathrm{T} \mathbf{v} - \mathbf{b}^\mathrm{T} \mathbf{h} \end{equation}
where the matrix $\mathbf{W}$ determines the symmetric interaction between pairs of hidden and visible units, and $\mathbf{a}$ and $\mathbf{b}$ are bias terms that set the unary energy of the units.
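As a minimal sketch of Eq. (3), the energy and the free energy of Eq. (2) can be evaluated by brute force for a small machine. The function names `energy` and `free_energy_brute_force` are hypothetical, and the hidden units are assumed to take values in $\{0, 1\}$ by default. The enumeration has $2^m$ terms, so it is only practical as a check on tiny models.

```python
import itertools
import math


def energy(v, h, W, a, b):
    """Eq. (3): E(v, h) = -v^T W h - a^T v - b^T h, with W stored as n rows of m entries."""
    interaction = sum(v[i] * W[i][j] * h[j]
                      for i in range(len(v)) for j in range(len(h)))
    visible_bias = sum(a_i * v_i for a_i, v_i in zip(a, v))
    hidden_bias = sum(b_j * h_j for b_j, h_j in zip(b, h))
    return -interaction - visible_bias - hidden_bias


def free_energy_brute_force(v, W, a, b, hidden_values=(0, 1)):
    """Eq. (2): F(v) = -ln sum_h exp(-E(v, h)), summed over every hidden state.

    hidden_values is an assumption of this sketch: the set of values each
    hidden unit may take.
    """
    m = len(b)
    neg_energies = [-energy(v, h, W, a, b)
                    for h in itertools.product(hidden_values, repeat=m)]
    # Log-sum-exp with the maximum factored out, for numerical stability.
    shift = max(neg_energies)
    return -(shift + math.log(sum(math.exp(e - shift) for e in neg_energies)))
```

With all parameters set to zero, every hidden state has zero energy, so the free energy reduces to $-m \ln 2$, which is a convenient sanity check.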
An extension of Eq. (3) to continuous units can be obtained by adding terms to the energy that ensure that the distribution can be normalized [1]:
\begin{equation} \tag{4} E_c(\mathbf{v},\mathbf{h}) = E(\mathbf{v},\mathbf{h}) + \int_a^b d\mathbf{v} ~ f(\mathbf{v}) +\int_c^d d\mathbf{h} ~ g(\mathbf{h}) \end{equation}
where $f(\mathbf{v})$ and $g(\mathbf{h})$ are the activations of the visible and hidden units, respectively.
Additionally, if we assume that the visible and/or hidden units have independent Gaussian noise, we make the replacements:
\begin{equation} \tag{5} v_i \rightarrow v_i / \sigma_{v,i}, \qquad h_j \rightarrow h_j / \sigma_{h,j} \end{equation}
where $\sigma_{v,i}$ and $\sigma_{h,j}$ denote the noise standard deviations of visible unit $i$ and hidden unit $j$.
Using these definitions, the energy functions and free energies can be calculated for any type of network.
Examples
Binary--binary
\begin{equation} E(\mathbf{v},\mathbf{h}) = -\mathbf{v}^\mathrm{T} \mathbf{W} \mathbf{h} - \mathbf{a}^\mathrm{T} \mathbf{v} - \mathbf{b}^\mathrm{T} \mathbf{h} \end{equation}
Binary--continuous
For continuous hidden units, the activation function $g(\mathbf{h})$ is:
\begin{equation} g(\mathbf{h}) = \mathbf{h} \end{equation}
with antiderivative:
\begin{equation} G(\mathbf{h}) = \frac{1}{2} \mathbf{h}^\mathrm{T}\mathbf{h} \end{equation}
so that the energy becomes:
\begin{equation} E(\mathbf{v},\mathbf{h}) = -\mathbf{v}^\mathrm{T} \mathbf{W} \mathbf{h} - \mathbf{a}^\mathrm{T} \mathbf{v} - \mathbf{b}^\mathrm{T} \mathbf{h} + \frac{1}{2} \mathbf{h}^\mathrm{T}\mathbf{h} \end{equation}
Binary--ReLU
For rectified linear hidden units, the activation function $g(\mathbf{h})$ is:
\begin{equation} g(\mathbf{h}) = \max(0, \mathbf{h}) \end{equation}
with an antiderivative that can be found by piecewise integration:
\begin{equation} G(\mathbf{h}) = [ \mathrm{sgn}(\mathbf{h}) + 1 ] \frac{1}{2} \mathbf{h}^\mathrm{T}\mathbf{h} \end{equation}
so that the energy becomes:
\begin{equation} E(\mathbf{v},\mathbf{h}) = -\mathbf{v}^\mathrm{T} \mathbf{W} \mathbf{h} - \mathbf{a}^\mathrm{T} \mathbf{v} - \mathbf{b}^\mathrm{T} \mathbf{h} + [ \mathrm{sgn}(\mathbf{h}) + 1 ] \frac{1}{2} \mathbf{h}^\mathrm{T}\mathbf{h} \end{equation}
Continuous--binary
See the antiderivative calculation above.
\begin{equation} E(\mathbf{v},\mathbf{h}) = -\mathbf{v}^\mathrm{T} \mathbf{W} \mathbf{h} - \mathbf{a}^\mathrm{T} \mathbf{v} - \mathbf{b}^\mathrm{T} \mathbf{h} + \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} \end{equation}
Continuous--continuous
See the antiderivative calculation above.
\begin{equation} E(\mathbf{v},\mathbf{h}) = -\mathbf{v}^\mathrm{T} \mathbf{W} \mathbf{h} - \mathbf{a}^\mathrm{T} \mathbf{v} - \mathbf{b}^\mathrm{T} \mathbf{h} + \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} + \frac{1}{2} \mathbf{h}^\mathrm{T}\mathbf{h} \end{equation}
Continuous--ReLU
See the antiderivative calculations above.
\begin{equation} E(\mathbf{v},\mathbf{h}) = -\mathbf{v}^\mathrm{T} \mathbf{W} \mathbf{h} - \mathbf{a}^\mathrm{T} \mathbf{v} - \mathbf{b}^\mathrm{T} \mathbf{h} + \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} + [ \mathrm{sgn}(\mathbf{h}) + 1 ] \frac{1}{2} \mathbf{h}^\mathrm{T}\mathbf{h} \end{equation}
ReLU--binary
See the antiderivative calculations above.
\begin{equation} E(\mathbf{v},\mathbf{h}) = -\mathbf{v}^\mathrm{T} \mathbf{W} \mathbf{h} - \mathbf{a}^\mathrm{T} \mathbf{v} - \mathbf{b}^\mathrm{T} \mathbf{h} + [ \mathrm{sgn}(\mathbf{v}) + 1 ] \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} \end{equation}
ReLU--continuous
See the antiderivative calculations above.
\begin{equation} E(\mathbf{v},\mathbf{h}) = -\mathbf{v}^\mathrm{T} \mathbf{W} \mathbf{h} - \mathbf{a}^\mathrm{T} \mathbf{v} - \mathbf{b}^\mathrm{T} \mathbf{h} + [ \mathrm{sgn}(\mathbf{v}) + 1 ] \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} + \frac{1}{2} \mathbf{h}^\mathrm{T}\mathbf{h} \end{equation}
ReLU--ReLU
See the antiderivative calculations above.
\begin{equation} E(\mathbf{v},\mathbf{h}) = -\mathbf{v}^\mathrm{T} \mathbf{W} \mathbf{h} - \mathbf{a}^\mathrm{T} \mathbf{v} - \mathbf{b}^\mathrm{T} \mathbf{h} + [ \mathrm{sgn}(\mathbf{v}) + 1 ] \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} + [ \mathrm{sgn}(\mathbf{h}) + 1 ] \frac{1}{2} \mathbf{h}^\mathrm{T}\mathbf{h} \end{equation}
Examples (free energy)
The following free energies can be obtained by inserting the energy functions from the examples above into Eq. (2).
Binary--binary
\begin{equation} F(\mathbf{v}) = -\mathbf{a}^\mathrm{T} \mathbf{v} - \sum_j \ln(1 + e^{x_j}) \end{equation}
where $x_j = b_j + (\mathbf{v}^\mathrm{T} \mathbf{W})_j$.
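The closed form above can be checked numerically against a direct enumeration of the hidden states. The following is a small sketch under the assumption that the hidden units take values in $\{0, 1\}$, which is what the $\ln(1 + e^{x_j})$ term corresponds to. The function names and the parameter values are hypothetical and chosen only for illustration.

```python
import itertools
import math


def softplus(x):
    """ln(1 + e^x), written to avoid overflow for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def free_energy_binary_binary(v, W, a, b):
    """Closed form: F(v) = -a^T v - sum_j ln(1 + e^{x_j}), x_j = b_j + (v^T W)_j."""
    n, m = len(v), len(b)
    x = [b[j] + sum(v[i] * W[i][j] for i in range(n)) for j in range(m)]
    return -sum(a[i] * v[i] for i in range(n)) - sum(softplus(x_j) for x_j in x)


def free_energy_by_enumeration(v, W, a, b):
    """-ln sum_h exp(-E(v, h)) over all h in {0, 1}^m (for checking only)."""
    n, m = len(v), len(b)
    total = 0.0
    for h in itertools.product((0, 1), repeat=m):
        e = -sum(v[i] * W[i][j] * h[j] for i in range(n) for j in range(m))
        e -= sum(a[i] * v[i] for i in range(n))
        e -= sum(b[j] * h[j] for j in range(m))
        total += math.exp(-e)
    return -math.log(total)


# Illustrative parameters for a machine with n = 2 visible and m = 3 hidden units.
W = [[0.5, -1.0, 0.2], [0.3, 0.8, -0.6]]
a = [0.1, -0.2]
b = [0.0, 0.4, -0.3]
v = [1, 0]
closed_form = free_energy_binary_binary(v, W, a, b)
enumerated = free_energy_by_enumeration(v, W, a, b)
```

The two values agree because the sum over $\mathbf{h}$ factorizes into a product of $m$ independent two-term sums, one per hidden unit.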
Binary--continuous
\begin{equation} F(\mathbf{v}) = -\mathbf{a}^\mathrm{T} \mathbf{v} - \frac{1}{2} \mathbf{x}^\mathrm{T} \mathbf{x} - m \ln( 2 \sqrt{\pi / 2} ) \end{equation}
where $\mathbf{x} = \mathbf{v}^\mathrm{T} \mathbf{W} + \mathbf{b}$.
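The Gaussian integral behind this result can be written out explicitly. As a sketch, assuming that the sum over $\mathbf{h}$ in Eq. (1) becomes an integral of each hidden unit $h_j$ over the whole real line, completing the square in each exponent gives:
\begin{equation} \int d\mathbf{h} ~ e^{-E(\mathbf{v},\mathbf{h})} = e^{\mathbf{a}^\mathrm{T} \mathbf{v}} \prod_j \int_{-\infty}^{\infty} dh_j ~ e^{x_j h_j - \frac{1}{2} h_j^2} = e^{\mathbf{a}^\mathrm{T} \mathbf{v}} \prod_j \sqrt{2\pi} ~ e^{\frac{1}{2} x_j^2} \end{equation}
Taking $-\ln$ of both sides recovers the expression for $F(\mathbf{v})$, since $\sqrt{2\pi} = 2 \sqrt{\pi / 2}$.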
Binary--ReLU
\begin{equation} F(\mathbf{v}) = -\mathbf{a}^\mathrm{T} \mathbf{v} - \frac{1}{4} \mathbf{x}^\mathrm{T} \mathbf{x} - m \ln( \pi / 2 ) \end{equation}
where $\mathbf{x} = \mathbf{v}^\mathrm{T} \mathbf{W} + \mathbf{b}$.
Continuous--binary
\begin{equation} F(\mathbf{v}) = -\mathbf{a}^\mathrm{T} \mathbf{v} + \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} - \sum_j \ln(1 + e^{x_j}) \end{equation}
where $x_j = b_j + (\mathbf{v}^\mathrm{T} \mathbf{W})_j$.
Continuous--continuous
\begin{equation} F(\mathbf{v}) = -\mathbf{a}^\mathrm{T} \mathbf{v} + \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} - \frac{1}{2} \mathbf{x}^\mathrm{T} \mathbf{x} - m \ln( 2 \sqrt{\pi / 2} ) \end{equation}
where $\mathbf{x} = \mathbf{v}^\mathrm{T} \mathbf{W} + \mathbf{b}$.
Continuous--ReLU
\begin{equation} F(\mathbf{v}) = -\mathbf{a}^\mathrm{T} \mathbf{v} + \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} - \frac{1}{4} \mathbf{x}^\mathrm{T} \mathbf{x} - m \ln( \pi / 2 ) \end{equation}
where $\mathbf{x} = \mathbf{v}^\mathrm{T} \mathbf{W} + \mathbf{b}$.
ReLU--binary
\begin{equation} F(\mathbf{v}) = -\mathbf{a}^\mathrm{T} \mathbf{v} + [ \mathrm{sgn}(\mathbf{v}) + 1 ] \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} - \sum_j \ln(1 + e^{x_j}) \end{equation}
where $x_j = b_j + (\mathbf{v}^\mathrm{T} \mathbf{W})_j$.
ReLU--continuous
\begin{equation} F(\mathbf{v}) = -\mathbf{a}^\mathrm{T} \mathbf{v} + [ \mathrm{sgn}(\mathbf{v}) + 1 ] \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} - \frac{1}{2} \mathbf{x}^\mathrm{T} \mathbf{x} - m \ln( 2 \sqrt{\pi / 2} ) \end{equation}
where $\mathbf{x} = \mathbf{v}^\mathrm{T} \mathbf{W} + \mathbf{b}$.
ReLU--ReLU
\begin{equation} F(\mathbf{v}) = -\mathbf{a}^\mathrm{T} \mathbf{v} + [ \mathrm{sgn}(\mathbf{v}) + 1 ] \frac{1}{2} \mathbf{v}^\mathrm{T}\mathbf{v} - \frac{1}{4} \mathbf{x}^\mathrm{T} \mathbf{x} - m \ln( \pi / 2 ) \end{equation}
where $\mathbf{x} = \mathbf{v}^\mathrm{T} \mathbf{W} + \mathbf{b}$.
Notes and references
- [1] Y. Freund and D. Haussler, "Unsupervised learning of distributions on binary vectors using two layer networks," Technical report (1994).