Activation function
In computational networks, the activation function of a node defines the output of that node given an input or set of inputs.
Types of activation functions
The following table lists activation functions implemented in stats++ that are functions of a single input \(x\) from the previous layer(s):
Name | Equation | Derivative | Range |
---|---|---|---|
Logistic | \(f(x)=\frac{1}{1+e^{-x}}\) | \(f'(x)=f(x)(1-f(x))\) | \((0,1)\) |
tanh | \(f(x)=\tanh(x)=\frac{2}{1+e^{-2x}}-1\) | \(f'(x)=1-f(x)^2\) | \((-1,1)\) |
tanh (skewed)[1] | \(f(x)=1.7159\tanh\!\left(\frac{2x}{3}\right)\) | \(f'(x)=\frac{2\times 1.7159}{3}\left(1-\tanh^2\!\left(\frac{2x}{3}\right)\right)\) | \((-1.7159,1.7159)\) |
Rectified linear unit (ReLU)[2] | \(f(x) = \left \{ \begin{array}{rcl} 0 & \mbox{for} & x < 0\\ x & \mbox{for} & x \ge 0\end{array} \right.\) | \(f'(x) = \left \{ \begin{array}{rcl} 0 & \mbox{for} & x < 0\\ 1 & \mbox{for} & x \ge 0\end{array} \right.\) | \([0,\infty)\) |
SoftPlus[3] | \(f(x)=\ln(1+e^x)\) | \(f'(x)=\frac{1}{1+e^{-x}}\) | \((0,\infty)\) |
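The scalar activations above map each pre-activation value independently, so they are simple to implement directly from the table. The following is a minimal, self-contained C++ sketch of these functions and their derivatives; the names (`logistic`, `tanh_skewed`, `relu`, `softplus`, and so on) are chosen here for illustration and are not the stats++ interface.

```cpp
#include <cmath>
#include <cstdio>

// Logistic: f(x) = 1/(1 + e^{-x});  f'(x) = f(x)(1 - f(x)).
double logistic(double x)       { return 1.0 / (1.0 + std::exp(-x)); }
double logistic_deriv(double x) { double f = logistic(x); return f * (1.0 - f); }

// tanh:  f'(x) = 1 - f(x)^2.
double tanh_deriv(double x) { double f = std::tanh(x); return 1.0 - f * f; }

// Skewed tanh (LeCun): f(x) = 1.7159 tanh(2x/3).
double tanh_skewed(double x) { return 1.7159 * std::tanh(2.0 * x / 3.0); }
double tanh_skewed_deriv(double x) {
    double t = std::tanh(2.0 * x / 3.0);
    return 1.7159 * (2.0 / 3.0) * (1.0 - t * t);
}

// ReLU: max(0, x); derivative is 0 for x < 0 and 1 for x >= 0.
double relu(double x)       { return x < 0.0 ? 0.0 : x; }
double relu_deriv(double x) { return x < 0.0 ? 0.0 : 1.0; }

// SoftPlus: ln(1 + e^x); its derivative is the logistic function.
double softplus(double x)       { return std::log1p(std::exp(x)); }
double softplus_deriv(double x) { return logistic(x); }

int main() {
    const double xs[] = {-2.0, 0.0, 2.0};
    for (double x : xs)
        std::printf("x=%+.1f  logistic=%.4f  relu=%.4f  softplus=%.4f\n",
                    x, logistic(x), relu(x), softplus(x));
}
```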
The following table lists activation functions implemented in stats++ that are not functions of a single input \(x\) from the previous layer(s):
Name | Equation | Derivatives | Range |
---|---|---|---|
Softmax | \(f(\mathbf{x})_i = \frac{e^{x_i}}{\sum_{k=1}^K e^{x_k}}\) for \(i = 1, \ldots, K\) | \(\frac{\partial f(\mathbf{x})_i}{\partial x_j} = f(\mathbf{x})_i(\delta_{ij} - f(\mathbf{x})_j)\) | \((0,1)\) |
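Because softmax depends on the entire input vector, a direct implementation can overflow when some \(x_i\) is large. Subtracting \(\max_k x_k\) from every component before exponentiating scales numerator and denominator by the same factor, so the result is unchanged while the exponents stay bounded. A minimal C++ sketch of this follows; again it is illustrative, not the stats++ interface.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>

// Softmax: f(x)_i = e^{x_i} / sum_k e^{x_k}.
// The max is subtracted first for numerical stability; the output is identical.
std::vector<double> softmax(const std::vector<double>& x) {
    double m = *std::max_element(x.begin(), x.end());
    std::vector<double> f(x.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        f[i] = std::exp(x[i] - m);
        sum += f[i];
    }
    for (double& v : f) v /= sum;  // normalize so the outputs sum to 1
    return f;
}

int main() {
    std::vector<double> x = {1.0, 2.0, 3.0};
    for (double v : softmax(x)) std::printf("%.4f ", v);  // 0.0900 0.2447 0.6652
    std::printf("\n");
}
```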
Notes and references
- ↑ Y. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller, "Efficient BackProp," in Neural Networks: Tricks of the Trade, Lecture Notes in Computer Science 1524, 9–50 (1998)
- ↑ V. Nair and G. E. Hinton, "Rectified Linear Units Improve Restricted Boltzmann Machines," Proceedings of the 27th International Conference on Machine Learning (ICML-10) (2010)
- ↑ X. Glorot, A. Bordes, and Y. Bengio, "Deep Sparse Rectifier Neural Networks," International Conference on Artificial Intelligence and Statistics (2011)