Activation function

In computational networks, the activation function of a node defines the output of that node given an input or set of inputs.

Types of activation functions

The following table lists activation functions implemented in stats++ that are functions of a single input \(x\) from the previous layer(s); a short C++ sketch of these functions follows the table:

Name Equation Derivative Range
Logistic \(f(x)=\frac{1}{1+e^{-x}}\) \(f'(x)=f(x)(1-f(x))\) \((0,1)\)
tanh \(f(x)=\tanh(x)=\frac{2}{1+e^{-2x}}-1\) \(f'(x)=1-f(x)^2\) \((-1,1)\)
tanh (skewed)[1] \(f(x)=1.7159\tanh(2x/3)\) \(f'(x)=1.7159\cdot\tfrac{2}{3}\left(1-\tanh^2(2x/3)\right)\) \((-1.7159,1.7159)\)
Rectified linear unit (ReLU)[2] \(f(x) = \left \{ \begin{array}{rcl} 0 & \mbox{for} & x < 0\\ x & \mbox{for} & x \ge 0\end{array} \right.\) \(f'(x) = \left \{ \begin{array}{rcl} 0 & \mbox{for} & x < 0\\ 1 & \mbox{for} & x \ge 0\end{array} \right.\) \([0,\infty)\)
SoftPlus[3] \(f(x)=\ln(1+e^x)\) \(f'(x)=\frac{1}{1+e^{-x}}\) \((0,\infty)\)
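
For concreteness, a minimal C++ sketch of these scalar activations and their derivatives is shown below. The function names are illustrative only and do not correspond to the actual stats++ interface.

    #include <cmath>

    // Illustrative implementations of the scalar activations above;
    // the names are hypothetical and not part of the stats++ API.

    // Logistic: f(x) = 1/(1 + e^{-x}),  f'(x) = f(x)(1 - f(x))
    double logistic(double x)       { return 1.0 / (1.0 + std::exp(-x)); }
    double logistic_deriv(double x) { double f = logistic(x); return f * (1.0 - f); }

    // tanh: f'(x) = 1 - f(x)^2
    double tanh_act(double x)   { return std::tanh(x); }
    double tanh_deriv(double x) { double f = std::tanh(x); return 1.0 - f * f; }

    // Scaled tanh [1]: f(x) = 1.7159 tanh(2x/3)
    double scaled_tanh(double x) { return 1.7159 * std::tanh(2.0 * x / 3.0); }
    double scaled_tanh_deriv(double x) {
        double t = std::tanh(2.0 * x / 3.0);
        return 1.7159 * (2.0 / 3.0) * (1.0 - t * t);
    }

    // ReLU: f(x) = max(0, x)
    double relu(double x)       { return x < 0.0 ? 0.0 : x; }
    double relu_deriv(double x) { return x < 0.0 ? 0.0 : 1.0; }

    // SoftPlus: f(x) = ln(1 + e^x),  f'(x) = logistic(x)
    double softplus(double x)       { return std::log1p(std::exp(x)); }
    double softplus_deriv(double x) { return logistic(x); }
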

The following table lists activation functions implemented in stats++ that are not functions of a single input \(x\) from the previous layer(s); a short C++ sketch follows the table:

Name Equation Derivatives Range
Softmax \(f(\mathbf{x})_i = \frac{e^{x_i}}{\sum_{k=1}^K e^{x_k}}\)    for i = 1, …, K \(\frac{\partial f(\mathbf{x})_i}{\partial x_j} = f(\mathbf{x})_i(\delta_{ij} - f(\mathbf{x})_j)\) \((0,1)\)
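
A corresponding C++ sketch of softmax and its Jacobian is given below; subtracting the maximum input before exponentiating leaves the result unchanged (softmax is shift-invariant) but avoids overflow in the exponential. Again, the names are illustrative rather than the actual stats++ interface.

    #include <algorithm>
    #include <cmath>
    #include <cstddef>
    #include <vector>

    // Illustrative softmax over a vector of pre-activations x_1..x_K
    // (assumes x is non-empty).
    std::vector<double> softmax(const std::vector<double>& x) {
        const double x_max = *std::max_element(x.begin(), x.end());
        std::vector<double> f(x.size());
        double sum = 0.0;
        for (std::size_t k = 0; k < x.size(); ++k) {
            f[k] = std::exp(x[k] - x_max);   // shift by max for numerical stability
            sum += f[k];
        }
        for (double& v : f) v /= sum;
        return f;
    }

    // Jacobian entry  d f_i / d x_j = f_i (delta_ij - f_j),
    // evaluated from an already-computed softmax output f.
    double softmax_jacobian(const std::vector<double>& f, std::size_t i, std::size_t j) {
        const double delta_ij = (i == j) ? 1.0 : 0.0;
        return f[i] * (delta_ij - f[j]);
    }
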

Notes and references

  1. Y. LeCun, L. Bottou, G. B. Orr, K.-R. Müller, "Efficient BackProp," Neural Networks: Tricks of the Trade, in Lecture Notes in Computer Science 1524, 9–50 (2002)
  2. V. Nair and G. E. Hinton, "Rectified Linear Units Improve Restricted Boltzmann Machines," Proceedings of the 27th International Conference on Machine Learning (ICML-10) (2010)
  3. X. Glorot, A. Bordes, and Y. Bengio, "Deep Sparse Rectifier Neural Networks," International Conference on Artificial Intelligence and Statistics (2011)