# Activation function

In computational networks, the activation function of a node defines the output of that node given an input or set of inputs.

## Types of activation functions

The following table lists activation functions implemented in stats++ that are functions of a single input $x$ from the previous layer(s):

| Name | Equation | Derivative | Range |
|------|----------|------------|-------|
| Logistic | $$f(x)=\frac{1}{1+e^{-x}}$$ | $$f'(x)=f(x)(1-f(x))$$ | $$(0,1)$$ |
| tanh | $$f(x)=\tanh(x)=\frac{2}{1+e^{-2x}}-1$$ | $$f'(x)=1-f(x)^2$$ | $$(-1,1)$$ |
| tanh (scaled)[1] | $$f(x)=1.7159\tanh(2x/3)$$ | $$f'(x)=\frac{2\cdot 1.7159}{3}\left(1-\tanh^2(2x/3)\right)$$ | $$(-1.7159,1.7159)$$ |
| Rectified linear unit (ReLU)[2] | $$f(x) = \left \{ \begin{array}{rcl} 0 & \mbox{for} & x < 0\\ x & \mbox{for} & x \ge 0\end{array} \right.$$ | $$f'(x) = \left \{ \begin{array}{rcl} 0 & \mbox{for} & x < 0\\ 1 & \mbox{for} & x \ge 0\end{array} \right.$$ | $$[0,\infty)$$ |
| SoftPlus[3] | $$f(x)=\ln(1+e^x)$$ | $$f'(x)=\frac{1}{1+e^{-x}}$$ | $$(0,\infty)$$ |

The following table lists activation functions implemented in stats++ that are not functions of a single input $x$ from the previous layer(s):

| Name | Equation | Derivatives | Range |
|------|----------|-------------|-------|
| Softmax | $$f(\mathbf{x})_i = \frac{e^{x_i}}{\sum_{k=1}^K e^{x_k}} \quad \text{for } i = 1, \dots, K$$ | $$\frac{\partial f(\mathbf{x})_i}{\partial x_j} = f(\mathbf{x})_i(\delta_{ij} - f(\mathbf{x})_j)$$ | $$(0,1)$$ |

## Notes and references

1. Y. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller, "Efficient BackProp," Neural Networks: Tricks of the Trade, Lecture Notes in Computer Science 1524, 9–50 (1998)
2. V. Nair and G. E. Hinton, "Rectified Linear Units Improve Restricted Boltzmann Machines," Proceedings of the 27th International Conference on Machine Learning (ICML-10) (2010)
3. X. Glorot, A. Bordes, and Y. Bengio, "Deep Sparse Rectifier Neural Networks," International Conference on Artificial Intelligence and Statistics (2011)