Lesson 10.2: Activation Functions – Sigmoid, ReLU, Softmax
🔹 What is an Activation Function?
- Activation functions introduce non-linearity into neural networks.
- Without them, the network would behave like a linear model, unable to learn complex patterns.
- An activation function determines whether (and how strongly) a neuron should be activated for a given input.
🔹 Common Activation Functions

Sigmoid
- Formula: \sigma(x) = \frac{1}{1 + e^{-x}}
- Output: 0 to 1
- Used in binary classification (output layer).
- Limitation: can cause vanishing gradients in deep networks.
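The sigmoid above takes only a few lines of NumPy; the function and variable names here are illustrative:

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: squashes any real input into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
print(sigmoid(x))  # near 0 for large negative inputs, 0.5 at 0, near 1 for large positive
```

Note that the output never quite reaches 0 or 1, which is why it is read as a probability in binary classification.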
ReLU (Rectified Linear Unit)
- Formula: f(x) = \max(0, x)
- Output: 0 for negative inputs, the input itself otherwise
- Most popular choice for hidden layers.
- Advantages: mitigates the vanishing-gradient problem and speeds up training.
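ReLU is even simpler to implement, which is part of why it trains fast (a minimal NumPy sketch):

```python
import numpy as np

def relu(x):
    """ReLU: passes positive values through unchanged, clamps negatives to 0."""
    return np.maximum(0, x)

x = np.array([-2.0, -0.5, 0.0, 3.0])
print(relu(x))  # [0. 0. 0. 3.]
```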
Softmax
- Formula: \text{softmax}(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}}
- Output: a probability distribution over the classes (values sum to 1)
- Used in the output layer for multi-class classification.
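The softmax formula above can be implemented directly; the max-subtraction step is a standard trick (not in the formula itself) that avoids overflow without changing the result:

```python
import numpy as np

def softmax(x):
    """Softmax over a 1-D array of logits, with max-subtraction for stability."""
    shifted = x - np.max(x)   # subtracting a constant leaves the ratios unchanged
    exps = np.exp(shifted)
    return exps / exps.sum()

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs, probs.sum())  # three probabilities that sum to 1
```

The largest logit always gets the largest probability, which is why the predicted class is simply `probs.argmax()`.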
🔹 Example (Using Keras)
- ReLU → hidden layers
- Softmax → output layer for multi-class classification
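The pattern those bullets describe (ReLU in the hidden layer, softmax at the output) can be illustrated framework-free in NumPy; a Keras version would use `Dense` layers with `activation="relu"` and `activation="softmax"`. The layer sizes and random weights below are illustrative only; in Keras they would be learned during training:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0, x)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy 2-layer network: 4 inputs -> 8 hidden units (ReLU) -> 3 classes (softmax).
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)

x = rng.normal(size=4)               # one input sample
hidden = relu(W1 @ x + b1)           # hidden layer: ReLU
probs = softmax(W2 @ hidden + b2)    # output layer: softmax probabilities
print(probs, probs.sum())
```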
🔹 Advantages
- Enable neural networks to learn complex, non-linear relationships.
- ReLU → efficient to compute and reduces training time.
- Softmax → produces probabilities for multi-class outputs.
🔹 Limitations
- Sigmoid → saturates for inputs far from 0, shrinking gradients (the vanishing-gradient problem).
- ReLU → can cause "dying" neurons that output 0 for every input and stop updating.
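Both limitations show up directly in the gradients. A small NumPy sketch (using the standard derivative σ'(x) = σ(x)(1 − σ(x)) for the sigmoid):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)  # derivative of the sigmoid

# Saturation: far from 0 the sigmoid's gradient is nearly zero, so little
# learning signal flows back through a deep stack of such units.
print(sigmoid_grad(0.0))   # 0.25, the maximum possible value
print(sigmoid_grad(10.0))  # tiny (~4.5e-5)

# Dying ReLU: for any negative pre-activation the gradient is exactly 0,
# so a neuron whose inputs stay negative never updates again.
relu_grad = lambda x: (x > 0).astype(float)
print(relu_grad(np.array([-3.0, -0.1, 2.0])))  # [0. 0. 1.]
```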
✅ Quick Recap:
- Activation functions → add non-linearity.
- Sigmoid → binary output, range 0–1.
- ReLU → hidden layers, fast training.
- Softmax → multi-class probability output.
