Lesson 6.4: Train/Test Split & Cross-Validation
🔹 Train/Test Split
- Purpose: Estimate how well a machine learning model performs on data it has not seen during training.
- Method:
  - Split the dataset into a training set (used to fit the model) and a test set (held out and used only to evaluate performance).
  - Common splits: 70–30 and 80–20 (train–test).
- Example:
  - Training set → Model learns patterns.
  - Test set → Estimates how the model will perform on new, real-world data.
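A minimal sketch of the split described above, using only the Python standard library. The function name `train_test_split_simple`, its parameters, and the toy data are assumptions made for illustration; in practice a library routine such as scikit-learn's `train_test_split` would normally be used instead.

```python
import random


def train_test_split_simple(rows, test_fraction=0.2, seed=0):
    """Shuffle rows and split them into (train, test) lists.

    Hypothetical helper for illustration only.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)  # copy so the caller's data is left untouched
    # A fixed seed makes the shuffle, and therefore the split, reproducible.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    # The first n_test shuffled rows become the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(10))  # toy dataset of 10 samples
train, test = train_test_split_simple(data, test_fraction=0.2)  # 80-20 split
print(len(train), len(test))  # 8 2
```

Shuffling before splitting matters when the rows are stored in some order (for example, sorted by label); without it the test set may not be representative.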
🔹 Cross-Validation (CV)
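- Purpose: Ensure the performance estimate is robust and does not depend on a single train/test split.
- Method:
  - The data is divided into k subsets (folds) of roughly equal size.
  - The model is trained on k-1 folds and tested on the remaining fold.
  - Repeat so that each fold serves as the test set once → Average the k results to get the performance metric.
- Example (k=5 fold CV):
  - Each round trains on 4 folds (80% of the data) and tests on the remaining fold (20%); the 5 scores are then averaged.
  - This gives a more reliable evaluation than a single split and makes overfitting easier to detect.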
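The k-fold procedure above can be sketched in plain Python as follows. Everything here is an assumption for illustration: the helper names `k_fold_indices` and `cross_validate`, the toy "model" that just predicts the mean of the training targets, and the mean-absolute-error score. The folds are taken in order without shuffling to keep the sketch short; real data is usually shuffled first.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_validate(xs, ys, fit, score, k=5):
    """Train on k-1 folds, score on the held-out fold, and average the k scores."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in test_idx], [ys[i] for i in test_idx]))
    return sum(scores) / len(scores), scores


# Toy stand-ins for a real model and metric (hypothetical, for illustration).
def fit_mean(xs, ys):
    return sum(ys) / len(ys)  # the "model" is just the mean training target


def mean_abs_error(model, xs, ys):
    return sum(abs(y - model) for y in ys) / len(ys)


xs = list(range(10))
ys = [2.0 * x for x in xs]
avg_score, fold_scores = cross_validate(xs, ys, fit_mean, mean_abs_error, k=5)
print(len(fold_scores), avg_score)  # one score per fold, plus their average
```

Reporting the per-fold scores alongside the average is a deliberate choice: a large spread between folds suggests the estimate is sensitive to which data ends up in the test set.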
✅ Quick Recap:
- Train/Test Split: Quick evaluation on unseen data.
- Cross-Validation: More reliable performance estimate; helps detect overfitting.
