Lesson 6.2: ML Workflow – Training, Testing, Evaluation
🔹 Machine Learning Workflow
A typical ML project follows these steps:

Data Collection
- Gather relevant data from databases, APIs, or files.
- Ensure the data is accurate and complete.
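Here is a minimal sketch of this step using only Python's standard library. It assumes the data arrives as CSV text, with a string standing in for a file, database export, or API response. The `csv` module loads the records, and a count of empty fields per column gives a first completeness check. The column names and values are hypothetical.

```python
import csv
import io

# Hypothetical CSV content standing in for a file, database export, or API response.
RAW_CSV = """sqft,city,price
1400,Austin,300000
1600,,340000
1400,Austin,300000
2000,Dallas,
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, one per record."""
    return list(csv.DictReader(io.StringIO(text)))

def missing_counts(rows):
    """Count empty values per column as a basic completeness check."""
    counts = {col: 0 for col in rows[0]} if rows else {}
    for row in rows:
        for col, value in row.items():
            if value is None or value.strip() == "":
                counts[col] += 1
    return counts

rows = load_rows(RAW_CSV)
print(len(rows))             # 4
print(missing_counts(rows))  # {'sqft': 0, 'city': 1, 'price': 1}
```

The duplicate record and the two gaps are deliberate. They are exactly what the preprocessing step has to deal with.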

Data Preprocessing
- Clean data: handle missing values, remove duplicates, and handle outliers.
- Transform data: encode categorical variables and scale numeric features (e.g., min-max normalization or standardization).
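The following standard-library sketch applies these cleaning and transformation steps to a few hypothetical records. The specific choices are common options, not the only ones: median imputation, capping outliers at a fixed limit, min-max scaling, and one-hot encoding. The column names and the 5000 cap are assumptions made for illustration.

```python
from statistics import median

# Hypothetical raw records; None marks a missing value.
raw = [
    {"sqft": 1400.0, "city": "Austin"},
    {"sqft": None, "city": "Dallas"},
    {"sqft": 1400.0, "city": "Austin"},  # exact duplicate of the first record
    {"sqft": 2000.0, "city": "Dallas"},
    {"sqft": 9000.0, "city": "Austin"},  # implausibly large: treated as an outlier
]

def drop_duplicates(rows):
    """Keep only the first occurrence of each identical record."""
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out

def impute_median(rows, col):
    """Fill missing values in `col` with the median of the observed values."""
    fill = median([r[col] for r in rows if r[col] is not None])
    return [{**r, col: fill if r[col] is None else r[col]} for r in rows]

def clip(rows, col, low, high):
    """Cap values in `col` to the range [low, high]."""
    return [{**r, col: min(max(r[col], low), high)} for r in rows]

def min_max_scale(values):
    """Rescale numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

def one_hot(values):
    """Encode a categorical column as 0/1 indicator lists, one slot per category."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]

rows = drop_duplicates(raw)
rows = impute_median(rows, "sqft")
rows = clip(rows, "sqft", 0.0, 5000.0)  # 5000 is a hypothetical domain limit
scaled = min_max_scale([r["sqft"] for r in rows])
categories, encoded = one_hot([r["city"] for r in rows])

print([r["sqft"] for r in rows])      # [1400.0, 2000.0, 2000.0, 5000.0]
print([round(s, 3) for s in scaled])  # [0.0, 0.167, 0.167, 1.0]
print(categories, encoded)            # ['Austin', 'Dallas'] [[1, 0], [0, 1], [0, 1], [1, 0]]
```

In practice, statistics such as the imputation median and the scaling min/max are usually computed on the training split only and then reused on the test split. This keeps information from the test data out of training.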

Train-Test Split
- Split the data into a training set (used to fit the model) and a test set (held out to estimate performance on unseen data).
- Common split: 70–80% for training, 20–30% for testing.
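A split is typically done by shuffling the records and then holding out a fraction of them for testing. The sketch below does this in plain Python, with an 80/20 split and a fixed seed so the result is reproducible. `split_train_test` is a hypothetical helper. scikit-learn offers a ready-made equivalent, `train_test_split` in `sklearn.model_selection`, with `test_size` and `random_state` parameters.

```python
import random

def split_train_test(rows, test_size=0.2, seed=42):
    """Shuffle a copy of `rows`, then hold out `test_size` of them for testing."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_test = max(1, round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

data = list(range(100))
train, test = split_train_test(data, test_size=0.2)
print(len(train), len(test))        # 80 20
print(set(train).isdisjoint(test))  # True
```

Shuffling first matters when the records are ordered (for example, by date or by class). Otherwise the test set may not resemble the training set.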

Model Selection & Training
- Choose an appropriate algorithm for the problem type (regression, classification, or clustering).
- Train the model on the training set.
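To make "train the model" concrete, consider the simplest regression case: linear regression with one feature, fitted by ordinary least squares. The sketch below implements it in plain Python. The training data is hypothetical and generated from y = 2x + 1. In practice, a library estimator such as scikit-learn's `LinearRegression`, with its `fit` and `predict` methods, would usually be used instead.

```python
from statistics import mean

def fit_linear_regression(xs, ys):
    """Ordinary least squares for one feature; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct x values")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def predict(model, xs):
    """Apply the fitted line to new inputs."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]

# Hypothetical training data generated from y = 2x + 1.
x_train = [1.0, 2.0, 3.0, 4.0]
y_train = [3.0, 5.0, 7.0, 9.0]
model = fit_linear_regression(x_train, y_train)
print(model)                  # (2.0, 1.0)
print(predict(model, [5.0]))  # [11.0]
```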
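
Evaluation
- Evaluate the model on the test set using metrics suited to the task:
  - Regression → Mean Squared Error (MSE), R²
  - Classification → Accuracy, Precision, Recall, F1-score
- If performance is unsatisfactory, adjust the model or its hyperparameters and retrain. Ideally, make these adjustments using a separate validation set or cross-validation, so that the test set remains an unbiased final check.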
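The metrics listed above can be computed by hand, as in this plain-Python sketch. It assumes binary classification with `1` as the positive class, and the labels and predictions are made up. scikit-learn provides equivalents in `sklearn.metrics`: `mean_squared_error`, `r2_score`, `accuracy_score`, `precision_score`, `recall_score`, and `f1_score`.

```python
def mse(y_true, y_pred):
    """Mean Squared Error: the average squared difference."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot (undefined if all true values are equal)."""
    y_bar = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1 - ss_res / ss_tot

def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

y_true_reg = [2.0, 4.0, 6.0, 8.0]
y_pred_reg = [3.0, 4.0, 5.0, 8.0]
print(mse(y_true_reg, y_pred_reg))           # 0.5
print(round(r2(y_true_reg, y_pred_reg), 3))  # 0.9

y_true_clf = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_clf = [1, 1, 0, 1, 1, 0, 1, 0]
scores = classification_metrics(y_true_clf, y_pred_clf)
print({k: round(v, 3) for k, v in scores.items()})
# {'accuracy': 0.625, 'precision': 0.6, 'recall': 0.75, 'f1': 0.667}
```

The classification example shows why accuracy alone can mislead. With 3 true positives, 2 false positives, and 1 false negative, precision and recall differ, and F1 summarizes the balance between them.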

Deployment
- Once satisfied, deploy the model to make predictions on new, unseen data.
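Deployment setups vary widely, from a web service to a batch job to an embedded model, so what follows is only one minimal pattern. It assumes a one-feature linear model stored as a (slope, intercept) pair. After training, the parameters are saved to a JSON file, and the serving code loads them to predict on new inputs. The file name and helper names are hypothetical. Real projects often use a format tied to their library instead, such as pickle or joblib for scikit-learn models.

```python
import json
import tempfile
from pathlib import Path

def save_model(model, path):
    """Persist (slope, intercept) as JSON so another process can load it."""
    slope, intercept = model
    Path(path).write_text(json.dumps({"slope": slope, "intercept": intercept}))

def load_model(path):
    """Read the parameters back into a (slope, intercept) pair."""
    params = json.loads(Path(path).read_text())
    return params["slope"], params["intercept"]

def predict_new(model, xs):
    """Use the loaded parameters to predict on new inputs."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.json"  # hypothetical file name
    save_model((2.0, 1.0), path)     # parameters produced by training
    loaded = load_model(path)
    print(loaded)                           # (2.0, 1.0)
    print(predict_new(loaded, [6.0, 7.5]))  # [13.0, 16.0]
```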

✅ Quick Recap:
- The workflow makes ML development systematic: Collect → Preprocess → Split → Train → Evaluate → Deploy.
- Following it helps build reliable, accurate models.
