Lesson 6.2: ML Workflow – Training, Testing, Evaluation

Data Science and Machine Learning Basics

A typical ML project follows these steps:

Data Collection
- Gather relevant data from databases, APIs, or files.
- Check that the data is accurate, complete, and representative of the problem you want to solve.
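A minimal sketch of loading collected data and checking its completeness, assuming the data arrives as a CSV file and pandas is installed. The file contents, column names, and values are hypothetical; an in-memory buffer stands in for a real file path.

```python
import io

import pandas as pd

# Hypothetical CSV contents; in practice you would pass a real file path
# (for example "customers.csv") to pd.read_csv instead of this buffer.
raw_csv = io.StringIO(
    "age,city,income\n"
    "34,Paris,52000\n"
    "28,Lyon,\n"
    "45,Paris,61000\n"
    "34,Paris,52000\n"
)

df = pd.read_csv(raw_csv)

# Quick completeness checks before preprocessing.
print(df.shape)               # (rows, columns)
print(df.isna().sum())        # missing values per column
print(df.duplicated().sum())  # number of fully duplicated rows
```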
Data Preprocessing
- Clean data: handle missing values, remove duplicates, and deal with outliers.
- Transform data: encode categorical variables and scale numeric features (e.g., normalization to the range [0, 1] or standardization to zero mean and unit variance).
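The cleaning and encoding steps above can be sketched with pandas as follows. The table is hypothetical, and the choices made (median imputation, capping outliers at 1.5 times the interquartile range, one-hot encoding) are common options rather than the only correct ones. For brevity, the median and quartiles are computed on the whole table; in a real project, compute such statistics on the training split only.

```python
import numpy as np
import pandas as pd

# Hypothetical raw data with a missing value, a duplicate row, and an outlier.
df = pd.DataFrame({
    "age":    [34, 28, 45, 34, 29, 31],
    "city":   ["Paris", "Lyon", "Paris", "Paris", "Nice", "Lyon"],
    "income": [52000, np.nan, 61000, 52000, 48000, 950000],
})

# 1. Remove exact duplicate rows (keeps the first occurrence).
df = df.drop_duplicates().reset_index(drop=True)

# 2. Fill missing numeric values with the column median.
df["income"] = df["income"].fillna(df["income"].median())

# 3. Cap outliers using the 1.5 * IQR rule (one heuristic among several).
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
df["income"] = df["income"].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

# 4. One-hot encode the categorical column into city_* indicator columns.
df = pd.get_dummies(df, columns=["city"])

print(df)
```

Feature scaling is deliberately left out here, because the scaler should be fitted after the train-test split.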
Train-Test Split
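- Split the data into a training set (used to fit the model) and a test set (held back to evaluate performance on unseen data).
- Common split: 70–80% training, 20–30% testing.
- Fit preprocessing steps such as scaling on the training set only, then apply them unchanged to the test set; otherwise information from the test set leaks into training and the evaluation becomes too optimistic.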
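A sketch of an 80/20 split with scikit-learn, assuming a NumPy feature matrix `X` and label vector `y` (randomly generated here as hypothetical data). It also shows the leakage-safe pattern for scaling: `StandardScaler` is fitted on the training set and only applied to the test set.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 100 samples, 3 features, binary labels.
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(100, 3))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# 80/20 split; stratify keeps the class ratio similar in both sets,
# and random_state makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Learn mean and standard deviation from the training data only,
# then reuse those same statistics to transform the test data.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train.shape, X_test.shape)  # (80, 3) (20, 3)
```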
Model Selection & Training
- Choose an algorithm suited to the problem type (regression, classification, clustering).
- Train (fit) the model on the training set only.
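As an illustration, the following trains one model for each supervised problem type, assuming scikit-learn: `LinearRegression` for a continuous target and `LogisticRegression` for a binary label. Both datasets are synthetic and hypothetical; clustering, an unsupervised task without labels, is not shown.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(seed=0)

# Hypothetical regression problem: y is roughly 3x + 2 plus noise.
X_reg = rng.uniform(0, 10, size=(200, 1))
y_reg = 3.0 * X_reg[:, 0] + 2.0 + rng.normal(scale=0.5, size=200)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(
    X_reg, y_reg, test_size=0.2, random_state=42
)
reg = LinearRegression().fit(Xr_train, yr_train)

# Hypothetical classification problem: label is 1 when x0 > x1.
X_clf = rng.normal(size=(200, 2))
y_clf = (X_clf[:, 0] - X_clf[:, 1] > 0).astype(int)
Xc_train, Xc_test, yc_train, yc_test = train_test_split(
    X_clf, y_clf, test_size=0.2, random_state=42, stratify=y_clf
)
clf = LogisticRegression().fit(Xc_train, yc_train)

print(reg.coef_, reg.intercept_)  # close to [3.0] and 2.0
print(clf.predict(Xc_test[:5]))   # predicted class labels for 5 test samples
```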
Evaluation
- Evaluate the model on the test set using metrics suited to the task:
  - Regression → Mean Squared Error (MSE), R²
  - Classification → Accuracy, Precision, Recall, F1-score
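- If performance is unsatisfactory, adjust the model or its hyperparameters using a validation set or cross-validation on the training data, not the test set; tuning repeatedly against the test set makes its final score overly optimistic.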
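To make the metrics concrete, this sketch computes them by hand in plain Python on small, hypothetical arrays, so every number can be checked on paper. MSE is the mean of the squared errors, R² is 1 − SS_res / SS_tot, precision is TP / (TP + FP), recall is TP / (TP + FN), and F1 is the harmonic mean of precision and recall.

```python
# Hypothetical true values and predictions, small enough to check by hand.

# --- Regression metrics ---
y_true_reg = [3.0, 5.0, 2.0, 7.0]
y_pred_reg = [2.5, 5.0, 3.0, 8.0]

n = len(y_true_reg)
ss_res = sum((t - p) ** 2 for t, p in zip(y_true_reg, y_pred_reg))  # sum of squared errors
mse = ss_res / n
mean_true = sum(y_true_reg) / n
ss_tot = sum((t - mean_true) ** 2 for t in y_true_reg)  # total variation around the mean
r2 = 1 - ss_res / ss_tot
print(f"MSE = {mse:.4f}, R^2 = {r2:.3f}")  # MSE = 0.5625, R^2 = 0.847

# --- Classification metrics (1 = positive class) ---
y_true_clf = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred_clf = [1, 0, 0, 1, 1, 1, 1, 0]

pairs = list(zip(y_true_clf, y_pred_clf))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
# accuracy=0.625 precision=0.600 recall=0.750 f1=0.667
```

In practice, `sklearn.metrics` provides `mean_squared_error`, `r2_score`, `accuracy_score`, `precision_score`, `recall_score`, and `f1_score`, which should return the same values for these inputs.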
Deployment
- Once performance is satisfactory, deploy the model to make predictions on new, unseen data.
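One common way to hand a trained scikit-learn model to a deployment environment is to save it with `joblib` and load it wherever predictions are served. This is a minimal sketch assuming scikit-learn and joblib are installed; the tiny model and the `model.joblib` file name are hypothetical.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in for the final, already-evaluated model.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 0, 1, 1])
model = LogisticRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "model.joblib")
    joblib.dump(model, path)     # persist the fitted model to disk
    loaded = joblib.load(path)   # e.g. inside a separate prediction service

# Predict on new, unseen inputs with the reloaded model.
new_data = np.array([[0.2], [2.8]])
print(loaded.predict(new_data))
```

Only load model files from trusted sources, since joblib files are pickle-based and can execute code when loaded, and load them with the same scikit-learn version that saved them.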

✅ Quick Recap:

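Following this workflow keeps ML development systematic: Collect → Preprocess → Split → Train → Evaluate → Deploy.
Evaluating on held-out test data gives an honest estimate of how the model will perform on new data, which helps build reliable and accurate models.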
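The whole workflow can be condensed into a short end-to-end sketch, assuming scikit-learn. The built-in Iris dataset stands in for collected data, and a `make_pipeline` of `StandardScaler` and `LogisticRegression` is one reasonable choice of preprocessing and model, not the only one.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Collect: a small built-in dataset stands in for real project data.
X, y = load_iris(return_X_y=True)

# Split: hold out 20% as the final test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Preprocess + model in one pipeline, so the scaler is always fitted
# only on the data the pipeline is trained on.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Check (or tune) with 5-fold cross-validation on the training set only.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
print("CV accuracy:", cv_scores.mean())

# Train on the full training set, then evaluate once on the test set.
model.fit(X_train, y_train)
test_acc = accuracy_score(y_test, model.predict(X_test))
print("Test accuracy:", test_acc)
```

Keeping preprocessing inside the pipeline means `cross_val_score` refits the scaler within each fold, so statistics from a validation fold never leak into the corresponding training run.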