Data Science and Machine Learning Basics

Lesson 3.5: Data Encoding – One Hot Encoding, Label Encoding

Introduction

Most machine learning models expect numerical input. Real-world datasets, however, often contain categorical data (e.g., gender = Male/Female, city = Delhi/Mumbai/Kolkata). To use such data in a model, we need to convert the categorical values into numerical form without losing the information they carry. This process is called data encoding.

Two commonly used methods are:

Label Encoding
One Hot Encoding

1. Label Encoding

Assigns a unique integer value to each category.
Example:

City     Encoded
Delhi    0
Mumbai   1
Kolkata  2

🔹 Pros: Simple and memory efficient (one integer column, however many categories there are).
🔹 Cons: The integers imply an order that may not exist (e.g., a model may treat Mumbai = 1 as greater than Delhi = 0).

Example in Python:
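
A minimal sketch of label encoding, assuming pandas is available. The DataFrame and its column names are hypothetical, made up to mirror the table above. `pd.factorize` is one of several ways to do this; it numbers categories in order of first appearance, which happens to reproduce the codes shown in the table.

```python
import pandas as pd

# Hypothetical toy data reusing the city names from the table above.
df = pd.DataFrame({"City": ["Delhi", "Mumbai", "Kolkata", "Mumbai", "Delhi"]})

# factorize() returns one integer code per row plus the list of categories.
# Codes follow order of first appearance: Delhi -> 0, Mumbai -> 1, Kolkata -> 2.
codes, categories = pd.factorize(df["City"])
df["City_encoded"] = codes

print(df["City_encoded"].tolist())  # [0, 1, 2, 1, 0]
print(list(categories))             # ['Delhi', 'Mumbai', 'Kolkata']
```

scikit-learn's `LabelEncoder` does a similar job but sorts the categories first, so on the same data it would assign Kolkata = 1 and Mumbai = 2. Either way the numbers are arbitrary labels, not a ranking.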


2. One Hot Encoding

Creates a binary column (0/1) for each category.
Example:

City     Delhi  Mumbai  Kolkata
Delhi    1      0       0
Mumbai   0      1       0
Kolkata  0      0       1

🔹 Pros: Implies no false order between categories, so it suits nominal variables.
🔹 Cons: Adds one column per category, so the dataset grows quickly when a feature has many distinct values.

Example in Python:
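
A minimal sketch of one hot encoding, assuming pandas is available. The DataFrame is hypothetical and mirrors the table above. `pd.get_dummies` is one common way to do this, not the only one.

```python
import pandas as pd

# Hypothetical toy data: one row per city from the table above.
df = pd.DataFrame({"City": ["Delhi", "Mumbai", "Kolkata"]})

# get_dummies() creates one column per category; dtype=int yields 0/1
# rather than True/False. Columns come back in sorted order, so they are
# reordered here to match the table above.
one_hot = pd.get_dummies(df["City"], dtype=int)[["Delhi", "Mumbai", "Kolkata"]]

# Place the new binary columns next to the original column for comparison.
encoded = pd.concat([df, one_hot], axis=1)

print(one_hot.values.tolist())  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
```

Each row has exactly one 1, in the column of its own category. With three categories this adds three columns; a feature with hundreds of distinct values would add hundreds.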


When to Use?

Label Encoding → Good for ordinal data (e.g., education level: High School < Graduate < Postgraduate), provided the integers are assigned in that same order.
One Hot Encoding → Better for nominal data (no inherent order, e.g., city names, colors).
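
For ordinal data, the integer codes are only meaningful if they follow the real order of the categories. The sketch below shows one way to enforce that with an ordered pandas `Categorical`, assuming pandas is available. The DataFrame and column names are hypothetical, and the order list is taken from the education example above.

```python
import pandas as pd

# Hypothetical toy data using the education levels from the example above.
df = pd.DataFrame(
    {"Education": ["Graduate", "High School", "Postgraduate", "Graduate"]}
)

# State the order explicitly, lowest to highest. An automatic encoder
# would not know this ranking.
order = ["High School", "Graduate", "Postgraduate"]

# An ordered Categorical gives each value the index of its position in `order`.
education = pd.Categorical(df["Education"], categories=order, ordered=True)
df["Education_encoded"] = education.codes

print(df["Education_encoded"].tolist())  # [1, 0, 2, 1]
```

Here High School = 0, Graduate = 1, and Postgraduate = 2, so comparisons between the codes match comparisons between the education levels.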

✅ Summary:

Data encoding converts categorical values into numerical format.
Label Encoding replaces each category with a unique integer.
One Hot Encoding creates separate binary columns for each category.
Choice depends on whether the data is ordinal or nominal.