Course Content
Module 1: Introduction to Data Science
This module introduced the basics of Data Science, its applications, and career scope. We covered the role of a Data Scientist and the skills and responsibilities it demands. The typical workflow (collection → cleaning → analysis → modeling → deployment) was explained, along with common tools (Python, R, SQL, Jupyter) and the differences between Data Science, AI, ML, and Deep Learning.
Module 2: Python for Data Science
In this module, you learned the fundamentals of Python programming tailored for Data Science. You explored Python basics, control structures, functions, and built-in data structures. You also mastered file handling, exception handling, and essential data science libraries such as NumPy (arrays & computations), Pandas (data manipulation & cleaning), and Matplotlib/Seaborn (data visualization). 👉 After completing this module, you are now ready to analyze, clean, and visualize real-world datasets using Python.
Module 3: Data Handling & Preprocessing
In this module, you learned how to prepare raw data for Machine Learning models:

  • Introduction to NumPy & Pandas → efficient libraries for data manipulation.

  • Importing & Exploring Data → loading datasets, checking structure and missing values.

  • Data Cleaning → handling missing values, duplicates, and inconsistencies.

  • Feature Engineering → creating new features, scaling & normalization.

  • Encoding Categorical Data → one-hot encoding, label encoding.

  • Handling Outliers → detecting and treating unusual data points.

  • Splitting Data → Train/Test Split & Cross-Validation for model evaluation.

✅ By the end of this module, you now understand how to clean, transform, and prepare datasets so that ML models can learn effectively.
Module 5: Statistics & Probability for Data Science
In this module, you will learn the fundamentals of statistics and probability that form the backbone of data science. You’ll explore how to work with population and samples, understand probability distributions like Normal, Binomial, and Poisson, and perform hypothesis testing with p-values. You will also study confidence intervals, advanced tests like ANOVA and Chi-square, and finally learn to distinguish between correlation and causation. By the end of this module, you’ll have the statistical knowledge required to analyze data rigorously and make reliable, data-driven decisions.
Module 6: Introduction to Machine Learning
This module introduces the fundamentals of Machine Learning (ML) – the science of building algorithms that learn from data. You will learn what ML is, its main types, the typical workflow of ML projects, and important concepts like bias, variance, underfitting, overfitting, and validation techniques. By the end, you’ll have a clear foundation for understanding and applying ML models.
Module 7: Supervised Learning Algorithms
This module covers Supervised Learning, where models learn from labeled data to make predictions. You will learn popular regression and classification algorithms, including Linear Regression, Logistic Regression, KNN, Decision Trees, Random Forest, SVM, and Naive Bayes. You’ll also study evaluation metrics for both regression and classification problems to measure model performance accurately. By the end of this module, you’ll be able to apply supervised learning algorithms to real-world datasets and evaluate their performance.
Module 8: Unsupervised Learning Algorithms
This module introduces Unsupervised Learning, where models learn from unlabeled data to find hidden patterns, clusters, or associations. You will explore popular clustering algorithms like K-Means, Hierarchical, and DBSCAN, understand dimensionality reduction using PCA, and learn association rule mining techniques such as Apriori for market basket analysis. By the end of this module, you’ll be able to group similar data, reduce complexity, and discover meaningful relationships in datasets.
Module 9: Feature Engineering & Model Improvement
This module focuses on enhancing model performance through feature engineering and optimization techniques. You will learn how to select important features, handle imbalanced data, apply regularization, tune hyperparameters, and use advanced ensemble learning methods such as Bagging and Boosting (AdaBoost, XGBoost, LightGBM) to improve model accuracy and robustness. By the end of this module, you'll be able to build more accurate and generalizable models for real-world datasets.
Module 10: Neural Networks & Deep Learning (Basics)
This module introduces the fundamentals of Neural Networks and Deep Learning. You will learn about neurons, perceptrons, activation functions, forward and backward propagation, and get hands-on experience with TensorFlow/Keras to build a simple neural network. By the end of this module, you’ll understand how deep learning models process data and make predictions, laying the foundation for advanced neural network architectures.
Module 11: Working with Real-World Data
This module focuses on applying data science and machine learning concepts to real-world datasets. You will explore datasets from Kaggle and UCI, and complete hands-on projects including regression (house prices), classification (Titanic survival), and clustering (customer segmentation). By the end of this module, you’ll gain practical experience in handling, analyzing, and modeling real-world data, preparing you for professional data science tasks.
Module 12: Model Deployment (Basics)
This module introduces the basics of deploying machine learning models so that they can be used in real-world applications. You will learn how to save trained models, and deploy them using Flask or Streamlit for interactive web-based applications. By the end of this module, you’ll understand how to make your ML models accessible and usable beyond local environments.
Module 13: Ethics & Future of Data Science
This module focuses on the ethical, social, and professional aspects of data science and machine learning. You will learn about data privacy, security, bias, fairness, and explainable AI (XAI). The module also provides guidance on career paths, skills, and opportunities in the data science field. By the end of this module, you’ll understand the responsible and ethical use of data and be aware of future trends and career growth.
Data Science & Machine Learning – Final Assessment
Test your knowledge and skills from all modules of this course. This assessment evaluates your understanding of Python, data handling, ML algorithms, model deployment, and ethical AI practices.
Data Science and Machine Learning Basics

Lesson 11.3: Project 2 – Titanic Survival Prediction (Classification)


🔹 Objective

Predict whether a passenger survived the Titanic disaster using features like age, sex, class, and fare.

  • Practice data cleaning, feature engineering, classification modeling, and evaluation.


🔹 Steps to Build the Project

  1. Load Dataset

 
import pandas as pd
data = pd.read_csv('titanic.csv')
data.head()
  2. Understand Dataset

  • Check columns, missing values, data types.

 
data.info()
data.describe()
data.isnull().sum()
  3. Preprocess Data

  • Handle missing values → fill Age (e.g., with the median); drop Cabin if too many values are missing.

  • Encode categorical variables → convert Sex and Embarked using One-Hot Encoding.

  • Feature scaling → usually unnecessary for tree-based models, but helpful for distance-based algorithms such as KNN.
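
The preprocessing bullets above can be sketched with pandas. This is a minimal illustration on a toy DataFrame (the values below are made up, not taken from the real titanic.csv):

```python
import pandas as pd

# Toy sample mimicking a few Titanic columns (illustrative values only)
data = pd.DataFrame({
    'Age': [22.0, None, 26.0, 35.0],
    'Sex': ['male', 'female', 'female', 'male'],
    'Embarked': ['S', 'C', None, 'S'],
    'Cabin': [None, 'C85', None, None],
    'Fare': [7.25, 71.28, 7.92, 53.10],
})

# Fill missing Age with the median; drop Cabin (mostly missing)
data['Age'] = data['Age'].fillna(data['Age'].median())
data = data.drop('Cabin', axis=1)

# Fill missing Embarked with the most frequent value, then one-hot encode
data['Embarked'] = data['Embarked'].fillna(data['Embarked'].mode()[0])
data = pd.get_dummies(data, columns=['Sex', 'Embarked'], drop_first=True)

print(data.columns.tolist())  # e.g. ['Age', 'Fare', 'Sex_male', 'Embarked_S']
```

`drop_first=True` removes one dummy column per category to avoid redundant (perfectly correlated) features.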

  4. Split Dataset

 
from sklearn.model_selection import train_test_split

# Use the cleaned and encoded data from the preprocessing step;
# raw string columns or missing values would make model.fit() fail
X = data.drop('Survived', axis=1)
y = data['Survived']

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

  5. Build Classification Model

  • Example: Logistic Regression

 
from sklearn.linear_model import LogisticRegression

# Raise max_iter so the solver converges on unscaled features
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
  6. Evaluate Model

  • Metrics: Accuracy, Precision, Recall, F1-score, Confusion Matrix

 
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

y_pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))

  7. Optional Improvements

  • Try Random Forest, XGBoost, or SVM for better performance.

  • Feature engineering: create FamilySize, Title from names.
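
One way to sketch these improvements: derive FamilySize from SibSp/Parch and extract Title from the Name column, then swap in a Random Forest. The toy values below are illustrative, not from the real dataset:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Toy frame with the raw columns the new features derive from
df = pd.DataFrame({
    'Name': ['Braund, Mr. Owen', 'Cumings, Mrs. John', 'Heikkinen, Miss. Laina'],
    'SibSp': [1, 1, 0],
    'Parch': [0, 0, 0],
})

# FamilySize = siblings/spouses + parents/children + the passenger
df['FamilySize'] = df['SibSp'] + df['Parch'] + 1

# Title = honorific between the comma and the period in Name
df['Title'] = df['Name'].str.extract(r',\s*([^\.]+)\.', expand=False)

print(df[['FamilySize', 'Title']])

# Swapping in a Random Forest is a one-line change from Logistic Regression
model = RandomForestClassifier(n_estimators=100, random_state=42)
```

The same fit/predict/evaluate code from steps 5–6 works unchanged with the new model, which makes comparing classifiers straightforward.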


🔹 Key Learnings

  • Classification predicts categorical outcomes.

  • Feature engineering improves model accuracy.

  • Multiple evaluation metrics help understand model performance.


Quick Recap:

  • Task → Predict Titanic survival (classification).

  • Steps → Load → Clean → Encode → Split → Train → Evaluate.

  • Improve → Try advanced classifiers, feature engineering.
