Lesson 4.6: Hands-on EDA Project (Titanic Dataset Example)
Objective:
Apply EDA techniques on the Titanic dataset to understand passenger survival patterns.
1. About the Titanic Dataset
- Contains details of passengers like:
  - Age, Gender, Ticket Class, Fare, Cabin, Embarked (Port), Survival (Yes/No).
- Goal → Find what factors influenced survival chances.
2. Step-by-Step EDA Process
(a) Load Dataset
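A minimal sketch of the loading step, assuming the Kaggle-style CSV layout (columns such as Survived, Pclass, Sex, Age, Fare, Cabin, Embarked). The helper name load_titanic and the file name train.csv are placeholders, and the inline rows are made up so the snippet runs without a download:

```python
import io

import pandas as pd

# Made-up rows in the assumed Kaggle column layout (NOT real passenger records).
SAMPLE_CSV = """PassengerId,Survived,Pclass,Sex,Age,Fare,Cabin,Embarked
1,0,3,male,30,8.0,,S
2,1,1,female,41,80.0,B20,C
3,1,2,female,8,25.0,,S
4,0,3,male,,7.5,,Q
"""


def load_titanic(source):
    """Read a Titanic-style CSV from a file path or a file-like object."""
    return pd.read_csv(source)


# For the real data, pass a path instead, e.g. load_titanic("train.csv").
df = load_titanic(io.StringIO(SAMPLE_CSV))
print(df.shape)   # (rows, columns)
print(df.head())  # first few rows as a sanity check
```

Empty CSV fields (the blank Age and Cabin cells here) are read as NaN, which is what the missing-value check in the next step counts.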
(b) Check Data Info & Missing Values
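One possible way to check column types and missing values with pandas. The small DataFrame below is a made-up stand-in for the loaded dataset, and the column names assume the Kaggle layout:

```python
import pandas as pd

# Made-up stand-in for the loaded dataset (NOT real passenger records).
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1],
    "Age": [30.0, 41.0, None, 19.0, None],
    "Cabin": [None, "B20", None, None, "C12"],
    "Embarked": ["S", "C", "S", None, "Q"],
})

# Column names, dtypes, and non-null counts in one overview.
df.info()

# Count and percentage of missing values per column, worst first.
summary = pd.DataFrame({
    "missing": df.isna().sum(),
    "percent": (df.isna().mean() * 100).round(1),
}).sort_values("missing", ascending=False)
print(summary)
```

Columns with a high share of missing values (Cabin is a common example in this dataset) are usually candidates for dropping or for special handling rather than simple imputation.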
(c) Descriptive Statistics
- Mean age, median fare, survival percentage, etc.
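The statistics listed above could be computed roughly as follows. This is a sketch on made-up rows, with column names assumed from the Kaggle layout; Survived is assumed to be coded 0/1, so its mean is the survival rate:

```python
import pandas as pd

# Made-up stand-in for the loaded dataset (NOT real passenger records).
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1],
    "Age": [30.0, 41.0, None, 19.0, 8.0],
    "Fare": [8.0, 80.0, 25.0, 7.5, 12.0],
})

# Summary table: count, mean, std, min, quartiles, max per numeric column.
print(df.describe())

mean_age = df["Age"].mean()               # NaN values are skipped by default
median_fare = df["Fare"].median()
survival_pct = df["Survived"].mean() * 100

print(f"Mean age: {mean_age:.1f}")
print(f"Median fare: {median_fare:.2f}")
print(f"Survival percentage: {survival_pct:.1f}%")
```

The median is used for fare because fares tend to be right-skewed, so a few very expensive tickets would pull the mean upward.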
(d) Visualizations
- Survival Count
- Survival by Gender
- Survival by Passenger Class (Pclass)
- Age Distribution
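The four plots above can be sketched with pandas and matplotlib as shown here. The rows are made up for illustration, the column names assume the Kaggle layout, and the choice of bar charts and a histogram is one reasonable option rather than the only one:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend for scripts; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the loaded dataset (NOT real passenger records).
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0, 0, 1],
    "Pclass": [3, 1, 2, 3, 1, 3, 2, 3],
    "Sex": ["male", "female", "female", "male",
            "female", "male", "male", "female"],
    "Age": [30.0, 41.0, 8.0, None, 27.0, 19.0, 62.0, 5.0],
})

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Survival count: number of passengers with Survived = 0 and = 1.
df["Survived"].value_counts().sort_index().plot.bar(
    ax=axes[0, 0], title="Survival count")

# Survival by gender: mean of a 0/1 column is the survival rate.
df.groupby("Sex")["Survived"].mean().plot.bar(
    ax=axes[0, 1], title="Survival rate by gender")

# Survival by passenger class.
df.groupby("Pclass")["Survived"].mean().plot.bar(
    ax=axes[1, 0], title="Survival rate by Pclass")

# Age distribution, ignoring missing ages.
df["Age"].dropna().plot.hist(
    bins=8, ax=axes[1, 1], title="Age distribution")

fig.tight_layout()
```

In a script, the figure can then be written out with fig.savefig and a file name of your choice; in a notebook it renders on its own.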
(e) Correlation Heatmap
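A possible heatmap sketch using plain matplotlib (seaborn's heatmap function is a common alternative). It assumes Pearson correlation over the numeric columns only; the rows are made up and the column names follow the Kaggle layout:

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend for scripts; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the loaded dataset (NOT real passenger records).
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0, 0, 1],
    "Pclass": [3, 1, 2, 3, 1, 3, 2, 3],
    "Sex": ["male", "female", "female", "male",
            "female", "male", "male", "female"],
    "Age": [30.0, 41.0, 8.0, None, 27.0, 19.0, 62.0, 5.0],
    "Fare": [8.0, 80.0, 25.0, 7.5, 60.0, 9.0, 13.0, 12.0],
})

# Correlation is only defined for numeric columns, so text columns are dropped.
corr = df.select_dtypes(include="number").corr()

fig, ax = plt.subplots(figsize=(6, 5))
im = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)

# Label both axes with the column names.
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns, rotation=45, ha="right")
ax.set_yticks(range(len(corr.index)))
ax.set_yticklabels(corr.index)

# Write each coefficient inside its cell.
for i in range(len(corr.index)):
    for j in range(len(corr.columns)):
        ax.text(j, i, f"{corr.iloc[i, j]:.2f}", ha="center", va="center")

fig.colorbar(im, ax=ax, label="Pearson correlation")
fig.tight_layout()
```

Sex is stored as text, so it does not appear in the heatmap unless it is encoded first, for example by mapping male to 0 and female to 1.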
3. Key Insights (Example Findings)
- Females had a much higher survival rate than males.
- Passengers in 1st class survived at a higher rate than those in 3rd class.
- Younger passengers (children) had better survival chances.
- Higher ticket fares were linked with higher survival probability.
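Findings like these can be checked with grouped survival rates. The sketch below runs on made-up rows, so its numbers do not reproduce the real dataset; the age bands and the quartile split of fares are illustrative assumptions, as are the AgeBand and FareBand column names:

```python
import pandas as pd

# Made-up stand-in for the loaded dataset (NOT real passenger records).
df = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1, 0, 0, 1],
    "Pclass": [3, 1, 2, 3, 1, 3, 2, 3],
    "Sex": ["male", "female", "female", "male",
            "female", "male", "male", "female"],
    "Age": [30.0, 41.0, 8.0, None, 27.0, 19.0, 62.0, 5.0],
    "Fare": [8.0, 80.0, 25.0, 7.5, 60.0, 9.0, 13.0, 12.0],
})

# Survived is 0/1, so the group mean is the survival rate.
rate_by_sex = df.groupby("Sex")["Survived"].mean()
rate_by_class = df.groupby("Pclass")["Survived"].mean()

# Bucket ages into bands (band edges are an illustrative choice).
df["AgeBand"] = pd.cut(df["Age"], bins=[0, 12, 18, 60, 120],
                       labels=["child", "teen", "adult", "senior"])
rate_by_age = df.groupby("AgeBand", observed=True)["Survived"].mean()

# Split fares into four equal-sized groups (quartiles).
df["FareBand"] = pd.qcut(df["Fare"], 4, labels=["Q1", "Q2", "Q3", "Q4"])
rate_by_fare = df.groupby("FareBand", observed=True)["Survived"].mean()

for name, rates in [("Sex", rate_by_sex), ("Pclass", rate_by_class),
                    ("AgeBand", rate_by_age), ("FareBand", rate_by_fare)]:
    print(f"\nSurvival rate by {name}:")
    print(rates.round(2))
```

Passengers with a missing age fall outside every band and are left out of the age comparison, which is worth keeping in mind when reading that table.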
4. Conclusion
Through EDA, we found that gender, passenger class, and fare were strongly associated with survival, and that age also played a role.
These findings help guide feature selection for machine learning models.
