Lesson 3.1: Understanding Data – Structured vs Unstructured
Data is the foundation of Data Science. Before applying any machine learning or analysis techniques, it is important to understand what type of data you are working with. Broadly, data falls into two main categories, structured and unstructured, with a third category, semi-structured, sitting in between:
1. Structured Data
- Data that is organized in rows and columns (like in Excel or SQL databases).
- Easy to store, search, and analyze.
- Examples:
  - Sales records (Product ID, Price, Quantity, Date).
  - Student database (Name, Roll Number, Marks).
- Mostly numerical or categorical values.
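As a rough illustration of why tabular data is easy to analyze, the sketch below parses a few sales records and totals revenue per product. The column names follow the sales example above; the sample values and the helper names `load_sales` and `revenue_by_product` are made up for this illustration.

```python
import csv
import io

# Hypothetical sales records: columns mirror the lesson's example,
# the values are invented for illustration.
SALES_CSV = """product_id,price,quantity,date
P001,10.0,3,2024-01-05
P002,4.5,10,2024-01-05
P001,10.0,1,2024-01-06
"""


def load_sales(text):
    """Parse CSV text into a list of dicts with typed columns."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "product_id": row["product_id"],
            "price": float(row["price"]),
            "quantity": int(row["quantity"]),
            "date": row["date"],
        })
    return rows


def revenue_by_product(rows):
    """Sum price * quantity for each product ID."""
    totals = {}
    for row in rows:
        amount = row["price"] * row["quantity"]
        totals[row["product_id"]] = totals.get(row["product_id"], 0.0) + amount
    return totals


sales = load_sales(SALES_CSV)
print(revenue_by_product(sales))
```

Because every row has the same named columns, the analysis is a simple loop over fields; no interpretation of the content is needed first.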
2. Unstructured Data
- Data that does not have a predefined format.
- Difficult to store and analyze directly.
- Examples:
  - Text documents (emails, articles, reviews).
  - Images, audio, and video files.
  - Social media posts, chat messages.
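Unstructured data usually has to be given some structure before it can be analyzed. The sketch below shows one simple way to do that for text: splitting a review into words and counting them. The review text and the helper names `tokenize` and `word_counts` are assumptions for this illustration, and the regular expression is a deliberately crude tokenizer, not a recommended one.

```python
import re
from collections import Counter

# Hypothetical product review, invented for illustration.
REVIEW = "Great phone. The battery is great, but the camera is only okay."


def tokenize(text):
    """Lowercase the text and extract runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def word_counts(text):
    """Turn free text into a word -> frequency table."""
    return Counter(tokenize(text))


counts = word_counts(REVIEW)
print(counts["great"], counts["camera"])
```

The resulting word-frequency table is structured (one word, one count), which is what makes further analysis possible.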
3. Semi-Structured Data (in between)
- Data that is not in tabular form but still has some structure.
- Examples:
  - JSON, XML files.
  - Log files.
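To show what "some structure" means in practice, the sketch below flattens a couple of JSON records into table-like rows. The records have named fields, but not every record has the same fields, so missing values become `None`. The sample events and the helper names `flatten` and `to_rows` are hypothetical, chosen only for this illustration.

```python
import json

# Hypothetical event records, invented for illustration.
# Note that the two records do not share the same set of fields.
EVENTS_JSON = """
[
  {"user": "asha", "action": "login", "device": {"os": "android"}},
  {"user": "ravi", "action": "purchase", "amount": 250}
]
"""


def flatten(record, prefix=""):
    """Flatten nested dicts into a single level with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=name + "."))
        else:
            flat[name] = value
    return flat


def to_rows(records):
    """Build a column list and rows, filling absent fields with None."""
    flat_records = [flatten(r) for r in records]
    columns = sorted({key for r in flat_records for key in r})
    rows = [[r.get(c) for c in columns] for r in flat_records]
    return columns, rows


columns, rows = to_rows(json.loads(EVENTS_JSON))
print(columns)
for row in rows:
    print(row)
```

The keys give enough structure to build columns automatically, but the gaps in the rows show why this data is not fully structured.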
✅ Summary:
- Structured data → Easy to analyze, tabular form.
- Unstructured data → Text, images, videos, free format.
- Semi-structured data → JSON, XML, logs.
👉 Understanding these data types is the first step in preprocessing, because each type calls for different cleaning and preparation techniques.
