Lesson 2.6 – Data Cleaning Tools in Excel
Lesson Overview
Data cleaning is essential before analyzing data to ensure accuracy and consistency. Excel provides several tools to clean, organize, and prepare data, making it ready for analysis and reporting.
1. Remove Duplicates
-
Purpose: Eliminate repeated entries in your dataset to maintain unique records.
-
How to Use:
-
Select the range or table.
-
Go to Data → Remove Duplicates.
-
Select the columns to check for duplicates.
-
Click OK.
-
Example: Removing duplicate email addresses from a contact list.
2. Text to Columns
-
Purpose: Split data in one column into multiple columns based on a delimiter.
-
How to Use:
-
Select the column to split.
-
Go to Data → Text to Columns.
-
Choose Delimited (comma, space, semicolon) or Fixed width.
-
Follow the wizard to finish.
-
Example: Splitting “John Doe” into First Name and Last Name.
3. Find & Replace
-
Purpose: Quickly search for specific data and replace it with new values.
-
How to Use:
-
Press Ctrl + F for Find or Ctrl + H for Replace.
-
Enter the value to find and the replacement value.
-
Click Replace All (or Replace one by one).
-
Example: Replace all occurrences of “N/A” with “0”.
4. Flash Fill
-
Purpose: Automatically fill a column based on a pattern recognized from your input.
-
How to Use:
-
Start typing the desired output next to your data.
-
Excel will suggest remaining entries.
-
Press Enter or go to Data → Flash Fill.
-
Example: Extract first names from a full name column or format phone numbers consistently.
5. Tips for Data Cleaning
-
Always make a copy of the raw data before cleaning.
-
Combine multiple tools for efficient cleaning.
-
Use Trim function to remove extra spaces:
=TRIM(A1) -
Check for hidden characters using
LENfunction.
6. Real-Life Examples
-
Removing duplicate customer IDs before creating a report.
-
Splitting full addresses into Street, City, State.
-
Replacing inconsistent spelling of product names.
-
Formatting phone numbers in a standard format.
Practice Questions
-
How do you remove duplicate entries from a dataset?
-
When should you use Text to Columns?
-
What is the difference between Find & Replace and Flash Fill?
-
How can the TRIM function help in cleaning data?
-
Give an example where Flash Fill can save time in a dataset.
