US Household Income Data Cleaning
- Raymond Kadzashie
- Sep 14, 2024
Updated: Sep 15, 2024
This document provides a detailed explanation of the SQL operations performed on the us_project.us_household_income and us_project.us_household_income_statistics tables.
Link to Code:
BACKGROUND: Received raw household data from a client and needed to transfer and clean the data to be used in a Web Application.
PROCESS: Used MySQL to ingest the data, identified data inconsistencies, and normalized the data using the steps shown below.
First, let's take a look at the data:
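A quick preview of both tables can be run along these lines. This is a minimal sketch that uses the table names given in the introduction; the `LIMIT 10` row cap is an arbitrary choice for illustration.

```sql
-- Preview a handful of rows from each table before cleaning.
-- LIMIT 10 is an arbitrary illustrative cap, not part of the original process.
SELECT *
FROM us_project.us_household_income
LIMIT 10;

SELECT *
FROM us_project.us_household_income_statistics
LIMIT 10;
```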
We need to check for duplicates first. Let's do this by running a count on the id which should be unique:
SELECT id, COUNT(id)
FROM us_project.us_household_income
GROUP BY id
HAVING COUNT(id) > 1;
With this code we can see we have multiple duplicates:
We need to remove these duplicates and we do that with this code:
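One common way to do this in MySQL 8.0+ is to number the rows within each `id` group and delete every row after the first. The sketch below assumes the table has a `row_id` column that uniquely identifies each physical row; that column name is an assumption for illustration and may differ in the actual table.

```sql
-- Sketch only: assumes a unique row_id column exists (hypothetical name).
-- ROW_NUMBER() requires MySQL 8.0 or later.
DELETE FROM us_project.us_household_income
WHERE row_id IN (
    SELECT row_id
    FROM (
        -- Number the rows that share the same id; the first keeps row_num = 1.
        SELECT row_id,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) AS row_num
        FROM us_project.us_household_income
    ) AS duplicates
    WHERE row_num > 1  -- every extra copy of an id
);
```

Wrapping the window query in a derived table lets MySQL materialize it first, which avoids the "can't specify target table for update in FROM clause" error. Re-running the duplicate check afterwards should return no rows.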
Now we have removed all the duplicates in our data.
Next, we updated records. Update state names where there are discrepancies in letter case:
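A fix of this kind might look like the following. This is a hedged sketch: the column name `State_Name` and the value `'alabama'` are assumptions for illustration, not taken from the data shown here.

```sql
-- Sketch only: State_Name and the 'alabama' value are hypothetical.
-- List the distinct spellings first to spot case discrepancies.
SELECT DISTINCT BINARY State_Name AS state_name_exact
FROM us_project.us_household_income;

-- Normalize one inconsistent spelling to the properly cased name.
UPDATE us_project.us_household_income
SET State_Name = 'Alabama'
WHERE State_Name = 'alabama';
```

Under MySQL's default case-insensitive collations, the `WHERE` clause also matches rows that are already spelled `'Alabama'`; setting them to the same value is harmless.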