US Household Income Data Cleaning

Raymond Kadzashie
Sep 14, 2024

Updated: Sep 15, 2024

This document provides a detailed explanation of the SQL operations performed on the us_project.us_household_income and us_project.us_household_income_statistics tables.

Link to Code:

https://github.com/rkwasi123/us-housing/blob/main/US%20Household%20Income%20Data%20Cleaning.sql

BACKGROUND: Received raw household data from a client and needed to transfer and clean the data to be used in a Web Application.

PROCESS: Used MySQL to ingest the data, identified data inconsistencies, and normalized the data using processes shown below.

First, let's take a look at the data.
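A quick preview along these lines shows the shape of both tables. This is a minimal sketch, assuming the tables are named as in the introduction; the LIMIT of 10 is an arbitrary choice.

```sql
-- Preview a handful of rows from each table.
-- Table names are taken from the introduction; LIMIT 10 is an arbitrary sample size.
SELECT *
FROM us_project.us_household_income
LIMIT 10;

SELECT *
FROM us_project.us_household_income_statistics
LIMIT 10;
```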

We start by checking for duplicates. The id column should be unique, so we count how often each id appears and keep only the ids that occur more than once:

-- Count rows per id; any id returned here occurs more than once.
SELECT id, COUNT(id) AS id_count
FROM us_project.us_household_income
GROUP BY id
HAVING COUNT(id) > 1;

This query returns several ids that appear more than once, so the table does contain duplicate rows.

Next, we remove these duplicate rows so that each id appears only once.
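One common way to do this in MySQL 8.0 or later is to number the rows within each id and delete every row after the first. The sketch below is an illustration under assumptions rather than the exact statement from the linked file: it assumes the table has a unique per-row column, here called row_id (a hypothetical name), because id itself is duplicated and cannot identify a single row.

```sql
-- Assumption: row_id is a unique per-row column (hypothetical name).
-- ROW_NUMBER() restarts at 1 for each id, so row_num > 1 marks the extra copies.
-- The inner derived table is materialized, which lets MySQL delete from the
-- same table it is reading.
DELETE FROM us_project.us_household_income
WHERE row_id IN (
    SELECT row_id
    FROM (
        SELECT row_id,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY row_id) AS row_num
        FROM us_project.us_household_income
    ) AS duplicates
    WHERE row_num > 1
);
```

Ordering by row_id keeps the copy with the lowest row_id for each id. Re-running the duplicate check afterward should return no rows.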

Now we have removed all the duplicates in our data.

Next, we update the records to standardize state names wherever the same state appears with inconsistent capitalization.
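A targeted UPDATE is one way to fix such values. The sketch below assumes the column is called State_Name (a hypothetical name) and uses 'alabama' versus 'Alabama' purely as an illustrative pair; substitute whichever variants the data actually contains.

```sql
-- Assumptions: the column name State_Name and the value pair below are
-- illustrative only. Under a case-insensitive collation (MySQL's default),
-- the WHERE clause matches every capitalization of the name, and all matching
-- rows are rewritten to the single canonical spelling.
UPDATE us_project.us_household_income
SET State_Name = 'Alabama'
WHERE State_Name = 'alabama';
```

Setting the value explicitly, rather than applying a generic capitalization function, avoids mangling multi-word names such as "New York".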
