
US Household Income Data Cleaning

  • Writer: Raymond Kadzashie
  • Sep 14, 2024
  • 1 min read

Updated: Sep 15, 2024

This document provides a detailed explanation of the SQL operations performed on the us_project.us_household_income and us_project.us_household_income_statistics tables.


Link to Code:


BACKGROUND: I received raw household income data from a client and needed to load and clean it for use in a web application.


PROCESS: I used MySQL to ingest the data, identified inconsistencies, and normalized the data using the steps shown below.


First, let's take a look at the data:

[Image: preview of the raw data]

First, we need to check for duplicates. We can do this by counting the rows for each id, which should be unique:

-- List every id that appears more than once
SELECT id, COUNT(id)
FROM us_project.us_household_income
GROUP BY id
HAVING COUNT(id) > 1;

This query shows that several ids appear more than once:

[Image: query result listing the duplicated ids]

We need to remove these duplicates and we do that with this code:

[Image: screenshot of the duplicate-removal query]

Now we have removed all the duplicates in our data.
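
The removal query appears only as an image in the original post, so the following is a minimal sketch of one common MySQL approach, not necessarily the exact code that was run. It assumes MySQL 8.0 or later (for window functions) and a hypothetical column named row_id that is unique for every row; neither detail is confirmed by the text above.

```sql
-- Sketch only: row_id is an assumed unique-per-row column, not confirmed by the post.
-- Number the rows within each id, then delete every row after the first.
DELETE FROM us_project.us_household_income
WHERE row_id IN (
    SELECT row_id
    FROM (
        SELECT row_id,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY row_id) AS row_num
        FROM us_project.us_household_income
    ) AS numbered
    WHERE row_num > 1
);
```

The inner derived table is what lets MySQL delete from the same table it is reading. Rerunning the duplicate check afterwards should return no rows.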


Next, we update records to fix state names where the letter case is inconsistent:

[Image: screenshot of the state-name update query]
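
The update query appears only as an image in the original post. A minimal sketch of the idea follows, assuming a hypothetical column named State_Name and using 'alabama' purely as an illustrative value; the actual column name and affected states are not given in the text.

```sql
-- Sketch only: State_Name and the value 'alabama' are assumptions for illustration.
-- List each distinct spelling exactly as stored. Grouping on the binary value
-- keeps case variants apart even under a case-insensitive collation.
SELECT State_Name, COUNT(*) AS row_count
FROM us_project.us_household_income
GROUP BY State_Name, CAST(State_Name AS BINARY)
ORDER BY State_Name;

-- Rewrite the variants to one canonical spelling. Under MySQL's default
-- case-insensitive collations this WHERE clause matches every case variant.
UPDATE us_project.us_household_income
SET State_Name = 'Alabama'
WHERE State_Name = 'alabama';
```

If the column uses a case-sensitive or binary collation instead, the WHERE clause would match only the exact lowercase spelling, so each variant would need its own condition.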









 
 
 


