Seattle_data_analysis

This repo covers exploratory data analysis (EDA) and data visualization.

View the Project on GitHub iamswati/Seattle_data_analysis

PREDICT THE POWER CONSUMPTION OF BUILDINGS

Evaluated skills: Data pre-processing, descriptive statistics, Python

Skills: Regression methods, Prediction methods.

TASK

You work for the city of Seattle. To reach its goal of becoming a carbon-neutral city by 2050, your team is taking a close interest in emissions from non-residential buildings. To that end, your agents took careful readings in 2015 and 2016. However, these surveys are expensive to carry out, so you want to use the ones already done to predict the emissions of buildings whose emissions have not yet been measured. Two measures interest you: CO2 emissions and total energy consumption. You also want to evaluate how much the ENERGYSTAR Score (which is complicated to calculate) adds to emission prediction with the approach your team currently uses.

# The Python 3 environment comes with many helpful analytics libraries installed

import numpy as np        # linear algebra
import os                 # accessing directory structure
import pandas as pd       # data processing, CSV file I/O (e.g. pd.read_csv)

# pandas provides a plot() method for drawing charts from a DataFrame.
# It relies on pyplot, a submodule of the Matplotlib library, to render the figures.
# Matplotlib is a low-level plotting library for Python that serves as a visualization utility.
import matplotlib.pyplot as plt

# Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
import seaborn as sns
%matplotlib inline

Reading the 2015-building-energy-benchmarking.csv file
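The cell that loads the file is not shown on this page. A minimal sketch of that step, assuming the CSV sits in the working directory and is loaded into `df_2015` (the name used in the cells that follow). The `load_benchmark` helper and the tiny in-memory sample are illustrative only and are not part of the original notebook:

```python
import io

import pandas as pd


def load_benchmark(source):
    """Read a building-energy benchmarking CSV into a DataFrame.

    `source` may be a file path or a file-like object (hypothetical helper).
    """
    return pd.read_csv(source)


# With the real file (the path is an assumption):
# df_2015 = load_benchmark("2015-building-energy-benchmarking.csv")

# Tiny in-memory stand-in so the sketch runs without the real file.
# Column names come from the dataset; the values are made up.
sample_csv = io.StringIO(
    "OSEBuildingID,PropertyName,DataYear\n"
    "1,Example Hotel,2015\n"
    "2,Example Office,2015\n"
)
df_sample = load_benchmark(sample_csv)
print(df_sample.shape)
```

Calling `df_sample.info()` right after loading is a quick way to check column types and missing values before dropping anything.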

Drop unnecessary columns

df_2015.drop(
    [
        'PropertyName', 'TaxParcelIdentificationNumber', 'CouncilDistrictCode',
        'Neighborhood', 'DataYear', 'ListOfAllPropertyUseTypes',
        'LargestPropertyUseType', 'LargestPropertyUseTypeGFA',
        'SecondLargestPropertyUseType', 'SecondLargestPropertyUseTypeGFA',
        'ThirdLargestPropertyUseType', 'ThirdLargestPropertyUseTypeGFA',
        'YearsENERGYSTARCertified', 'Comment', 'Outlier', '2010 Census Tracts',
        'City Council Districts', 'DefaultData', 'ComplianceStatus',
        'Seattle Police Department Micro Community Policing Plan Areas',
        'SPD Beats', 'Zip Codes',
    ],
    axis=1,
    inplace=True,
)

# The head() method returns the headers and a specified number of rows, starting from the top.
df_2015.head()

Correlation

HeatMap 2015
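The heatmap itself appears only as an image on this page. One way such a figure could be produced with the libraries imported above is sketched here. The `plot_correlation_heatmap` helper, its styling options, and the synthetic data are assumptions for illustration, not the original notebook code:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


def plot_correlation_heatmap(df, title):
    """Draw a heatmap of pairwise Pearson correlations between numeric columns."""
    # Non-numeric columns are excluded before computing the correlation matrix.
    corr = df.select_dtypes(include="number").corr()
    fig, ax = plt.subplots(figsize=(8, 6))
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm",
                vmin=-1, vmax=1, ax=ax)
    ax.set_title(title)
    return corr, ax


# Synthetic stand-in data; in the notebook this would be df_2015.
rng = np.random.default_rng(0)
area = rng.uniform(1_000, 50_000, size=50)
demo = pd.DataFrame({
    "floor_area": area,
    "energy_use": area * 2.0 + rng.normal(0, 500, size=50),
    "label": ["a"] * 50,  # non-numeric column, left out of the correlation
})
corr, ax = plot_correlation_heatmap(demo, "HeatMap (synthetic demo)")
```

Fixing `vmin=-1` and `vmax=1` keeps the colour scale comparable between the 2015 and 2016 heatmaps.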

Histogram
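The histogram cells are likewise shown only as images. A minimal sketch of how per-column histograms could be drawn with pandas and Matplotlib follows. The `plot_numeric_histograms` helper, the bin count, and the synthetic columns are assumptions, not the original code:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt


def plot_numeric_histograms(df, bins=30):
    """Plot one histogram per numeric column and return the array of Axes."""
    numeric = df.select_dtypes(include="number")
    axes = numeric.hist(bins=bins, figsize=(10, 6))
    plt.tight_layout()
    return axes


# Synthetic stand-in data; in the notebook this would be df_2015.
rng = np.random.default_rng(1)
demo = pd.DataFrame({
    "floor_area": rng.lognormal(mean=10, sigma=1, size=200),
    "year_built": rng.integers(1900, 2016, size=200),
})
axes = plot_numeric_histograms(demo)
```

Strongly right-skewed columns, as floor area tends to be, are often easier to read after a log transform such as `np.log1p`.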


Reading the 2016-building-energy-benchmarking.csv file

Drop unnecessary columns

df_2016.drop(
    [
        'OSEBuildingID', 'DataYear', 'BuildingType', 'PrimaryPropertyType',
        'PropertyName', 'Address', 'City', 'State', 'ZipCode',
        'TaxParcelIdentificationNumber', 'ListOfAllPropertyUseTypes',
        'LargestPropertyUseType', 'LargestPropertyUseTypeGFA', 'ENERGYSTARScore',
        'Neighborhood', 'CouncilDistrictCode',
        'SecondLargestPropertyUseType', 'SecondLargestPropertyUseTypeGFA',
        'ThirdLargestPropertyUseType', 'ThirdLargestPropertyUseTypeGFA',
        'YearsENERGYSTARCertified', 'Comments', 'Outlier', 'DefaultData',
        'ComplianceStatus',
    ],
    axis=1,
    inplace=True,
)

# The head() method returns the headers and a specified number of rows, starting from the top.
df_2016.head()
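The two files do not share exactly the same column names (for example `Comment` in 2015 and `Comments` in 2016), and `drop()` raises a `KeyError` when a listed column is missing. A hedged sketch of one way to reuse a single list across both years, using the `errors="ignore"` option of `DataFrame.drop`. The `drop_columns_if_present` helper and the demo frames are hypothetical:

```python
import pandas as pd


def drop_columns_if_present(df, columns):
    """Return a copy of df without the listed columns, skipping absent names."""
    return df.drop(columns=columns, errors="ignore")


# One shared list covering both spellings of the comment column.
unwanted = ["PropertyName", "Comment", "Comments", "Outlier"]

# Made-up one-row frames standing in for the 2015 and 2016 data.
demo_2015 = pd.DataFrame({"PropertyName": ["A"], "Comment": ["x"], "DataYear": [2015]})
demo_2016 = pd.DataFrame({"PropertyName": ["B"], "Comments": ["y"], "DataYear": [2016]})

print(list(drop_columns_if_present(demo_2015, unwanted).columns))  # ['DataYear']
print(list(drop_columns_if_present(demo_2016, unwanted).columns))  # ['DataYear']
```

Unlike `inplace=True`, this returns a new DataFrame, so the original stays available if a dropped column turns out to be needed later.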

Correlation

HeatMap 2016

Histogram
