Load / Parse Data:

In order to interactively visualize the Houston 2015 311 data in Tableau, the raw data is imported into R for processing.

setwd("./data-jam-april-2016")
df <-read.delim("311-Public-Data-Extract-2015-tab-mhhi.txt", sep="\t", as.is=T)
df_wm <-read.delim("311-Public-Data-Extract-2015-swm-tab-mhhi.txt", sep="\t", as.is=T)

df<-rbind(df, df_wm)
rm(df_wm)

Response Efficiency

We also investigated “response efficiency” to 311 requests, defined as the time period between an open (SR.CREATE.DATE) and closed (DATE.CLOSED) request. This data was then written into a tab-delimited text file that could be imported into Tableau for mapping.

str(df)
## 'data.frame':    344107 obs. of  23 variables:
##  $ CASE.NUMBER        : chr  "101001613212" "11839105-101001613213" "11839106-101001613214" "11839107-101001613215" ...
##  $ SR.LOCATION        : chr  "15815 BITRIDGE, HOUSTON TX 77053" "7523 JASON, HOUSTON TX 77074" "7523 JASON, HOUSTON TX 77074" "2700 DALTON, HOUSTON TX 77017" ...
##  $ COUNTY             : chr  "FORT BEND" "HARRIS" "HARRIS" "HARRIS" ...
##  $ DISTRICT           : chr  "K" "J" "J" "I" ...
##  $ NEIGHBORHOOD       : chr  "FORT BEND / HOUSTON" "SHARPSTOWN" "SHARPSTOWN" "PARK PLACE" ...
##  $ TAX.ID             : num  6.25e+15 9.24e+11 9.24e+11 5.52e+11 NA ...
##  $ MANAGEMENT.DISTRICT: chr  "" "Sharpstown MD" "Sharpstown MD" "" ...
##  $ DEPARTMENT         : chr  "PWE Public Works Engineering" "PWE Public Works Engineering" "PWE Public Works Engineering" "PWE Public Works Engineering" ...
##  $ DIVISION           : chr  "PU Public Utilities" "PU Public Utilities" "PU Public Utilities" "PU Public Utilities" ...
##  $ SR.TYPE            : chr  "Water Leak" "Water Leak" "Water Service" "Fire Hydrant" ...
##  $ QUEUE              : chr  "PU_Water" "PU_Water" "PU_Water" "PU_FireHydrant" ...
##  $ SLA                : int  22 22 22 22 60 22 22 3 15 22 ...
##  $ STATUS             : chr  "Closed" "Closed" "Closed" "Closed" ...
##  $ SR.CREATE.DATE     : chr  "2015-01-01T01:03:21Z" "2015-01-01T01:41:49Z" "2015-01-01T02:13:17Z" "2015-01-01T02:26:34Z" ...
##  $ DUE.DATE           : chr  "2015-01-23T01:03:21Z" "2015-01-23T01:41:49Z" "2015-01-23T02:13:17Z" "2015-01-23T02:26:34Z" ...
##  $ DATE.CLOSED        : chr  "2015-01-01T10:07:08Z" "2015-01-06T16:45:50Z" "2015-01-01T05:53:49Z" "2015-01-02T15:17:52Z" ...
##  $ OVERDUE            : num  -21.6 -16.4 -21.9 -20.5 -58.6 ...
##  $ Title              : chr  "Water Leak-101001613212" "Water Leak-101001613213" "Water Service-101001613214" "Fire Hydrant-101001613215" ...
##  $ LATITUDE           : num  29.6 29.7 29.7 29.7 29.7 ...
##  $ LONGITUDE          : num  -95.5 -95.5 -95.5 -95.3 -95.5 ...
##  $ Channel.Type       : chr  "Voice In" "Voice In" "Voice In" "Voice In" ...
##  $ IDX                : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ Median_HHI         : int  46979 45833 45833 31400 104960 135125 135125 62778 62778 47083 ...
require(lubridate)
## Loading required package: lubridate
df$DATE.CLOSED <-as.Date(df$DATE.CLOSED)
df$SR.CREATE.DATE<-as.Date(df$SR.CREATE.DATE)
df$norm_time_diff <-as.numeric(df$DATE.CLOSED - df$SR.CREATE.DATE)

write.table(df, file = "Houston_April_DataJam_Tableau_INPUT.txt", quote=T, row.names=F, sep="\t")

Normalized Response Time by Department

After calculating response time to 311 requests, we wanted to see how variable the average response time was for different departments. We used the SummaryBy function to find the median response time across all 311 reuqests by department.

require(doBy)
## Loading required package: doBy
## Loading required package: survival
by.dept <-summaryBy(norm_time_diff~DEPARTMENT, data = df, FUN = function(x) median(x, na.rm = T))
colnames(by.dept)<-c("DEPARTMENT", "NORMALIZED_RESPONSE_TIME")
require(ggplot2)
## Loading required package: ggplot2
ggplot(by.dept, aes(x = DEPARTMENT, y = NORMALIZED_RESPONSE_TIME))+geom_bar(stat="identity")+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
## Warning: Removed 1 rows containing missing values (position_stack).

We quickly learned that this approach grossly simplifies the ability to score “department efficiency” by equating all response types. We altered our approach to focus on the top request types (based on count). Since the top request was “Water Leak”, we decided to investigate requests that dealt with flooding (SR.TYPE == "Flooding"), particularly around the Memorial Day Flood time period.

Water Leak 311 Request Locations

common_address <-as.data.frame(table(df$SR.LOCATION))

After further inspection, it was evident that several request locations showed up much more frequently than others. We found that 42.2596054 % of locations submitted more than 1 311 request, 20.6268355 % submitted more than 2, and 4.5069032 % submitted more than 5.

To get more detail about this multiple 311 requests, we determined the top 5 request types for locations that had between 2 and 5, 10 and 50, or more than 50 311 requests in 2015.

df.a <-as.character(common_address[which(common_address$Freq > 2 & common_address$Freq <= 5),1])
df.b <-as.character(common_address[which(common_address$Freq > 10 & common_address$Freq <= 50),1])
df.c <-as.character(common_address[which(common_address$Freq > 50),1])

common_1 <-df[which(df$SR.LOCATION %in% df.a), "SR.TYPE"]
common_1.a <-as.data.frame(table(common_1)); common_1.a <-common_1.a[order(common_1.a$Freq),1]; 
common_1.a<-as.character(common_1.a[1:5])

common_2 <-df[which(df$SR.LOCATION %in% df.b), "SR.TYPE"]
common_2.a <-as.data.frame(table(common_2)); common_2.a <-common_2.a[order(common_2.a$Freq),1]; 
common_2.a<-as.character(common_2.a[1:5])

common_3 <-df[which(df$SR.LOCATION %in% df.c), "SR.TYPE"]
common_3.a <-as.data.frame(table(common_3)); common_3.a <-common_3.a[order(common_3.a$Freq),1]; 
common_3.a<-as.character(common_3.a[1:5])

The top 5 request types for locations with between 2 and 5 311 requests in 2015 were: Heat Emergency, Icing Roadway, Paint Stripes, Private Leak Violation, Sign Code Violation.

The top 5 request types for locations with between 10 and 50 311 requests in 2015 were: Boarding Homes, Building Break In, Complaint Form Request, Employee Complaint, Heat Emergency.

The top 5 request types for locations with more than 50 311 requests in 2015 were: Boarding Homes, Clean Pavilion or Restroom, Crisis Cleanup, Dumpster Permit, House Move Route.