Data Visualization Final Project Portfolio
Hot Subway Cars in NYC

Introduction

New York City (NYC) is the most populous city in the US and has been described as the cultural and financial capital of the world. In a metropolis of this size, public transportation plays a very important role for residents and visitors alike, and the subway is widely regarded as the favorite way to get around. However, in summer, when most people visit the city, one problem has recurred for years: hot cars. Hot cars are caused by air-conditioning breakdowns in subway cars, which happened about 10 times a day during June, July and August between 2010 and 2014, according to an analysis of Metropolitan Transportation Authority (MTA) data obtained by The New York World through an open records request. In total, there were nearly 6,500 “hot cars” over the five-year period covered by the data. Unsurprisingly, the vast majority of those incidents occurred during the hottest months of the year. The problem makes for a very bad experience: riders have described being stuck on a hot underground train as horrendous.

Overview

The MTA’s Division of Car Equipment performs daily pre-service inspections of the HVAC (heating, ventilation and air conditioning) units on each of the subway’s 6,300 cars. Nevertheless, according to the MTA data obtained by The New York World, air-conditioning breakdowns happened about 10 times a day during June, July and August between 2010 and 2014, for a total of nearly 6,500 “hot cars” over the five-year period, the vast majority of them in the hottest months of the year. Notice that the July peak is highest in 2013 and lowest in 2014. This makes sense: 2013 was reported to have extremely hot weather, while 2014 was the opposite.

For Passengers

The MTA has complete data on the hot car problem, such as incident time, repair time, subway route and car model, but there is not yet any complete guidance for riders. On this page, we try to provide a comprehensive visualization for passengers. We predict which line has the most critical hot car issue so that riders can avoid it if possible. Two visualizations follow: an NYC subway map and a heatmap showing the relationship between subway lines and weekdays. To get data for a particular line, mouse over the line or its label in the subway map, and the associated data in the heatmap is highlighted. At the same time, the cursor changes from an arrow to a hand, which makes it easier to follow what is going on. To get information about a particular weekday, read the heatmap directly. The probability of riding in a hot car is encoded as color: the redder the cell, the higher the probability. If the colors of two rectangles are similar, the actual probabilities can be checked against the linearGradient legend. From the heatmap below, we can see that line 3 has the most critical hot car issue, so passengers who have alternative lines should take them.
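The color encoding described above can be sketched roughly as follows. This is only an illustration, not the project's actual code: the incident records are made up, the "probability" is assumed to be each cell's share of all incidents, and the function names cell_shares and to_red are hypothetical.

```python
from collections import Counter

# Hypothetical incident records: (subway line, weekday). Not real MTA data.
incidents = [
    ("3", "Mon"), ("3", "Mon"), ("3", "Tue"), ("3", "Fri"),
    ("C", "Mon"), ("C", "Fri"),
    ("7", "Tue"),
]

def cell_shares(records):
    """Share of all incidents that falls in each (line, weekday) cell."""
    counts = Counter(records)
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}

def to_red(value, max_value):
    """Map a value to a hex color from white (0) to pure red (max_value)."""
    t = value / max_value if max_value else 0.0
    fade = round(255 * (1 - t))  # green and blue channels fade out as t grows
    return f"#ff{fade:02x}{fade:02x}"

shares = cell_shares(incidents)
peak = max(shares.values())
# The cell with the largest share gets pure red; an empty cell would be white.
colors = {cell: to_red(share, peak) for cell, share in shares.items()}
```

Scaling by the largest cell rather than by 1.0 uses the full white-to-red range, which is one way to keep similar cells distinguishable. The real page may normalize differently.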

For MTA: Hot Car Frequency Trend

The table on the left shows each car type, its average in-service age and the lines it serves. "Number of cars" means how many cars of a given type are in service. The same figure appears in the parallel coordinates chart below as "TotalNumber".

In the parallel coordinates chart, we visualize the "Hot Car" frequency count over 5 years (shown on the axes from "Year:2010" to "Year:2014"). Because "TotalNumber" varies widely between car types, we cannot simply compare raw counts, so we calculated ratios by dividing each count by the car type's "TotalNumber". "AvgMRatio" is the average ratio for each car type in a given month over the 5 years. "AvgRatio" is the average of AvgMRatio across all of these months. By brushing to select lines in the chart below, you can either check the frequencies and ratios of each car type or view the trend in a given month.
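As a rough sketch of how these ratios relate to one another, assuming made-up counts and fleet sizes (the numbers, the data layout and the function names avg_m_ratio and avg_ratio are all hypothetical, not taken from the project):

```python
from statistics import mean

# Hypothetical hot-car counts per car type, keyed by (year, month). Not real MTA data.
counts = {
    "R32":  {(2010, 6): 12, (2010, 7): 20, (2011, 6): 8, (2011, 7): 16},
    "R188": {(2010, 6): 1, (2010, 7): 2, (2011, 6): 1, (2011, 7): 0},
}
# Hypothetical fleet sizes ("TotalNumber" in the chart).
total_number = {"R32": 200, "R188": 100}

def avg_m_ratio(car_type, month):
    """Average of count / TotalNumber for one month across all years present."""
    ratios = [
        n / total_number[car_type]
        for (_, m), n in counts[car_type].items()
        if m == month
    ]
    return mean(ratios)

def avg_ratio(car_type):
    """Average of the monthly AvgMRatio values for one car type."""
    months = sorted({m for (_, m) in counts[car_type]})
    return mean(avg_m_ratio(car_type, m) for m in months)
```

With these sample numbers, the R32 has a June AvgMRatio of 0.05, a July AvgMRatio of 0.09 and an AvgRatio of 0.07, so the larger fleet does not hide its higher breakdown rate.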

Since our topic is closely tied to hot weather, we only consider the "hot cars" that occur during summer, so we chose the data from June to September in these 5 years. The results are mostly unsurprising but to some extent counter-intuitive. In general, older cars tend to have more "hot issues" than newer cars. The worst-performing car is the R32, which has the oldest in-service age and runs on the C line. The car with the fewest air conditioner breakdowns is the R188, which is also the youngest and serves line 7. The interesting part is that age does not always track performance: the R142A, which has been in service for only about a dozen years and runs on lines 4 and 6, ranks among the five worst.

For MTA: Defects and Repairs

Because older cars do not always perform worse, meaning age is not always positively related to "Hot Car" issues, we go further and visualize the relationship between the causes of hot car problems (dirty, leaking, H/W damaged, malfunctioning, others), the repair method (reset, repair, replace) and the "Hot Car" frequency trend. This time, we no longer focus on frequency counts; what we care about is the percentage. That is, of all "Hot Cars", how many are caused by malfunctioning? And within this "malfunction group", how many need to be replaced or reset? In the chart below, the inner ring represents the in-service age groups, the middle ring the manufacturer, and the outer ring the car type. We use age group rather than manufacturer as the inner ring because after 1980 the MTA no longer purchased new cars from four companies (Budd, St. Louis, Pullman, Westinghouse), and it introduced only one car type from each of them. The relationship between these four companies and car type is therefore fixed, so we prefer to conduct the analysis on age groups.
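The two percentage questions above can be sketched as follows. This is an illustration under assumptions only: the records are invented, and the helper name percentages is hypothetical rather than part of the project.

```python
from collections import Counter

# Hypothetical hot-car records: (defect reason, repair method). Not real MTA data.
records = [
    ("malfunctioning", "reset"),
    ("malfunctioning", "replace"),
    ("malfunctioning", "replace"),
    ("leaking", "repair"),
    ("dirty", "repair"),
]

def percentages(items):
    """Return each distinct item's share of the list, in percent."""
    counts = Counter(items)
    total = sum(counts.values())
    return {key: 100 * n / total for key, n in counts.items()}

# Share of each defect reason among all hot cars.
defect_pct = percentages([defect for defect, _ in records])

# Share of each repair method within the "malfunctioning" group only.
malfunction_repair_pct = percentages(
    [repair for defect, repair in records if defect == "malfunctioning"]
)
```

In this sample, malfunctioning accounts for 60% of hot cars, and about two thirds of the malfunctioning cases are fixed by replacement. Filtering the records by age group or car type first would give the per-group view.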

Click any section on the sunburst chart and see further information displayed in a parallel set chart.

 

For MTA: Conclusion

In every age group, malfunctioning is the major cause of hot car issues. However, the percentage of "malfunctioning" does not rise steadily from the youngest age group to the oldest. The trend is more like an inverted parabola: the percentage of "malfunctioning" goes up as in-service age increases and peaks in the [10,15] group, then keeps decreasing and reaches its lowest point in the [45,50] group. Meanwhile, the range of defect causes becomes more diverse once in-service age exceeds 25, and the percentages of the individual defects become more even as age increases. As for repair method, replacement and reset are split roughly half and half for younger car types. As age increases, replacement becomes dominant.

In general, older cars have more "Hot Car" issues, and their defects are more diverse. Malfunctioning is still the major cause of hot car issues, and the major repair method for older cars is replacement. Finally, we take another look at the unusual R142A. More than 45% of the R142A's hot cars are fixed by a reset, whereas for other car types the major repair method is replacement. We therefore assume that although the R142A's AvgRatio is abnormally high, nearly half of its hot cars can be fixed by resetting, a relatively low-cost method compared to replacement. This is probably because, as a young car, most of the R142A's problems are small and easy to repair.

In conclusion, considering in-service age, AvgRatio, defects and repairs, the R46, R68A, R160 and R143 are the high-performing car types, which we highly recommend.

Our Team

This project was designed and developed as the final project for CS573 Data Visualization (Fall 2015) at Worcester Polytechnic Institute by:

Yanpu Li

Data Science '16 - WPI

Azharuddin Priyotomo

Data Science '16 - WPI

Huan Ye

Data Science '16 - WPI

Resources

Below are our resources for creating this project:

Dataset

MTA Dataset

Code References

Reusable D3 scripts, snippets, etc.

Website Design

Bootstrap theme