How Nebraska Voted in 2016

Below is a visualization of how Nebraskans voted in the 2016 presidential election. Here are a few key points:

  • The data is based on how many more votes were counted for Trump than for Clinton, the two front-runners (a minimal sketch of this margin calculation follows the list)
    • Red and yellow areas are ones where significantly more votes were Republican
    • Green areas are where a few thousand more people in that county voted Republican
    • Blue areas were much closer — ones where Trump prevailed by less than 1,000 votes
    • Dark blue and purple areas were very, very close
    • Two counties appear gray — these are ones in which Clinton received a higher number of votes
  • Clinton won by a relatively small margin in Nebraska’s two largest counties, Douglas and Lancaster, where Omaha and Lincoln reside, respectively
  • Trump won by a rather large margin in the counties immediately surrounding Douglas and Lancaster, especially Sarpy in which he prevailed by over 15,000 votes (that’s roughly three times more than the number of votes Clinton won by in Douglas and Lancaster combined)
  • Smaller, more western counties typically voted strongly Republican
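
If you want to recreate those margin buckets yourself, here is a minimal sketch. It assumes a hypothetical data frame called ne_results with per-county trump and clinton vote totals; only the 1,000-vote cutoff appears in the list above, so the other break points are purely illustrative.

ne_results$margin <- ne_results$trump - ne_results$clinton    # Trump votes minus Clinton votes

ne_results$bucket <- cut(
  ne_results$margin,
  breaks = c(-Inf, 0, 1000, 5000, Inf),                       # illustrative break points
  labels = c("Clinton won", "Very close", "A few thousand", "Strongly Republican")
)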

Resources
Data: NYT Election Results
County names map: https://www.digital-topo-maps.com/county-map/nebraska.shtml
State election results: http://www.sos.ne.gov/elec/2016/pdf/2016-canvass-book.pdf

How to Create a Nebraska Map in ggplot2 by County

In this post, I show you how to create the outline of a Nebraska map in ggplot2, separated by counties. Helpfully, R already has the latitudes, longitudes, and other mapping data we need for this; it’s just a matter of accessing them, which is one of the reasons R is so great and easy to use.

Using map_data() from ggplot2 (which draws on the maps package), we can turn all of these coordinates into a data frame. Let’s name ours ‘states’, then also create a smaller data frame that only contains the Nebraska rows:

library(ggplot2)   # provides map_data()
library(maps)      # supplies the underlying map coordinates

states <- map_data("state")
ne_coords <- subset(states, region == "nebraska")
head(ne_coords)

Now, let’s do the same thing, but with counties…

counties <- map_data("county")
ne_county <- subset(counties, region == "nebraska")
head(ne_county)

Okay! We have all of the data we need to draw the map. Let’s activate ggplot2 and see what this looks like when we visualize it.

library(ggplot2)

Nebraska map without county lines:

ne_map <- ggplot(data = ne_coords, mapping = aes(x = long, y = lat, group = group)) +
  coord_fixed(1.3) +
  geom_polygon(color = "black", fill = "gray")
ne_map


Nebraska map with county lines:

ne_map + theme_classic() + geom_polygon(data = ne_county, fill = NA, color = "white")


And that’s it. Of course, you’ll want to find some county-level data and join it to this map to make a truly useful visualization (a quick sketch of that join is below), but this should get you started.
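
For example, here is a rough sketch of that join. The ne_results data frame and its values are made up for illustration; the key detail is that map_data("county") stores lowercase county names in its subregion column, which is what you join on.

ne_results <- data.frame(                          # hypothetical county-level values
  subregion = c("douglas", "lancaster", "sarpy"),
  margin    = c(-2000, -1500, 15000)
)

ne_joined <- merge(ne_county, ne_results, by = "subregion", all.x = TRUE)
ne_joined <- ne_joined[order(ne_joined$order), ]   # merge() scrambles row order; restore it

ggplot(ne_joined, aes(x = long, y = lat, group = group, fill = margin)) +
  geom_polygon(color = "white") +
  coord_fixed(1.3) +
  theme_classic()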

Predicting Tesla’s Federal EV Tax Credit in R (and why I’m probably wrong)

First, some context: the federal government provides a tax credit for electric vehicle purchases, phased out per manufacturer. For details, see here, but in short, the full credit ($7,500) is available to consumers until the end of the quarter after the quarter in which a manufacturer delivers its 200,000th qualifying vehicle in the U.S.

So, for example, if Tesla delivers its 200,000th vehicle on June 1, 2018 (Q2), then the full credit is available for the remainder of that quarter plus one more full quarter (Q3 in this example). After that, the tax credit is cut in half for two quarters, then cut in half again for two final quarters before the credit ends completely for that manufacturer.
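
To make that schedule concrete, here is what it looks like as a small table in R, assuming (purely for illustration) that the 200,000th U.S. delivery lands in Q2 2018:

phase_out <- data.frame(
  quarter = c("2018 Q2", "2018 Q3", "2018 Q4", "2019 Q1", "2019 Q2", "2019 Q3", "2019 Q4"),
  credit  = c(7500, 7500, 3750, 3750, 1875, 1875, 0)   # full, full, half, half, quarter, quarter, gone
)
phase_out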

Using R, I forecasted Tesla vehicle deliveries in the United States and then plotted the results with ggplot2. Here’s how:

Thankfully, I found a great site called InsideEVs that has collected (or closely projected) quarterly US deliveries in its Plug-In Sales Scorecard. The historical data was published as images rather than crawlable tables, so instead of scraping it I painstakingly copied everything over to Excel. It was worth the effort, since I knew I would not need to repeat the task.

After adding up the models by month, I ended up with this (a condensed version is shown here, since there were about 100 rows):

Month   Year   Quarter   Deliveries   Cumulative
Jan     2012   Q1                 0            0
Feb     2012   Q1                 0            0
…       …      …                  …            …
April   2018   Q2              6150       183810
May     2018   Q2              9220       193030

Here is a link to the data if you would like to use it: Estimated Tesla Deliveries by Month Worksheet

I will ultimately need the data in quarters, but I decided to forecast using months as the period, because that gives me more data points and makes it more likely that patterns such as seasonality will be detected. I used the forecast function in R to predict future deliveries based on previous months.

Assuming you have read the data into your IDE, I started by viewing my data in R to make sure it looked right, then loaded the tseries, graphics, and forecast packages I knew I would need. Next, I turned the data into a time series so that it can be understood by the forecast function later on; this is more or less adding metadata to your data table. If you’re working with monthly data, the frequency should be set to 12. Before doing that, however, I dropped the “Year” column by setting it to NULL, since the year is captured once the data becomes a time series.

View(teslas)
library(tseries)
library(graphics)
library(forecast)
teslas$Year <- NULL     # the year is encoded in the time series itself
View(teslas)
teslats <- ts(teslas$`Cars Delivered`, start = c(2012, 1), end = c(2018, 5), frequency = 12)
View(teslats)

Next, I plotted the data, just to get an idea of what it looked like on a graph. It’s a good way to familiarize yourself with your data visually, since patterns are hard to discern from the raw numbers alone.

plot(teslats)

There is a handy function in R, stl(), that performs a seasonal decomposition of a time series. Let’s run it now and see what it shows us. We will store the result as “teslafit” so it doesn’t overwrite our other variables.

teslafit <- stl(teslats, s.window = "periodic")
plot(teslafit)

If you remove the seasonality, it is pretty clear that there is still a strong positive trend. I’m not sure what information to take away from the seasonality, other than there appears to be a spike in the last month of each quarter and that is basically the pattern you are seeing in the graph above.

It’s important to get familiar with your data if you are really going to understand and forecast it. Two other useful ways to observe your data are monthplot (from base R) and seasonplot (from the forecast package). The seasonplot is essentially a year-by-year view of your data over a calendar-year x-axis. Both are worth a gander, even though they are not strictly necessary.

monthplot(teslats)
seasonplot(teslats)

Okay, now it’s time to forecast. We are going to predict the next 19 months (through the end of 2019) by passing our decomposition (teslafit) to the forecast function and then plotting the result to see what we get.

forecast(teslafit, h = 19)
plot(forecast(teslafit, h = 19))

The blue line is the middle of the forecast range, and it’s what I’m interested in. It looks reasonable, and this is just for fun, so I’ll call it good. If this were a more important project, we would want to look at the accuracy of the forecast, try multiple forecasting methods, and compare them. To do that, you should become familiar with some of the common metrics for measuring forecast accuracy, like MAE and MASE.

accuracy(forecast(teslafit, h = 19))   # training-set accuracy measures for the fitted model

Being a Tesla fan (and future Model 3 owner), I am aware of some outside factors that could impact this prediction. For one, it’s well documented that Elon Musk wants to ramp up production of Tesla’s Model 3 in a significant way; the company even has a target of 5,000 per week, which I could have worked into the forecast but did not, given their history of missing targets.

A second factor is related to the possibility of Tesla strategically maximizing the federal tax credit for electric vehicles. My prediction (if you add up deliveries cumulatively by quarter) says they will reach the 200k threshold sometime in June. But it’s quite possible Tesla will hold back on U.S. deliveries so that they instead reach the mark in July, which is a brand new quarter and thus would give more buyers an opportunity to take advantage of the federal tax credit program.
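
If you want to see where the point forecast crosses the 200k line, a quick check like the one below works. It is a rough sketch that reuses teslats and teslafit from above and assumes the series holds monthly (not cumulative) U.S. deliveries.

fc <- forecast(teslafit, h = 19)
base_total <- sum(teslats)                        # deliveries through May 2018
future_cum <- base_total + cumsum(as.numeric(fc$mean))
which(future_cum >= 200000)[1]                    # 1 = June 2018, 2 = July 2018, ...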

With that said, I continued with my prediction, knowing it was plausible. The next thing I wanted to do was plot the quarterly data using ggplot2 so that the different tax credit phases were clearly displayed on top of the forecast line.

To accomplish this, I used geom_rect() to shade different regions of the plot and geom_text() to place labels at specific positions. Please note that I am now using quarterly (not monthly) data, which you can get from the Google Sheet linked above as a shortcut. I named the data set “quarters” and added a few additional columns in order to build the final product. Feel free to leave questions in the comments.

library(scales)   # for the comma label formatter

teslaplot <- ggplot(quarters) +
  geom_rect(aes(xmin = '2018Q1', xmax = '2018Q4', ymin = -Inf, ymax = Inf), fill = "aquamarine", alpha = 0.03) +
  geom_rect(aes(xmin = '2018Q4', xmax = '2019Q2', ymin = -Inf, ymax = Inf), fill = "blue", alpha = 0.03) +
  geom_rect(aes(xmin = '2019Q2', xmax = '2019Q4', ymin = -Inf, ymax = Inf), fill = "darkmagenta", alpha = 0.03) +
  geom_line(aes(Quarter, `Cumulative Sales`, group = 1)) +
  labs(title = 'Tesla Federal Tax Credit Projection', x = 'Quarter', y = 'Vehicles Delivered') +
  scale_y_continuous(labels = comma) +
  geom_text(aes(x = '2018Q3', y = 200000, label = '$7,500')) +
  geom_text(aes(x = '2019Q1', y = 200000, label = '$3,750')) +
  geom_text(aes(x = '2019Q3', y = 200000, label = '$1,875'))
teslaplot

How to Create a US Heatmap in R

Creating a simple US map in R can be done in a number of ways. Two popular packages for this type of project are ggplot2 and plotly. In this case, I used plotly.

The data for my map is a list of US state codes (NE, IL, MA, CA, etc.). A second variable gives a count of how many players the Nebraska football team is targeting in each state. In order to follow my example with your own data, you will need to have the state code variable and some numeric variable to map it against.
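
For reference, the structure only needs to look something like this; the values below are placeholders for your own data:

X2018targets <- data.frame(
  `State Code` = c("NE", "IL", "MA", "CA", "TX"),   # two-letter state codes
  Targets      = c(12, 5, 2, 8, 6),                 # placeholder counts of recruiting targets
  check.names  = FALSE                              # keep the space in the "State Code" name
)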

Once you have your data in a table and are ready to use it, create the following styling options for the map, which we will apply later:

library(plotly)

mapDetails <- list(
  scope = 'usa',
  projection = list(type = 'albers usa'),
  showlakes = TRUE,
  lakecolor = toRGB('white')
)

As you may have guessed, “scope” determines the type of map, in this case a map of the USA. We will also determine here what to do with lakes and how to color them.

usaMap <- plot_geo(X2018targets, locationmode = 'USA-states') %>%
  add_trace(
    z = X2018targets$Targets, locations = X2018targets$`State Code`,
    color = X2018targets$Targets, colors = 'Blues'
  ) %>%
  colorbar(title = "Targets") %>%
  layout(
    title = '2018 Nebraska Football Targets by State (February 2018)',
    geo = mapDetails
  )

The code above connects my data to the map and allows me to modify text within the plot area. My data frame is called “X2018targets,” so you’ll need to replace this with your data frame name. You’ll also need to set “z” to your numeric data and “locations” to your state code variable.

When you’re finished, simply type “usaMap” and hit Enter to see your plot appear (I use RStudio, by the way, and assume you likely do as well). If you have any trouble or questions, let me know in the comments.

Data Science Course Recommendation: Udemy Data Science A-Z

I want to give some props to a course I recently took online at Udemy.com. The course is called Data Science A-Z and is taught by Kirill Eremenko.

First, I just want to stress that I am not being paid for this endorsement in any way. Just want to share my review with you all.

The price was right at a mere $10. Not sure if that was a short-term promotional price or how long it will last, but it’s well worth it — even as a refresher.

There are three sections: data visualization with Tableau, statistics and modeling, and data preparation. The sections are not dependent on each other and can be taken in any order, which adds a nice element of flexibility to the whole thing.

As you probably know, there are countless courses out there, but what I appreciate about this one is that it is easy to digest if you have any sort of background in these areas, and it explains not only how to approach these disciplines but also why you are doing them at all.

During the course, I was also introduced to a great free statistical program called Gretl. You can download it here. If you have used SPSS or SAS, you’ll pick it up in no time at all.

Find out more here: https://www.udemy.com/datascience/

I also really like DataCamp, but there is a monthly fee associated with membership; I believe it’s somewhere between $20 and $30 per month.

Thanks for reading.

Some Basic Thoughts on Data Visualization

Data visualization is the art and science of clearly communicating data in a way that is easily digestible to the end user. It goes without saying that an enormous amount of data is available now, but effectively making sense of that data is critically important. That’s when data really becomes useful information.

Excel can do some of this. You are likely familiar with its built-in line graphs, pie charts, and other visualizations. But it is limited in the amount of data you can work with and in how much you can customize the visuals. For many scenarios it is just fine.

But there are times when you might have more data to work with, or perhaps no good way to get the data into Excel in the first place. Tools like Tableau can connect to APIs (or you can import the data) and offer a rich library of visualizations and ways to customize them. Tableau in particular has a free “Public” version, but the output is published on Tableau’s website for anyone to see; if you are a business or agency that wants to keep your data private, a paid version is available as well.

Recently, Google announced its own take on Tableau, called Data Studio. There is a free version here as well, and I think you can choose to keep your data private. However, at least as of June 2016, it only connects to other Google platforms. If you want to pull other data into the interface, it is possible, but you first need to get it into Google Sheets or BigQuery, which is more or less Google’s SQL data warehouse. BigQuery is its own topic for another day.

I have only scratched the surface of what is out there. R users can download and work with libraries like ggplot2; QlikView is another Tableau-esque tool; and on and on. Check out this website for a few more examples, which cleverly breaks the tools out by developers and non-developers: http://thenextweb.com/dd/2015/04/21/the-14-best-data-visualization-tools/#gref.