GABBERT - Machine Learning on the Gridiron

In the spirit of Nate Silver's PECOTA and CARMELO, GABBERT (the General Algorithm for Bettering Basic Evaluation and Resigning Tactics) is a predictive model that examines the first three years of an NFL player's career and projects their performance in the years that follow.

Meaningful football analytics are notoriously difficult to capture and make actionable. GABBERT seeks to provide insight into player development to give teams more information about which of their players are most likely to break out.

This project required developing positional metrics that are used both to evaluate players and to set targets for predictive modeling. CATCHERR is one such metric created for this project.

CATCHERR

CATCHERR does a good job of separating out top-level talent in the NFL; a graph of the top rookie seasons of the past 17 years illustrates this well. Once CATCHERR was set as a target, a predictive model was trained on every wide receiver season since 1999 and tuned to optimize for identifying top-level talent. The model was then used to project the performance of wide receivers entering their fourth year in the league in 2016. DeAndre Hopkins tops this class, and the model identifies him as the most likely to have an All-Pro season. The model will be updated as more data becomes available. A more detailed write-up of the results of this project can be found here.
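The pipeline described above can be sketched as follows. This is a minimal illustration, not the project's actual code: the feature names, the synthetic data, and the choice of a random forest are all assumptions made for the example.

```python
# Sketch of a GABBERT-style pipeline: train a classifier on early-career
# receiver stats and rank players by predicted probability of a breakout.
# Features and data here are synthetic stand-ins, not the real inputs.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical features: three stats (e.g. yards/game, catch rate,
# targets/game) for each of a player's first three seasons = 9 columns.
n_players = 500
X = rng.normal(size=(n_players, 9))

# Synthetic target: 1 if the player later posts a top-level season.
# Loosely tied to the features so the model has real signal to learn.
y = (X[:, :3].mean(axis=1) + 0.5 * rng.normal(size=n_players) > 0.8).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Rank held-out players by predicted probability of a breakout season.
probs = model.predict_proba(X_test)[:, 1]
ranking = np.argsort(probs)[::-1]
print("Top-5 predicted breakout candidates (row indices):", ranking[:5])
```

Ranking by predicted probability, rather than by the hard class label, is what lets a model like this surface "most likely to break out" candidates even when no single player is a confident positive.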


LincolNLP

Abraham Lincoln is probably the most quoted writer in American history, but little study has been done on the entire corpus of his collected writings. Using the tools of Natural Language Processing, we can gain insights into his writing style and study how it changed over the course of his political career. 
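A minimal sketch of the kind of stylometric comparison this involves: compute simple style features, such as average sentence length and vocabulary richness, for texts from different periods of a career. The two short quotations below (from Lincoln's 1832 campaign announcement and his second inaugural address) stand in for full corpus slices.

```python
# Compare basic stylometric features across two periods of a writer's
# career. A real analysis would use full documents, not single quotations.
import re

def style_features(text):
    """Return (average words per sentence, type-token ratio) for a text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    avg_len = len(words) / len(sentences)
    ttr = len(set(words)) / len(words)  # vocabulary richness
    return avg_len, ttr

early = "I am young. I am unknown. I have no wealthy relations to recommend me."
late = ("Fondly do we hope, fervently do we pray, that this mighty scourge "
        "of war may speedily pass away.")

for label, text in [("early", early), ("late", late)]:
    avg_len, ttr = style_features(text)
    print(f"{label}: {avg_len:.1f} words/sentence, type-token ratio {ttr:.2f}")
```

Tracking features like these year by year is one straightforward way to quantify how a writing style changes over a political career.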

This project is ongoing. Track its progress here.


Each point on this map of Chicago indicates a mosquito trap; the size of the dot indicates the number of West Nile-positive mosquitoes found in each trap.


Predicting West Nile

Several years ago, Chicago experienced an outbreak of West Nile virus. As part of the city's comprehensive plan to combat the outbreak, the Department of Public Health sponsored a Kaggle competition to see who could best predict where and when West Nile might occur. During my time at General Assembly, we were challenged to complete this project over the course of a week, a quick turnaround for a project of this size. A teammate and I built a predictive model with a high degree of accuracy. The success of this model was based largely on our implementation of a Voting Classifier that combined predicted probabilities from several different models (Extra Trees, Logistic Regression, etc.). More details about our process and results can be found at the project's repo.
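The soft-voting ensemble described above can be sketched with scikit-learn's `VotingClassifier`. The synthetic, imbalanced dataset below is a stand-in for the competition's weather and trap features; the estimator settings are illustrative, not the ones we actually used.

```python
# Soft voting: average the predict_proba outputs of several classifiers
# rather than taking a majority vote on their hard labels.
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Imbalanced synthetic data, loosely mimicking rare West Nile positives.
X, y = make_classification(n_samples=1000, n_features=12, weights=[0.9, 0.1],
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

# voting="soft" averages each estimator's predicted probabilities.
ensemble = VotingClassifier(
    estimators=[
        ("et", ExtraTreesClassifier(n_estimators=200, random_state=0)),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="soft",
)
ensemble.fit(X_train, y_train)

auc = roc_auc_score(y_test, ensemble.predict_proba(X_test)[:, 1])
print(f"held-out ROC AUC: {auc:.3f}")
```

Soft voting tends to help when the component models make different kinds of errors: a tree ensemble and a linear model often disagree in useful ways, and averaging their probabilities smooths out each one's blind spots.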


Predicting Kobe Bryant

I decided to take on another Kaggle competition as a way of practicing and implementing some different machine learning techniques. This particular competition presented entrants with 25,000 of Kobe Bryant's career shots and asked them to predict the outcome of 5,000 others. The data provided for the competition was feature-rich, so much of the model-building came down to feature selection and feature extraction. My submission scored a log loss of 0.61, just 0.06 off the leading score. Read more here.
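A sketch of that kind of workflow: pair a feature-selection step with a classifier and score held-out predictions with log loss, the competition metric. The data here is synthetic, and the particular selector and classifier are assumptions for illustration; the real features included things like shot location and distance.

```python
# Feature selection plus a simple classifier, evaluated with log loss
# (lower is better). Synthetic data stands in for the shot dataset.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# 30 raw features, only 8 of which carry real signal.
X, y = make_classification(n_samples=2000, n_features=30, n_informative=8,
                           random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1)

# Keep the 10 features with the strongest univariate relationship to the
# target, then fit a logistic regression on the reduced set.
model = make_pipeline(SelectKBest(f_classif, k=10),
                      LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)

score = log_loss(y_test, model.predict_proba(X_test))
print(f"held-out log loss: {score:.3f}")
```

Log loss penalizes confident wrong predictions heavily, so on a metric like this, well-calibrated probabilities matter as much as raw accuracy.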