Contact
Location
Chicago, Illinois US
Work
Allstate Insurance Company
Junior Data Scientist
01/17
— Current
- Analyzed data and built predictive models using a variety of data sources from embedded devices and smartphones for vehicle telematics applications.
- Implemented a system in Apache Spark for data ETL and user-specific predictive model building, enabling Allstate to build a different predictive model (GBM/Random Forest) for each of hundreds of thousands of users in minutes.
Laboratory for Interdisciplinary Statistical Analysis, Virginia Tech
Lead Statistical Consultant
08/15
— 12/16
- Provided statistical consulting services for researchers at Virginia Tech, including data manipulation (data transformation and statistical programming), data visualization, and data analysis (model building and interpretation). Led a team of associate collaborators in implementing solutions.
- Consulted for clients in a variety of departments including business, engineering, architecture, biomaterials, genetics, agriculture, and osteopathic medicine.
Allstate Insurance Company
Data Science Intern
05/16
— 08/16
- Developed a model to detect car crashes using data from vehicle embedded devices.
- Aggregated data from a variety of sources and resolutions such as accelerometer readings, GPS trails, and vehicle OBD2 port data using Python, Apache Spark, and Hive.
- Extracted features using a variety of Python toolkits (e.g. NumPy, SciPy, pandas, statsmodels).
- Trained gradient boosted decision tree ensemble (GBM) models using XGBoost and scikit-learn.
- Demonstrated 50-fold accuracy improvement over original crash detection logic.
Allstate Insurance Company
Data Science Intern
05/15
— 08/15
- Developed a GBM model to predict losses from auto accidents using R.
- Prototyped a natural language feature extraction system in Python to mine information from the notes of accident claims.
- Demonstrated utility of system by comparing to existing models and showing increased predictive accuracy.
- Assessed the utility of a business metric for quantifying agent success through visualizations and a predictive model based on a Generalized Linear Model (implemented in R).
- Presented results to management to inform future decisions.
MRIGlobal
Data Science Intern
05/14
— 08/14
- Implemented a system for visualizing high-dimensional data, based on methods from the academic literature on topological data analysis, using R and Python.
- Prototyped a graphical user interface for the system using R’s Shiny framework.
- Demonstrated the utility of the prototype system by visualizing publicly available datasets.
Education
Virginia Polytechnic Institute and State University
08/15 — 12/16
Statistics, Masters (3.9 GPA)
University of Alabama
01/14 — 05/15
Applied Statistics, Masters (4.0 GPA)
University of Alabama
08/11 — 05/15
Mathematics, Bachelors (3.5 GPA)
Skills
Programming
Python, R, MATLAB, and SAS
Data Management
Hadoop/HDFS, SQL, Hive, and Apache Spark
Machine Learning/Modeling
Generalized Linear Models, Random Forests, Gradient Boosting, Generalized Linear Mixed Models, and Clustering
Publications
Detection of cigarette smoke inhalations from respiratory signals using decision tree ensembles
04/2015
Published in Proc. 2015 IEEE SoutheastCon
Clustering technical documents by stylistic features for authorship analysis
04/2015