Daniel Berry

Data Scientist


Phone (314) 288-9457
LinkedIn danielkberry


Chicago, Illinois US


Allstate Insurance Company

Junior Data Scientist

01/17 — Current

  • Analyzed data and built predictive models using a variety of data sources from embedded devices and smartphones for vehicle telematics applications.
  • Implemented a system in Apache Spark for data ETL and user-specific predictive model building, enabling Allstate to build a different predictive model (GBM/Random Forest) for each of hundreds of thousands of users in minutes.

Laboratory for Interdisciplinary Statistical Analysis, Virginia Tech

Lead Statistical Consultant

08/15 — 12/16

  • Provided statistical consulting services for researchers at Virginia Tech, including data manipulation (data transformation and statistical programming), data visualization, and data analysis (model building and interpretation). Led a team of associate collaborators in implementing solutions.
  • Consulted for clients in a variety of departments including business, engineering, architecture, biomaterials, genetics, agriculture, and osteopathic medicine.

Allstate Insurance Company

Data Science Intern

05/16 — 08/16

  • Developed a model to detect car crashes using data from vehicle embedded devices.
  • Aggregated data from a variety of sources and resolutions such as accelerometer readings, GPS trails, and vehicle OBD2 port data using Python, Apache Spark, and Hive.
  • Extracted features using a variety of Python toolkits (e.g., NumPy, SciPy, pandas, statsmodels).
  • Trained gradient-boosted decision tree ensemble (GBM) models using XGBoost and scikit-learn.
  • Demonstrated 50-fold accuracy improvement over original crash detection logic.

Allstate Insurance Company

Data Science Intern

05/15 — 08/15

  • Developed a GBM model to predict losses from auto accidents using R.
  • Prototyped a natural language feature extraction system in Python to mine information from the notes of accident claims.
  • Demonstrated utility of system by comparing to existing models and showing increased predictive accuracy.
  • Assessed the utility of a business metric for quantifying agent success using visualizations and a predictive model based on a Generalized Linear Model (implemented in R).
  • Presented results to management to inform future decisions.


Data Science Intern

05/14 — 08/14

  • Implemented a system for visualizing high-dimensional data (from academic literature on topological data analysis) using the R and Python programming languages.
  • Prototyped a graphical user interface for the system using R’s Shiny framework.
  • Demonstrated the utility of the prototype system by visualizing publicly available datasets.


Virginia Polytechnic Institute and State University

08/15 — 12/16
Statistics, Master's (3.9 GPA)

University of Alabama

01/14 — 05/15
Applied Statistics, Master's (4.0 GPA)

University of Alabama

08/11 — 05/15
Mathematics, Bachelor's (3.5 GPA)



Python, R, MATLAB, and SAS

Data Management

Hadoop/HDFS, SQL, Hive, and Apache Spark

Machine Learning/Modeling

Generalized Linear Models, Random Forests, Gradient Boosting, Generalized Linear Mixed Models, and Clustering


Detection of cigarette smoke inhalations from respiratory signals using decision tree ensembles

Published in Proc. 2015 IEEE SoutheastCon

Clustering technical documents by stylistic features for authorship analysis

Published in Proc. 2015 IEEE SoutheastCon


Boy Scouts of America

Eagle Scout