Daniel Berry

Data Scientist

Contact

Phone (314) 288-9457
LinkedIn danielkberry

Location


Chicago, Illinois US

Work

Allstate Insurance Company

Junior Data Scientist

01/17 — Current

  • Analyzed data and built predictive models using a variety of data sources from embedded devices and smartphones for vehicle telematics applications.
  • Implemented a system in Apache Spark for data ETL and user-specific predictive model building, enabling Allstate to build a different predictive model (GBM/Random Forest) for each of hundreds of thousands of users in minutes.

Laboratory for Interdisciplinary Statistical Analysis, Virginia Tech

Lead Statistical Consultant

08/15 — 12/16

  • Provided statistical consulting services for researchers at Virginia Tech, including data manipulation (data transformation and statistical programming), data visualization, and data analysis (model building and interpretation). Led a team of associate collaborators in implementing solutions.
  • Consulted for clients in a variety of departments including business, engineering, architecture, biomaterials, genetics, agriculture, and osteopathic medicine.

Allstate Insurance Company

Data Science Intern

05/16 — 08/16

  • Developed a model to detect car crashes using data from vehicle embedded devices.
  • Aggregated data from a variety of sources and resolutions such as accelerometer readings, GPS trails, and vehicle OBD2 port data using Python, Apache Spark, and Hive.
  • Extracted features using a variety of Python toolkits (e.g., NumPy, SciPy, pandas, statsmodels).
  • Trained gradient-boosted decision tree ensemble (GBM) models using XGBoost and scikit-learn.
  • Demonstrated 50-fold accuracy improvement over original crash detection logic.

Allstate Insurance Company

Data Science Intern

05/15 — 08/15

  • Developed a GBM model to predict losses from auto accidents using R.
  • Prototyped a natural language feature extraction system in Python to mine information from the notes of accident claims.
  • Demonstrated utility of system by comparing to existing models and showing increased predictive accuracy.
  • Assessed the utility of a business metric for quantifying agent success through visualizations and a predictive model based on a Generalized Linear Model (implemented in R).
  • Presented results to management to inform future decisions.

MRIGlobal

Data Science Intern

05/14 — 08/14

  • Implemented a system for visualizing high-dimensional data (from academic literature on topological data analysis) using the R and Python programming languages.
  • Prototyped a graphical user interface for the system using R’s Shiny framework.
  • Demonstrated the utility of the prototype system by visualizing publicly available datasets.

Education

Virginia Polytechnic Institute and State University

08/15 — 12/16
Statistics, Master's (3.9 GPA)

University of Alabama

01/14 — 05/15
Applied Statistics, Master's (4.0 GPA)

University of Alabama

08/11 — 05/15
Mathematics, Bachelor's (3.5 GPA)

Skills

Programming

Python, R, MATLAB, and SAS

Data Management

Hadoop/HDFS, SQL, Hive, and Apache Spark

Machine Learning/Modeling

Generalized Linear Models, Random Forests, Gradient Boosting, Generalized Linear Mixed Models, and Clustering

Publications

Detection of cigarette smoke inhalations from respiratory signals using decision tree ensembles

04/2015
Published in Proc. 2015 IEEE SoutheastCon

Clustering technical documents by stylistic features for authorship analysis

04/2015
Published in Proc. 2015 IEEE SoutheastCon

Volunteer

Boy Scouts of America

Eagle Scout