Using Data to Identify Safe Drivers

Always looking for new ways to use data to create solutions, the McCann Minneapolis Analytics team recently took part in a competition hosted by Kaggle. The competition, sponsored by State Farm, challenged participants to use machine-learning techniques to determine whether a driver is distracted.

To determine whether a driver is distracted, the algorithm had to be built from images of drivers, around 23,000 in total. Each sample image falls into one of ten categories: one category shows safe driving, while the other nine show some form of distracted driving, such as texting or tuning the radio. The challenge asked participants to use these labeled images to create an algorithm that would classify new images into one of the ten categories. Kaggle then scored the submitted solutions for accuracy.
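Scoring a ten-category classifier for accuracy comes down to counting how often the predicted category matches the true one. The sketch below illustrates only that arithmetic; it is not Kaggle's scoring code, and the integer labels (0 for safe driving, 1 through 9 for the nine distracted categories) are a hypothetical encoding chosen for the example.

```python
def accuracy(predicted, actual):
    """Fraction of images whose predicted category matches the true category."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("need at least one labeled image")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# Hypothetical labels: 0 = safe driving, 1-9 = distracted-driving categories.
actual = [0, 3, 5, 0, 9]
predicted = [0, 3, 4, 0, 9]
print(accuracy(predicted, actual))  # 0.8 (4 of 5 correct)
```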

To develop an algorithm, the team first had to work out how to transform the images so that a computer could “see” them; each image needed to be converted into data. Our solution was to represent each image as a 32x32 matrix of pixels, with a grayscale value for each pixel. An example of this adjustment is shown below:
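The conversion described above can be sketched in plain Python. The post does not say how the grayscale values or the shrinking were computed, so two choices here are assumptions: grayscale uses the common luminance weights 0.299, 0.587 and 0.114 for red, green and blue, and the image is shrunk to 32x32 by averaging blocks of pixels. The input format (rows of `(r, g, b)` tuples with values 0 to 255) and the helper names are hypothetical.

```python
SIZE = 32  # target width and height, as described in the post


def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples (0-255) into rows of luminance values."""
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row]
            for row in rgb_image]


def downsample(gray, size=SIZE):
    """Shrink a grayscale image to size x size by averaging blocks of pixels."""
    height, width = len(gray), len(gray[0])
    if height < size or width < size:
        raise ValueError("image must be at least size x size pixels")
    out = []
    for i in range(size):
        # Source rows that fall into output row i.
        r0, r1 = i * height // size, (i + 1) * height // size
        row = []
        for j in range(size):
            # Source columns that fall into output column j.
            c0, c1 = j * width // size, (j + 1) * width // size
            block = [gray[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def image_to_matrix(rgb_image, size=SIZE):
    """Turn an RGB image into a size x size matrix of grayscale values."""
    return downsample(to_grayscale(rgb_image), size)


# A synthetic all-white 64x48 image (48 rows of 64 pixels).
white = [[(255, 255, 255)] * 64 for _ in range(48)]
matrix = image_to_matrix(white)
print(len(matrix), len(matrix[0]))  # 32 32
```

In practice an imaging library would usually do this step; with Pillow, for example, `Image.open(path).convert("L").resize((32, 32))` produces a 32x32 grayscale image using the same luminance weights, though its resampling filter differs from the simple block averaging above.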

Safe Driver Image

Adjusted Safe Driver Image

We then “unrolled” each matrix into a single vector of 1,024 (32 × 32) grayscale values and attached the image's classification (safe driving, texting, tuning the radio, etc.). At this point, we had a data set we could use to build an algorithm. After experimenting with several different approaches, we submitted a solution that classified new images with 95% or better accuracy.
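The unrolling and labeling step, plus one very simple way to classify the resulting vectors, can be sketched as follows. The post does not say which approach was finally submitted, so the nearest-centroid classifier here is a deliberately simple stand-in chosen for brevity, not the team's method; all function names and the toy "dark" and "bright" images are hypothetical.

```python
import math


def unroll(matrix):
    """Flatten a 32x32 matrix row by row into a 1,024-element vector."""
    return [value for row in matrix for value in row]


def make_example(matrix, label):
    """Pair an unrolled image with its category label."""
    return unroll(matrix), label


def train_centroids(examples):
    """Nearest-centroid training: average the vectors within each category."""
    sums, counts = {}, {}
    for vector, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(vector)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vector)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def predict(centroids, vector):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], vector))


# Toy training data: one uniformly dark and one uniformly bright "image".
dark = [[10.0] * 32 for _ in range(32)]
bright = [[240.0] * 32 for _ in range(32)]
examples = [make_example(dark, "texting"), make_example(bright, "safe driving")]
centroids = train_centroids(examples)

print(len(unroll(dark)))  # 1024
new_image = [[200.0] * 32 for _ in range(32)]
print(predict(centroids, unroll(new_image)))  # safe driving
```

Because every image becomes a fixed-length vector with a label, any standard classifier can be swapped in for `train_centroids` and `predict` without changing the data preparation.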

This type of technology is something most people encounter every day. Ever deposited a check at an ATM? Similar technology is used to read the information on the check. We entered this competition to strengthen our ability to classify images by their associated outcomes. We hope this capability will help us optimize ad effectiveness by identifying which images are most likely to drive desired consumer actions.