Leo Breiman: Statistics at the Service of Others

American statistician Leo Breiman (1928-2005) was a man of action. He wanted not only to develop theories but also to apply them to the real world. Whether working for UNESCO in Liberia, as an industry consultant or as a university professor, he used statistics to change everyday life for the better and take data in new directions.

Beyond university

Leo embraced academia. After a PhD in mathematics, he taught probability theory at the University of California, Los Angeles (UCLA). Yet he quickly realized that he “wasn’t cut out to be an abstract mathematician.” Always keen to help, he hosted poor Mexicans who were learning English, volunteered to teach mathematics to emotionally disturbed youngsters and later became President of the Santa Monica School District Board. During a sabbatical as an educational statistician for UNESCO, he trekked through the rainforests of Liberia to count the number of schools and pupils.

Resigning his professorship in 1968, he started working as a consultant for the US government and industry. Drawing on his mathematical background, he developed statistical methods to predict patterns in everything from traffic to pollution.

“One problem in the field of statistics has been that everyone wants to be a theorist.”

In the right place

Fascinated by the newly emerging area of machine learning (which relies heavily on classification), he understood the important role algorithms would play in the future of statistics. During this time, he helped craft and test techniques for classification, including the famous “Classification and Regression Trees” (CART), laying some of the essential groundwork for data mining and machine learning.

CART represents decisions as a tree that describes data and predicts its value, and it is the basis of numerous modern decision tree concepts, from cost-complexity pruning to surrogate splits. Breiman is also associated with key ensemble concepts such as bagging and Random Forests, which train a large number of individual decision trees and combine their predictions, improving the stability and accuracy of tree-based models.
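To make the idea of bagging concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not Breiman's own code: the toy dataset, the one-split "stump" learner and the names `fit_stump` and `bagged_stumps` are all invented for this example. Each stump is trained on a bootstrap sample (drawn with replacement) and the ensemble predicts by majority vote. Random Forests add a further step, random feature selection at each split, which this sketch leaves out.

```python
import random
from collections import Counter


def fit_stump(points):
    """Fit a one-split tree on (x, label) pairs by picking the threshold with the fewest errors."""
    best = None
    for threshold in sorted({x for x, _ in points}):
        left = [y for x, y in points if x <= threshold]
        right = [y for x, y in points if x > threshold]
        # Each side of the split predicts its majority label.
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


def bagged_stumps(points, n_trees=25, seed=0):
    """Bagging: train each stump on a bootstrap sample, then predict by majority vote."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_trees):
        # Bootstrap sample: same size as the data, drawn with replacement.
        sample = [rng.choice(points) for _ in points]
        stumps.append(fit_stump(sample))

    def predict(x):
        votes = Counter(stump(x) for stump in stumps)
        return votes.most_common(1)[0][0]

    return predict


# Hypothetical toy data: small values are "low", large values are "high".
data = [(0.5, "low"), (1.0, "low"), (1.5, "low"), (2.0, "low"),
        (3.0, "high"), (3.5, "high"), (4.0, "high"), (4.5, "high")]
predict = bagged_stumps(data)
print(predict(0.0), predict(5.0))
```

Because every stump sees a slightly different sample, individual stumps can place their threshold differently, and the vote smooths out those differences. That averaging is where the added stability comes from.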

Embracing the computer

In 1980, he was ready to jump back into university life and accepted a position in the statistics department of the University of California, Berkeley. Focusing on applying statistics to computer science, he transformed the Statistical Laboratory, which had just one small computer, into one of the most sophisticated computing facilities in the country.

Throughout his life, his practical application of statistics did one thing above all: it bridged the gap between statistics and computer science. It also paved the way for advances in machine learning and data mining.