Welcome to History of Data Science. Discover the stories of heroes who transformed our daily lives!

BROUGHT TO YOU BY Dataiku

Gradient Boosting Algorithms: Busting Bias Error

With roots in boosting research from the 1980s, gradient boosting algorithms are a powerful way to reduce bias error in machine learning (ML) models. Often applied to learning to rank, they now span applications from Yahoo search results to high energy physics.

As one of the most powerful techniques in ML, gradient boosting is used to reduce bias error in learning models. It combines several simple models with limited individual performance, known as weak models or weak learners, into a single composite model. Or, as American computer scientist Michael Kearns put it in 1988, the goal is “an efficient algorithm for converting relatively poor hypotheses into very good hypotheses.”
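To make the idea of combining weak learners concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the implementation used by any particular library: the weak learner is a one-split decision stump, the loss is squared error (so each round simply fits the current residuals), and every name in it (`fit_stump`, `fit_gradient_boosting`, `learning_rate`, the toy data) is hypothetical.

```python
# Illustrative sketch only: gradient boosting for squared-error regression on
# one-dimensional inputs, with decision stumps as the weak learners.

def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) for the best single split."""
    best = None
    # Every distinct x except the largest is a candidate threshold, so both
    # sides of the split are guaranteed to be non-empty.
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:
        # All inputs are identical: fall back to predicting the mean residual.
        mean = sum(residuals) / len(residuals)
        return (xs[0], mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_gradient_boosting(xs, ys, n_rounds=50, learning_rate=0.1):
    """Build the composite model one weak learner at a time."""
    base = sum(ys) / len(ys)  # start from a constant prediction
    predictions = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For squared error, the negative gradient is the residual y - prediction.
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        # Add a shrunken copy of the new weak learner to the ensemble.
        predictions = [p + learning_rate * predict_stump(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mean_squared_error(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Toy data (made up for this example): a curve no single stump can fit well.
xs = [i / 10 for i in range(40)]
ys = [x * x for x in xs]

single = fit_gradient_boosting(xs, ys, n_rounds=1, learning_rate=1.0)
boosted = fit_gradient_boosting(xs, ys, n_rounds=200, learning_rate=0.1)

single_mse = mean_squared_error(single, xs, ys)
boosted_mse = mean_squared_error(boosted, xs, ys)
print("one stump, training MSE:  ", single_mse)
print("200 stumps, training MSE: ", boosted_mse)
```

On this toy curve, a single stump can only produce a two-level step function, while the sum of many small stumps traces the curve far more closely. That gap in training error is the bias reduction described above. Production libraries add many refinements, such as deeper trees, other loss functions, and regularization, that this sketch leaves out.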

A Boost For ML

The origins of gradient boosting lie in the observation by statistician Leo Breiman that boosting can be interpreted as an optimization algorithm on a suitable cost function. The first truly successful realization of boosting was Adaptive Boosting (AdaBoost), formulated by Yoav Freund and Robert Schapire in 1995 to improve the performance of other learning algorithms. In 1997, Breiman recast AdaBoost and related methods in a statistical framework, calling them ARCing algorithms. Jerome H. Friedman developed this framework further in his seminal paper “Greedy Function Approximation: A Gradient Boosting Machine.” The paper introduced the first explicit gradient boosting algorithms for regression, which would later be referred to simply as gradient boosting or gradient tree boosting.

Gradient boosting is extremely useful in the field of learning to rank, or machine-learned ranking (MLR). It lies at the heart of the ML ranking engines used by Yahoo and Yandex. It also plays a role in data analysis for high energy physics, including challenges linked to the discovery of the Higgs boson, where it helped separate the signal from background noise in data from the Large Hadron Collider (LHC), the world’s largest and highest-energy particle collider.