Welcome to History of Data Science. Discover the stories of heroes who transformed our daily lives!

BROUGHT TO YOU BY Dataiku Dataiku

xperiences-ico Xperiences
Filter
Date
Families
Mathematics / Deep ML Dataiku Favorite

K-Nearest Neighbors Algorithm: Classification and Regression Star

4 min read
08_27_2021
The K-Nearest Neighbors algorithm (or kNN) can be used to solve both classification and regression problems. Simple and easy-to-implement, this supervised machine learning algorithm is one of the most widely used, from genetics to Netflix recommendations.

The kNN algorithm classifies a data point according to its neighbors’ classification. It assumes that similar things exist in proximity, hinging on this assumption being true enough for the algorithm to be useful.

The origins of kNN can be found in research carried out for the U.S. armed forces. Evelyn Fix (1904–1965) was a mathematician and statistician who taught statistics at Berkeley. Joseph Lawson Hodges Jr. (1922–2000) was a fellow statistician at Berkeley who worked with the Twentieth Air Force of USAF (United States Air Forces) from 1944. Putting their brilliant minds together, they wrote a technical analysis report in 1951 for USAF. It introduced discriminant analysis, a non-parametric classification method. However, the paper was never officially published — most likely for confidentiality reasons in the wake of World War II.

Championing Classification

This is the first known introduction of non-parametric classification and would later become the famous kNN algorithm. In 1967, Thomas Cover and Peter Hart proved an upper bound error rate with multiclass kNN classifications. Since then, there have been many improvements and new approaches continue to emerge.

“Birds of a feather flock together.”

Today, the kNN algorithm is often used as a benchmark for more complex classifiers, like the Support Vector Machines (SVM) and the Artificial Neural Networks (ANN). Despite its simplicity, kNN fares better than many more powerful classifiers. It proves useful in areas ranging from genetics to data compression, and economic forecasting. The kNN algorithm is also used in video recognition, image recognition, handwriting detection, and speech recognition — as well as the customer reading or watching recommendations made by companies like Amazon and Netflix.