Welcome to History of Data Science. Discover the stories of heroes who transformed our daily lives!

Trevor Hastie: A Statistical Linguist

Trevor Hastie’s name doesn’t often pop up in the mainstream media, but if you work in computer science or machine learning, chances are you’ve read something he’s written or benefited from one of his discoveries.

A curvy invention

Born and raised in South Africa, Hastie moved to the U.S. to attend grad school at Stanford. He started shaking up the stats world with his Ph.D. dissertation, which included a groundbreaking invention: Principal Curves and Surfaces. Principal curves, he explained, were “smooth one-dimensional curves that pass through the middle of a p-dimensional dataset.”

Why is this relevant? Simply put, straight lines aren’t the best way to summarize certain datasets that have multiple variables in play. Principal curves have helped data scientists more accurately represent and interpret complex datasets.
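
To make the idea concrete, here is a rough Python sketch comparing a straight-line summary with a curved one on noisy, arc-shaped data. It is a simplified illustration, not Hastie's actual algorithm: the function names (`first_pc_line`, `principal_curve_sketch`) are hypothetical, a cubic polynomial stands in for the scatterplot smoother used in the original method, and the curve is not re-parameterized by arc length.

```python
import numpy as np

def first_pc_line(X):
    """Project points onto the first principal component (a straight line)."""
    mu = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mu, full_matrices=False)
    direction = vt[0]
    t = (X - mu) @ direction              # position of each point along the line
    return t, mu + np.outer(t, direction)

def principal_curve_sketch(X, degree=3, n_iter=10, grid=400):
    """Alternate between smoothing and projecting (simplified, assumed setup)."""
    lam, _ = first_pc_line(X)             # start from the straight-line summary
    for _ in range(n_iter):
        # Smoothing step: fit each coordinate as a polynomial in lam.
        coefs = [np.polyfit(lam, X[:, j], degree) for j in range(X.shape[1])]
        s = np.linspace(lam.min(), lam.max(), grid)
        curve = np.column_stack([np.polyval(c, s) for c in coefs])
        # Projection step: move each point to its nearest point on the curve.
        d2 = ((X[:, None, :] - curve[None, :, :]) ** 2).sum(axis=2)
        idx = d2.argmin(axis=1)
        lam = s[idx]
    return curve, curve[idx]

# Synthetic data: a noisy half circle, which no straight line summarizes well.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, np.pi, size=200)
X = np.column_stack([np.cos(theta), np.sin(theta)])
X += rng.normal(scale=0.05, size=X.shape)

_, line_proj = first_pc_line(X)
curve, curve_proj = principal_curve_sketch(X)

# Mean squared distance from each point to its summary (smaller is better).
line_mse = ((X - line_proj) ** 2).sum(axis=1).mean()
curve_mse = ((X - curve_proj) ** 2).sum(axis=1).mean()
print(f"straight line: {line_mse:.4f}   curve: {curve_mse:.4f}")
```

On data like this, the curve should sit much closer to the points than the straight line does, which is the intuition behind the invention.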

In the nearly four decades since their invention, principal curves have been used for a variety of purposes. They have helped enhance machine learning solutions for image processing, speech and handwriting recognition, and analyzing large datasets.

“When I was at school, we were the nerds. Now we are sexy! Wish I could be back at school.”

Trevor Hastie in 2011, responding to a comment by economist Hal Varian

From S to R

Hastie has made important contributions to the development of statistical programming languages. In 1991 he published Statistical Models in S, which built on the groundbreaking language that his mentors, John Chambers and Richard Becker, had developed in 1976 for statistical modeling and visualization.

In subsequent years, Hastie helped develop S's successor, the R language. R is free, open-source software overseen by the R Foundation for Statistical Computing, a nonprofit group. While it is not the most popular programming language, it is widely embraced by professional number crunchers. In addition to being the go-to language for statisticians seeking to build models or compelling visualizations, R is frequently used for statistical modeling in finance, healthcare, and manufacturing.
