Overview

The purpose of this project was to analyze historical data and use machine learning to predict the salaries of Major League Baseball players for a given year. With data readily accessible and professional sports teams analyzing every statistic to get the most out of their athletes, data and technology have opened the door to smarter financial decisions. The early 2000s saw the rise of "Moneyball," the sabermetrics-based strategy adopted by general manager Billy Beane and the front office of the Oakland Athletics. This was a data-focused method of finding players who would perform above their monetary value. The strategy is still used today and has been of great value, especially to small-market teams. However, this is only the beginning: the next step is to merge data science and machine learning with the sports world, and that has already begun.

The use of analytics is beneficial not only on the field but also in the front office, and it has become increasingly prevalent in recent years. As salaries continue to increase (see figure below) due to salary caps and cost of living adjustments, many teams will want to seek out players as Billy Beane did, evaluating their value on the field against their financial impact on the club.

The Sean Lahman Baseball Database was used in this project. Lahman's Baseball Archive website was one of the earliest sources of baseball information on the Internet, and he headed the first significant effort to make a database of baseball statistics freely available to the general public. Lahman also contributed to pioneering efforts at websites like Baseball-Reference.com. Python libraries such as pandas, NumPy, scikit-learn, and TensorFlow were used in the data processing for this project. After training two models on a large data set covering a variety of players, I trained a further model tailored to predicting one specific player's salary. That player is Ronald Acuña Jr., currently one of the brightest young stars in the major leagues. After his first two seasons, his statistics rival those of the 430-million-dollar man, Mike Trout.
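As a rough illustration of the kind of data processing described above, the sketch below joins per-season batting statistics to salaries and fits a simple regression. It is not the project's actual pipeline. The tiny tables are invented stand-ins for the Lahman Batting and Salaries files (only the column names `playerID`, `yearID`, `HR`, `RBI`, and `salary` follow the Lahman layout), the choice of home runs and RBI as features is an assumption, and a plain NumPy least-squares fit stands in for the scikit-learn and TensorFlow models mentioned above. `predict_salary` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Invented stand-ins for the Lahman Batting and Salaries tables.
# Column names follow the Lahman CSVs; every value is made up.
batting = pd.DataFrame({
    "playerID": ["a01", "a01", "b02", "b02", "c03", "c03"],
    "yearID":   [2014, 2015, 2014, 2015, 2014, 2015],
    "HR":       [10, 20, 30, 40, 5, 15],
    "RBI":      [40, 70, 90, 110, 20, 50],
})
salaries = pd.DataFrame({
    "playerID": ["a01", "a01", "b02", "b02", "c03", "c03"],
    "yearID":   [2014, 2015, 2014, 2015, 2014, 2015],
    "salary":   [1_900_000, 3_200_000, 4_400_000,
                 5_600_000, 1_200_000, 2_500_000],
})

# One row per player-season that has both statistics and a salary.
data = batting.merge(salaries, on=["playerID", "yearID"], how="inner")

# Design matrix with an intercept column, then the two assumed features.
features = data[["HR", "RBI"]].to_numpy(dtype=float)
X = np.column_stack([np.ones(len(data)), features])
y = data["salary"].to_numpy(dtype=float)

# Ordinary least squares: salary ~ intercept + HR + RBI.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)


def predict_salary(hr: float, rbi: float) -> float:
    """Predict a salary from home runs and RBI using the fitted line."""
    return float(coef @ np.array([1.0, hr, rbi]))


print(predict_salary(25, 80))
```

With real data the same merge would run on the full Batting and Salaries tables, and the fit would normally be evaluated on held-out seasons rather than on the rows used for training.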