Big Data for tennis results prediction
Historical tennis match data is widely available on the internet.
Official tour and tournament websites, such as www.atpworldtour.com, provide information on players and match results, as well as player performance statistics for each match.
Data for tennis
Some sources, e.g. www.tennis-data.co.uk, provide historical data in a structured form (CSV or Excel files). Paid databases such as OnCourt are also available; they are more comprehensive, cover longer periods and are more accurate.
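As an illustration of working with such structured files, the minimal sketch below reads a CSV export and keeps only matches for which odds are available. It assumes the column names `Winner`, `Loser`, `PSW` and `PSL` (decimal odds on the winner and the loser from one bookmaker); these are assumptions based on the layout of tennis-data.co.uk files and should be checked against the header of the file actually downloaded. `MatchRecord` and `load_matches` are hypothetical names, and the sample rows are made up.

```python
import csv
import io
from typing import NamedTuple, Optional, TextIO


class MatchRecord(NamedTuple):
    winner: str
    loser: str
    odds_winner: float  # decimal odds that were offered on the eventual winner
    odds_loser: float   # decimal odds that were offered on the eventual loser


def _to_float(value: Optional[str]) -> Optional[float]:
    """Parse an odds cell; return None for blank, missing or non-numeric cells."""
    if not value:
        return None
    try:
        return float(value)
    except ValueError:
        return None


def load_matches(f: TextIO, odds_w_col: str = "PSW",
                 odds_l_col: str = "PSL") -> list[MatchRecord]:
    """Read a CSV with (assumed) Winner/Loser/odds columns, skipping rows without odds."""
    records = []
    for row in csv.DictReader(f):
        ow = _to_float(row.get(odds_w_col))
        ol = _to_float(row.get(odds_l_col))
        if ow is None or ol is None:
            continue  # odds from this bookmaker are missing for this match
        records.append(MatchRecord(row["Winner"], row["Loser"], ow, ol))
    return records


# Made-up sample: the second row has no odds and is dropped.
sample = io.StringIO("Winner,Loser,PSW,PSL\nPlayer A,Player B,1.50,2.70\nPlayer C,Player D,,\n")
print(load_matches(sample))
```

In practice the same function can be called with a file opened via `open(path, newline="")`.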
For match modelling, data such as set and point statistics for each player may be important. This data can be obtained by scraping websites such as flashscore.com. Hawk-Eye ball-tracking technology, used at many tournaments, can provide more detailed, higher-quality data, such as ball and player positions at any point in the match. However, the ATP, which owns this data, does not license its use to third parties.
There are two main categories of tennis bets: pre-match bets, placed before the match starts, and live (in-play) bets, placed during the match at odds that change as play progresses. Besides the match winner, it is possible to bet on many other outcomes, such as the score in individual sets or the total number of games. Most predictive models focus on pre-match bets on the match winner, because this market has the most historical odds data available, which allows the most complete evaluation of a model's performance.
Bets on tennis matches can be placed either with bookmakers (online or in betting shops) or on betting exchanges. A traditional bookmaker (e.g. Pinnacle) sets the odds for the different outcomes of a match, and the customer (bettor) bets against the bookmaker. On a betting exchange (e.g. Betfair), customers bet against each other, accepting odds offered by other bettors. The exchange matches opposing bets and charges a commission, typically on customers' net winnings.
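The practical difference for a bettor can be sketched by comparing net profit on a winning bet in both settings. This is a simplification under stated assumptions: the commission rate (a hypothetical default of 5%) is applied to the winnings of a single bet, whereas real exchanges use their own rates and usually apply them to net winnings per market. The function names are hypothetical.

```python
def bookmaker_back_profit(stake: float, decimal_odds: float, won: bool) -> float:
    """Net profit of a bet with a traditional bookmaker (margin is built into the odds)."""
    return stake * (decimal_odds - 1.0) if won else -stake


def exchange_back_profit(stake: float, decimal_odds: float, won: bool,
                         commission: float = 0.05) -> float:
    """Net profit of a matched bet on an exchange.

    Assumption: a flat commission (hypothetical 5% default) is deducted from the
    winnings of this single bet only.
    """
    if not won:
        return -stake  # the stake goes to the counterparty; no commission is charged
    return stake * (decimal_odds - 1.0) * (1.0 - commission)


# Hypothetical comparison: an exchange pays more only if its odds are high
# enough to outweigh the commission.
print(bookmaker_back_profit(10.0, 2.00, True))
print(exchange_back_profit(10.0, 2.10, True))
```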
Odds, implied probability and ROI
The odds of a bet determine the payout the bettor receives if the outcome of an event is correctly predicted. For example, if a bettor correctly backs a player at decimal odds of 3.00, the bettor receives $2 profit for every dollar staked, in addition to the returned stake ($3 in total). If the prediction turns out to be wrong, the bettor loses only the stake, regardless of the odds. There are several odds formats, the most popular being decimal or European (1.5, 2.00, 2.50, etc.) and fractional or British (1/2, 1/1, 6/4, etc.).
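The relation between the two formats can be checked with a small sketch: decimal odds equal fractional odds plus one, so the examples listed above correspond pairwise (1.5 ↔ 1/2, 2.00 ↔ 1/1, 2.50 ↔ 6/4). The function names are hypothetical; Python's standard `fractions.Fraction` keeps fractional odds exact.

```python
from fractions import Fraction


def fractional_to_decimal(frac: Fraction) -> float:
    """Fractional odds x/y (profit x per stake y) -> decimal odds x/y + 1."""
    return float(frac) + 1.0


def decimal_to_fractional(decimal_odds: float, max_denominator: int = 100) -> Fraction:
    """Decimal odds -> simplest nearby fraction of (odds - 1)."""
    return Fraction(decimal_odds - 1.0).limit_denominator(max_denominator)


def payout(stake: float, decimal_odds: float, won: bool) -> float:
    """Total amount returned to the bettor: stake * odds on a win, nothing on a loss."""
    return stake * decimal_odds if won else 0.0


def profit(stake: float, decimal_odds: float, won: bool) -> float:
    """Net profit: payout minus the stake that was paid in."""
    return payout(stake, decimal_odds, won) - stake


print(profit(1.0, 3.00, True))           # 2.0: $2 profit per dollar at odds 3.00
print(profit(1.0, 3.00, False))          # -1.0: only the stake is lost
print(fractional_to_decimal(Fraction(6, 4)))  # 2.5
print(decimal_to_fractional(1.5))        # 1/2
```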
The odds express the implied probability of an outcome, i.e. the bookmaker's estimate of the true probability. In the example above, decimal odds of 3.00 (2/1 in fractional form) imply a probability p = 1/3.00 ≈ 33% that the player wins the match.
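In practice the implied probabilities of both players usually add up to slightly more than 100%; the excess is the bookmaker's margin (overround). The sketch below computes implied probabilities from decimal odds and removes the margin by simple proportional scaling. That scaling is only the most basic of several margin-removal methods, and the function names and example odds are hypothetical.

```python
def implied_probability(decimal_odds: float) -> float:
    """Implied probability of one outcome: 1 / decimal odds."""
    return 1.0 / decimal_odds


def overround(odds_a: float, odds_b: float) -> float:
    """Bookmaker margin in a two-way market: sum of implied probabilities minus 1."""
    return implied_probability(odds_a) + implied_probability(odds_b) - 1.0


def normalized_probabilities(odds_a: float, odds_b: float) -> tuple[float, float]:
    """Rescale both implied probabilities so they sum to 1 (proportional method)."""
    pa, pb = implied_probability(odds_a), implied_probability(odds_b)
    total = pa + pb
    return pa / total, pb / total


print(f"{implied_probability(3.00):.1%}")          # about 33.3%
print(f"margin at 1.90/1.90: {overround(1.90, 1.90):.2%}")
print(normalized_probabilities(1.50, 2.70))
```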
Return on investment (ROI) is the net profit over a given period expressed as a percentage of the amount invested. In sports betting, ROI is the total net profit divided by the total amount staked over a series of bets, i.e. the average percentage return per unit staked.
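The definition translates directly into code. The minimal sketch below assumes each bet is recorded as a hypothetical `(stake, decimal odds, won)` tuple; the three bets in the example are made up.

```python
from typing import Iterable, Tuple

Bet = Tuple[float, float, bool]  # (stake, decimal odds, whether the bet won)


def roi_percent(bets: Iterable[Bet]) -> float:
    """ROI in percent: total net profit divided by total amount staked."""
    total_staked = 0.0
    total_profit = 0.0
    for stake, odds, won in bets:
        total_staked += stake
        # a win earns stake * (odds - 1); a loss costs the whole stake
        total_profit += stake * (odds - 1.0) if won else -stake
    if total_staked == 0:
        raise ValueError("ROI is undefined when nothing was staked")
    return 100.0 * total_profit / total_staked


history = [(10.0, 2.50, True), (10.0, 1.80, False), (10.0, 3.00, False)]
print(f"ROI = {roi_percent(history):.2f}%")  # ROI = -16.67%
```

With flat stakes, as here, this ratio equals the average profit per bet as a percentage of the stake.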
Measuring a model's performance by the ROI it would have achieved against historical betting odds is a common approach in betting research. If model accuracy (the percentage of correct predictions) is chosen as the target metric instead, then simply selecting matches with low odds (1.01-1.3) can yield close to 90% accuracy or more, yet the ROI will typically be negative: at such odds a win returns only 1-30% of the stake while every loss costs the whole stake, and the bookmaker's margin is already priced into the odds.
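The gap between accuracy and profitability can be made concrete. Assuming flat stakes and, for simplicity, the same decimal odds o on every bet, the expected ROI per unit staked is accuracy × o - 1, so a strategy only breaks even when its accuracy exceeds 1/o. The function names are hypothetical.

```python
def expected_roi(accuracy: float, decimal_odds: float) -> float:
    """Expected ROI per unit of flat stake when every bet has the same decimal odds."""
    # a win earns (odds - 1) with probability `accuracy`; a loss costs 1 otherwise
    return accuracy * (decimal_odds - 1.0) - (1.0 - accuracy)


def break_even_accuracy(decimal_odds: float) -> float:
    """Accuracy at which the expected ROI is zero: 1 / odds."""
    return 1.0 / decimal_odds


for odds in (1.01, 1.10, 1.20, 1.30):
    print(f"odds {odds:.2f}: break-even accuracy {break_even_accuracy(odds):.1%}")
```

For instance, at odds of 1.10 a 90% accuracy gives an expected ROI of 0.90 × 1.10 - 1 = -1%. Real selections mix different odds, so this is only a simplified illustration.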
Knowing the odds and the model's estimated probability of the outcome, the bettor can decide whether to bet at all and how much to stake. Different strategies result in different ROI. Three basic strategies are commonly used to evaluate the effectiveness of a predictive model. Let
$s_i$ = betting size (stake) on player $i$
$p_i^{\text{bettor}}$ = the bettor's estimate of the probability of player $i$ winning
$b_i$ = net odds for betting on player $i$, calculated as $b_i = x - 1$ for decimal odds $x$, or as $b_i = x/y$ for fractional odds $x/y$
$p_i^{\text{implied}}$ = the estimated (implied) probability of player $i$ winning, calculated as $p_i^{\text{implied}} = 1/x$ (i.e. $(1/x) \cdot 100\%$) for decimal odds $x$, or as $p_i^{\text{implied}} = y/(x+y)$ for fractional odds $x/y$
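These quantities combine into the expected net profit of a bet. With the bettor's probability p and net odds b for a player, a one-unit stake wins b with probability p and loses 1 otherwise, so E = p·b - (1 - p). This is positive exactly when p > 1/(1 + b), and 1/(1 + b) equals 1/x for decimal odds x and y/(x + y) for fractional odds x/y, i.e. the implied probability. The sketch below shows one common decision rule built on this (bet only when E > 0); it is not assumed to match any of the three strategies the article describes, and all names are hypothetical.

```python
from fractions import Fraction


def net_odds_decimal(x: float) -> float:
    """b = x - 1 for decimal odds x."""
    return x - 1.0


def net_odds_fractional(odds: Fraction) -> float:
    """b = x/y for fractional odds x/y."""
    return float(odds)


def implied_probability_from_net(b: float) -> float:
    """1 / (1 + b): equals 1/x for decimal x and y/(x+y) for fractional x/y."""
    return 1.0 / (1.0 + b)


def expected_profit(p_bettor: float, b: float, stake: float = 1.0) -> float:
    """Expected net profit: win b * stake with prob. p, lose the stake with prob. 1 - p."""
    return stake * (p_bettor * b - (1.0 - p_bettor))


def is_value_bet(p_bettor: float, b: float) -> bool:
    """True when the expected profit is positive, i.e. p_bettor > implied probability."""
    return p_bettor > implied_probability_from_net(b)


# Hypothetical example: the model gives a player 40% at decimal odds 3.00 (= 2/1).
b = net_odds_decimal(3.00)
print(implied_probability_from_net(b), expected_profit(0.40, b), is_value_bet(0.40, b))
```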