The Leviathan model produces a regularized adjusted plus-minus rating for individual football (soccer) players. Specifically, the model estimates each individual player’s impact on expected goal rates, controlling for the impacts of their teammates, their opponents, their manager, their opponent’s manager, the score, the venue, and the number of players on the pitch. The basic premise of the Leviathan model is as follows: while only individual players, the shooter and the goalkeeper, are responsible for a shot resulting in a goal during a match, all of the players on the shooter’s team and all of the players on the opposition team work to generate or suppress that shot. The Leviathan model, therefore, isolates each individual player’s impact on expected goal generation, both for and against their team.
The raw ratings are used to separately evaluate each player’s offensive and defensive contributions. In addition, the estimates are adapted into an expected points above replacement (xPAR) model in order to understand how each player’s impact on expected goal creation and suppression affects match outcomes.
In team sports, such as football, evaluating the performance and contribution of individual players has a significant impact on recruitment decisions and match outcomes. To that end, plus-minus ratings have been employed, with a range of complexity, to holistically evaluate the impact of each player by distributing the performance of a team across the team’s individual players. The basic premise of all plus-minus rating systems is that all of the players on a given team work to generate scoring chances for their team and suppress scoring chances for the opposition team; while there are instances where the creation of a certain scoring chance is largely an individual effort, it is important to acknowledge that every other player on the pitch, either directly or indirectly, is in some way contributing to the creation of that chance.
The most rudimentary plus-minus models simply count the number of goals scored minus goals conceded by a given player’s team when the player is on the pitch. Given that these basic plus-minus models do not control for the impacts of a player’s teammates or opponents, an “adjusted” plus-minus metric was constructed by using linear regression to control for the impacts of the other players on the pitch, both teammates and opponents. In football, though, it is relatively common for two players on a given team to play an almost identical set of minutes, meaning that both players almost always play together. As a result, a basic ordinary least squares-derived adjusted plus-minus model fails to distinguish between the impacts of two players with a largely identical set of minutes played, a problem known as multicollinearity.
The Macro-Football Leviathan model, therefore, produces a regularized adjusted plus-minus model by employing a ridge regression in which the regression coefficients are biased through three different penalty terms (a historical performance penalty, a minutes played penalty, and a normal ridge regression penalty) to minimize the problem of multicollinearity and to reduce the variance of the rating estimates. Further, instead of using goals as the regularized adjusted plus-minus dependent variable, the Leviathan model uses expected goals (xG). In football matches, goals are a relatively rare event and, as a result, are not always reflective of a given team’s performance during a match. Shots, on the other hand, are a significantly more common occurrence, although not all shots are equally valuable. Therefore, using expected goals provides a two-pronged solution: assigning each shot the probability that it results in a goal creates a dependent variable that is more reflective of a team’s overall match performance while simultaneously distinguishing between the value of different shots. Finally, the Leviathan model, rather than simply controlling for the other players on the pitch during a match, also controls for the managers of each team, the score, the number of players on the pitch, and the venue (home or away).
Currently, the bulk of publicly available football player analysis deals with “micro stats,” which are statistics that function to quantify the observable events that occur during a match, allowing for comparability between individual players. Micro stats can be extremely useful in capturing how effective a certain player is with regard to specific events. For example, if it seems like a certain player is particularly good at intercepting opposition passes, micro statistics, such as possession-adjusted interceptions per 90, can illustrate exactly how frequently that player intercepts opposition passes and how that compares to other players. The Leviathan model, though, produces a macro statistic, the “isolated impact” of a given player, which strictly evaluates an individual player’s effect on the two essential aims of football: the object of offense is to generate scoring chances and the object of defense is to suppress opposition scoring chances. In other words, the Leviathan model does not measure how a particular player affects play; it does not matter if a particular player completes 100% of their passes or 0% of passes. The only thing that the Leviathan model measures is how an individual player contributes to their team’s chance creation and opposition chance suppression.
I collected match and shot data from 5 European leagues over the past 6 seasons, as listed in Table 1. In addition, I collect data from the same 5 European leagues throughout the current 2020/21 season on a continual basis, though the summary statistics for that season are absent from Table 1 because the season is ongoing. For each match in the data set, I collect the starting line-ups of each team, the timings and names of any substitutions, the managers of each team, the timings of any red cards, the venue in relation to each team (i.e., home for the home team and away for the away team), the timings of any goals, and the timings and attributes of all shots, both on and off target, taken during the match. For each shot in the data set, I collect the timing of the shot, the pitch coordinates where the shot was taken from, the type of shot (open play shot taken with feet, open play header, direct free-kick, from corner-kick, or penalty), the player taking the shot, the goalkeeper facing the shot, the action preceding the shot (such as a rebound, take on, cross, or pass), and the result of the shot (goal or no goal).
|League||Seasons||Matches||Shots|
|England Premier League||2014/15 - 2019/20||2,280||57,868|
|Germany Bundesliga||2014/15 - 2019/20||1,836||47,723|
|Spain La Liga||2014/15 - 2019/20||2,280||54,341|
|Italy Serie A||2014/15 - 2019/20||2,280||61,353|
|France Ligue 1||2014/15 - 2019/20||2,179||52,332|
Using all of the shots taken in a given season, such as the 2014-15 EPL season, as inputs, I estimate the probability that a given shot will result in a goal with a generalized ridge logistic regression model. I use two separate design matrices on the assumption that a player may be an above-average finisher with their feet but a below-average finisher with their head, and vice versa. For each distinct season in the data set, I therefore use a design matrix \(X\) for all shots taken with a player’s foot and a design matrix \(Z\) for all shots taken with a player’s head, where each row is an individual shot with the following columns:
Table 2 presents a summary of probability scoring methods for the expected goals metric derived from the generalized ridge logistic regression model, noted as the “Macro-Football” model, for shots taken during the 2018/19 season in the top 5 European leagues. In addition, Table 2 presents the same series of probability scoring methods for the expected goals model from Understat.com for the identical set of shots taken during the 2018/19 season. According to all three statistics listed in Table 2, the “Macro-Football” model performs marginally worse than the “Understat” model. Despite that underperformance, there are several reasons why I still employ the “Macro-Football” model in the regularized adjusted plus-minus model, rather than the “Understat” model. First, the difference in accuracy between the two models is relatively negligible. Second, given the lack of an available methodology behind the “Understat” model, we have no way of knowing either the inputs for their expected goals model or how they arrive at their outputs. While, on the surface, this may not seem to be an important factor, it ultimately means that they could be selectively overfitting certain shots in order to improve their model’s performance (please note that this is purely conjecture and I do not believe that “Understat” are doing anything nefarious). Third, and finally, the purpose of the expected goals model here is explicitly not to predict goals perfectly; the measured performance of any expected goals metric could be improved by simply using goals. Its purpose is instead to accurately reflect the relative value of each shot taken during a football match, and without any understanding of Understat’s fitting or modeling procedure for their xG metric, we cannot confirm that their model shares that purpose.
Because I can adjust the parameters and modeling procedure of the “Macro-Football” model, I am confident that there are no “thumb on the scale” or overfitting adjustments to improve model performance.
|Model||Log Loss||Brier Score||AUC|
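The three scoring methods named in the Table 2 header can be sketched as follows. This is a minimal illustration in plain Python, not the code used to produce Table 2; the function names and the toy `goals` and `xg` values are assumptions made up for the example.

```python
import math

def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of the outcomes under the predicted probabilities."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip so that log(0) never occurs
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)

def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)

def auc(y_true, p_pred):
    """Chance that a random goal has a higher xG than a random non-goal (ties count half)."""
    pos = [p for y, p in zip(y_true, p_pred) if y == 1]
    neg = [p for y, p in zip(y_true, p_pred) if y == 0]
    wins = sum(1.0 if a > b else (0.5 if a == b else 0.0) for a in pos for b in neg)
    return wins / (len(pos) * len(neg))

# Toy data (hypothetical): 1 = goal, 0 = no goal, with an xG value per shot.
goals = [1, 0, 0, 1, 0]
xg = [0.76, 0.05, 0.30, 0.10, 0.02]
print(log_loss(goals, xg), brier_score(goals, xg), auc(goals, xg))
```

Lower is better for log loss and the Brier score, while higher is better for AUC.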
While the exact coefficients from a logistic regression can be difficult to interpret, the signs of the coefficients are informative and much easier to understand. In Figure 1, for example, we can see that as the shooting location gets increasingly close to goal, the likelihood increases that a shot results in a goal. In addition, the sample of xG model covariates from the expected goal model for shots taken with feet for the 2018/19 season in the top 5 European leagues, Table 3, provides information on how situations and actions preceding a shot can impact the expected goal probability of the given shot. For instance, a foot shot directly resulting from a corner-kick is less likely to result in a goal than a foot shot from a set piece or a foot shot from open play. In addition, a foot shot resulting from a cross is much less likely to result in a goal than a foot shot resulting from a rebound or through ball.
|Covariate||Impact on Goal Probability|
|Situation: Open Play||0.0%|
|Situation: Set Piece||-15.8%|
|Last Action: Cross||-46.9%|
|Last Action: Interception||-11.1%|
|Last Action: Rebound||31.9%|
|Last Action: Through Ball||39.9%|
The Leviathan model is based on a simple linear model, \(Y \sim wX\beta\), where \(Y\) is a vector of observations, \(w\) is a weighting matrix, \(X\) is a matrix of regressors, and \(\beta\) is a vector of marginals, which are the impacts associated with each covariate in the matrix of regressors, \(X\), independent of each of the other covariates. Every uninterrupted segment of play, where an “uninterrupted segment” is a passage of play where the personnel on the field, the score, and the number of players on the pitch remain constant, is encoded in the model as two rows. One row is encoded with the home team as the attacking team, where the corresponding entry in \(Y\) is the expected goal rate of the home team. The other row is encoded with the away team as the attacking team, where the corresponding entry in \(Y\) is the expected goal rate of the away team. The Leviathan model fits this model with a generalized ridge regression.
The columns of matrix \(X\) are the features included in the Leviathan model.
Terms only applying to the offensive team:
Terms applying to both the offensive and defensive teams (these terms occur in pairs with one for offense and one for defense):
The entries in \(Y\) are the rates at which expected goals, as calculated by the Macro-Football xG model, are generated on the pitch. For each sequence of play, where a sequence of play is defined as an uninterrupted period of the match where the score, players, and red card situation remain unchanged, there are two entries in \(Y\) and two rows of \(X\).
For example, in a match where the personnel on the field, score, and red card situation remain constant from the first minute of the match until the 30th minute, where the away team scores a goal in the 30th minute, the first 30 minutes would be entered as one phase with two rows of \(X\) and two entries of \(Y\). In addition, the phase would be recorded in the weighting matrix \(w\) with a weight of 30, representing the length of the segment of play. The first row of \(X\) corresponding to the 30 minute phase of play will have all score terms set to 0, since the score is 0-0, all red card terms set to 0, since the match is 11 v 11, the away indicator variable set to 0, the attacking columns for the home team players and manager set to 1, and the defending columns for the away team players and manager set to 1, with all of the other manager and player columns set to 0. The corresponding entry in \(Y\) is the rate of xG generated per minute by the home team during the 30 minute phase, from which I subtract the league average xG per minute. The second row of \(X\) will, once again, have all score terms set to 0 and all red card terms set to 0, but will have the away indicator variable set to 1, the attacking columns for the away team players and manager set to 1, and the defending columns for the home team players and manager set to 1, with all other columns set to 0. The corresponding entry in \(Y\) is the rate of xG generated per minute by the away team relative to the league average xG per minute.
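The encoding of the 30 minute example can be sketched as below. This is a minimal sketch, assuming a sparse row stored as a dictionary in which any absent column is 0; the column labels (`att:`, `def:`, `att_mgr:`, `def_mgr:`), the function name, and the league average of 0.015 xG per minute are hypothetical. Score and red card columns are left out because they are all 0 in this example.

```python
def encode_segment(home, away, minutes, home_xg, away_xg, league_avg_rate):
    """Return two (row, y, weight) tuples for one uninterrupted segment of play.

    `home` and `away` are dicts with a "players" list and a "manager" name.
    Columns that are absent from a row are implicitly 0.
    """
    encoded = []
    for attack, defend, is_away, xg in ((home, away, 0, home_xg),
                                        (away, home, 1, away_xg)):
        row = {"away": is_away}
        for name in attack["players"]:
            row["att:" + name] = 1
        row["att_mgr:" + attack["manager"]] = 1
        for name in defend["players"]:
            row["def:" + name] = 1
        row["def_mgr:" + defend["manager"]] = 1
        # xG rate per minute of the attacking side, relative to league average.
        y = xg / minutes - league_avg_rate
        encoded.append((row, y, minutes))  # segment length is the weight
    return encoded

# Toy segment (hypothetical names and numbers): 0-0, 11 v 11, minutes 1 to 30.
home = {"players": ["H1", "H2"], "manager": "HM"}
away = {"players": ["A1", "A2"], "manager": "AM"}
rows = encode_segment(home, away, minutes=30, home_xg=0.6, away_xg=0.3,
                      league_avg_rate=0.015)
for row, y, weight in rows:
    print(row, round(y, 4), weight)
```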
Controlling for score effects, red cards, location, managers, teammates, and opposition allows the Leviathan model to generate estimates for individual player isolated impacts on xG creation and suppression.
As stated previously, rather than fitting the model \(Y \sim wX\beta\) with a simple ordinary least squares regression model, I opt to fit the model with a generalized ridge regression with several “ridge penalties” in order to minimize the adverse effects of multicollinearity present in the data: when a pair of players play with each other for a significant portion of time, the estimated coefficients of an OLS regression will be unstable and, as a result, unreliable. Ridge regression models, though, introduce varying amounts of bias to the model, which decrease the variance, or instability, of coefficient estimates by penalizing the estimates towards a certain value, typically zero, by a certain amount. The specific “ridge penalties” are as follows:
Historical Penalty: Each of the model coefficients, including each individual player’s estimated impact on offensive xG creation and defensive xG suppression, for a given season are biased towards their respective values from the previous season. Beginning in the 2014-2015 season, the first season where the requisite shot and match event data is available, the Leviathan model is fit successively for each season from 2014-2015 to present. Therefore, beginning with the 2014-2015 season, after every season, the model produces estimates of situation, player, and manager impacts. Rather than treating a player’s estimated ability during a given season independently from their estimated ability during another season, the model is based on the idea that an individual player’s ability changes slowly from season to season. Therefore, for each season, except for the first season where the model is fit, 2014-2015, every model coefficient is biased towards its value from the prior season. In the 2014-2015 season, every model coefficient is biased towards zero, due to lack of data availability for the preceding season.
Minutes Played Penalty: Using play segment length, in minutes, as weights in the ridge regression model already causes the estimates for players who maintain high or low xG rates over long periods of play to be of a greater magnitude than the estimates for players who maintain identical xG rates over shorter periods of play. The model also uses an additional penalty term to diminish the magnitude of the estimated impact for players with fewer minutes played. Theoretically, players with low quantities of minutes played should not have the largest estimated impact magnitudes in the dataset. In other words, the most impactful football players are not the players that play around 100 minutes a season. As a result, the model places a greater penalty on the estimates of players that play a smaller quantity of minutes per season. As the number of minutes played by a player increases, the magnitude of the minutes played penalty decreases.
I estimate the optimal “Normal Penalty” value through computation of generalized cross-validation error.
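One way to picture how the penalties interact is a weighted ridge fit in which every coefficient has its own penalty strength and its own target value. The sketch below assumes the objective \(\sum_i w_i (y_i - x_i\beta)^2 + \sum_j \lambda_j (\beta_j - \beta^{0}_j)^2\), whose solution satisfies \((X^\top W X + \Lambda)\beta = X^\top W Y + \Lambda\beta^{0}\). The text does not state the exact form used by the Leviathan model, so the function names, penalty values, and priors here are illustrative assumptions only.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def generalized_ridge(X, y, w, lam, prior):
    """Weighted ridge fit with per-coefficient penalty lam[j] shrinking towards prior[j]."""
    p = len(X[0])
    A = [[sum(wi * xi[j] * xi[k] for xi, wi in zip(X, w)) for k in range(p)]
         for j in range(p)]
    b = [sum(wi * xi[j] * yi for xi, yi, wi in zip(X, y, w)) for j in range(p)]
    for j in range(p):
        A[j][j] += lam[j]            # penalty strength (could be larger for low-minute players)
        b[j] += lam[j] * prior[j]    # pull towards last season's value (zero in the first season)
    return solve(A, b)

# Two players who always appear together: OLS cannot separate them,
# because X'WX is singular. Hypothetical numbers throughout.
X = [[1, 1], [1, 1], [1, 1]]
y = [0.2, 0.1, 0.3]          # segment xG rates relative to average
w = [30, 45, 15]             # segment lengths in minutes
beta = generalized_ridge(X, y, w, lam=[10, 10], prior=[0.15, 0.0])
print(beta)
```

In this toy case the two players share every segment, so the data alone say nothing about how to split the credit between them; the fitted difference between the two coefficients comes entirely from the difference in their priors.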
One of the limitations of using a generalized ridge regression model is that it does not naturally produce estimates for coefficient standard errors. As a result, I calculate bootstrap-produced 95% confidence interval estimates to provide a reasonable proxy for the variance of the Leviathan model estimates. It is imperative, though, to understand that these bootstrap-produced confidence intervals strictly represent the variance of the biased estimates, not the variance of the unbiased estimates. As noted previously, by using a generalized ridge regression model with a series of penalty terms, we are quite literally constraining each coefficient’s possible variance. Therefore, bootstrap-produced confidence intervals, in this case, only serve to quantify the possible variance of each of the coefficients given the penalty terms imposed by the model. The bootstrap-produced confidence intervals are certainly not meaningless in this scenario, but they do only reflect the variance of the estimates produced by the model, which depend on the various penalty terms. These confidence interval estimates should not be taken as gospel regarding the variance of any unbiased estimates.
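A percentile bootstrap of this kind can be sketched as follows. The text does not say which bootstrap variant is used, so this is only an assumed illustration: segments are resampled with replacement, the penalized model is refit on each resample, and the middle 95% of the refitted estimates is reported. To keep the example short, the stand-in `shrunken_rate` fit is a one-coefficient ridge estimate; both function names and all numbers are hypothetical.

```python
import random

def bootstrap_ci(rows, fit, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the statistic returned by fit(rows)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        sample = [rng.choice(rows) for _ in rows]  # resample with replacement
        estimates.append(fit(sample))
    estimates.sort()
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return estimates[lo_idx], estimates[hi_idx]

def shrunken_rate(rows, lam=50.0):
    """One-coefficient weighted ridge estimate, shrunk towards zero by lam."""
    num = sum(w * y for y, w in rows)
    den = sum(w for _, w in rows) + lam
    return num / den

# Hypothetical segments as (xG rate relative to average, minutes) pairs.
segments = [(0.020, 30), (0.010, 45), (0.030, 15), (0.015, 60), (0.025, 20)]
low, high = bootstrap_ci(segments, shrunken_rate)
print(low, high)
```

Because every resample is refit with the same penalty, the resulting interval describes the spread of the penalized estimate only, which is exactly the caveat raised above.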
Table 4 presents the 2019-2020 model covariates that apply only to the attacking team: the location, score, and red card covariates. The first covariate, “Away,” indicates that away teams generate 0.153 xG less per 90 minutes than home teams, controlling for score, red cards, teammates, opposition players, and managers. The score effect covariates indicate that teams generate less xG per 90 minutes when the score is tied than at other score states, except for when teams are losing by 3 or more goals. When teams are losing by 3 or more goals they generate less xG per 90 than at other score states, likely because they have little chance of coming back in the match and, as a result, have given up. Conversely, teams that are already winning the match generate far more xG per 90, likely because teams that are already winning the match feel free to push on for more goals. Losing teams, as long as they aren’t being entirely outclassed, also generate more xG per 90 than at the tied score state, likely because they are attempting to even up the match score. The red card coefficients indicate that teams with a man advantage, unsurprisingly, generate far more xG per 90, while teams with a man disadvantage generate far less xG per 90 than teams without a man advantage or disadvantage.
In addition to the 2019-2020 model covariates that strictly apply to attacking teams, Table 4 also includes the estimated isolated impact for Manager Jurgen Klopp during the 2019-2020 season. The attacking coefficient indicates that Klopp’s managerial impact causes his team to generate 0.032 xG per 90 greater than average across the European top 5 leagues. The Jurgen Klopp defensive coefficient indicates that Klopp’s managerial impact causes the opposition team to generate 0.045 xG per 90 less than average. In other words, a player or manager’s positive attacking impact means that the individual causes their team to generate greater than average xG per 90, while a negative defensive impact means that the individual causes the opposition team to generate less than average xG per 90.
|Covariate||Impact on Team Expected Goals per 90|
|+3 or more goals||+1.038|
|-3 or more goals||-0.019|
|Jurgen Klopp attacking||+0.032|
|Jurgen Klopp defending||-0.045|
The following plot shows the distribution of observed, non-model produced, on-field results from the 2019-2020 season in the top 5 European leagues (EPL, La Liga, Bundesliga, Serie A, Ligue 1). For each player who played in the top 5 European leagues in the 2019-2020 season, the player’s “on-field results” are calculated as the xG created by the player’s team per 90 when the player is on the pitch and the xG created by the opposition team per 90 when the player is on the pitch. The density plot is formed from the collection of each player’s “on-field results,” scaled relative to league average. The red dot represents the sum of the values, weighted by the number of minutes played by each player.
Now, instead of looking at the raw “on-field results,” we can look at the model-produced isolated impact estimates. The overall distribution of isolated impacts is relatively normal. In addition, the weighted average of isolated impacts skews towards “good,” which, intuitively, makes sense because “good” players should play more minutes than “bad” players.
Disaggregating the isolated impact density plot into different position groups allows us to understand the differences in impacts between positions. The starkest difference between the three position group density plots is that the forwards’ weighted average offensive isolated impact is larger than both the defenders’ and the midfielders’ weighted averages. This, once again, coincides with our priors; forwards, in isolation, should, theoretically, have greater positive impacts on xG creation.
The most frequent criticism of this model, and similar models, as it pertains to football is that there is simply not enough variation in team personnel during a season to accurately exploit the differences in xG creation, both for and against, between when a player is on the pitch and when a player is off the pitch. At a high level, it is important to understand that, during a given domestic season, an extremely small number of outfield players, typically no more than 5 at the absolute most, play every single minute of the season. Further, to actually prevent the model from making accurate estimates of a player’s impact on xG creation and suppression over a single season, an outfield player would not only have to play every minute of the season, but also would have to play every single minute of the season with another teammate that also plays the identical set of minutes. Specifically, though, this model uses several additional tools to mitigate the “sample size” issue. Primarily, using a generalized ridge regression model with various penalty terms and weights, as opposed to a simple ordinary least squares regression, assuages the majority of multicollinearity problems posed by the data structure. For instance, by weighting time segment length and using a minutes played penalty, the model does not overestimate the impact of players that are on the field for limited minutes, whereas an OLS regression would estimate enormous impacts for players with limited minutes. In addition, by using historical priors, where the model biases estimates towards their previous season’s value, the model introduces a level of variance in player impacts for each season, besides the first season where the model is run, 2014-2015, regardless of the variance in team selections. 
So, even if two players play the majority of minutes together during a given season, the model can not only exploit the limited minutes when those players were separated, but also exploits the historical estimates of each player’s impact, which introduces differences in the individual player’s impacts.
Given that the Leviathan model estimates each individual player’s impact on expected goals created and conceded per 90 minutes, we can use those outputs to evaluate the impact of each individual player on the probability that their team will win a given match and, therefore, evaluate the impact of each individual player on the number of points their team will accumulate over the course of the season, a metric known as expected points above replacement (xPAR). The following is the basic outline of how the Leviathan model estimates, along with estimates of player shooting ability and shot-stopping ability derived from the expected goals model, are transformed into xPAR.
Given data for all of the matches in a season, such as the 2019/20 season, in each of the top 5 European leagues, we can first create a logistic regression that estimates the probability that a given side wins a match, a binary outcome, depending on the team’s xG created, the opposition team’s xG created, and both the shot-stopping and finishing ability of the team’s and opposition team’s players. The outcome variable for the logistic regression is a dummy variable equal to 1 if team A won the match and equal to 0 if team B won the match. Draws are removed from the data set. The first independent variable in the regression is simply the expected goals accumulated during the match by team A minus the expected goals accumulated during the match by team B, otherwise known as the expected goal differential of team A.
The second independent variable is a proxy for the overall shooting ability of team A minus the overall shooting ability of team B. For a player x on team A, I calculate the shooting “value” of that player for a specific season as the share of team A’s xG created over the course of the season when player x is on the pitch that is actually shot by player x, also known as player xG share, multiplied by player x’s shooting impact on the probability that a given shot results in a goal, otherwise known as finishing impact. So, for a player with an xG share of 25% and a finishing impact of 10%, meaning that shots taken by them are 10 percentage points more likely to result in a goal, their shooting “value” will be 0.025, or 2.5%. Then, for a given match, such as the match between team A and team B, in order to calculate the total shooting value of a team for the match in question, we multiply, for each player on the team, the player’s shooting “value” by the player’s minutes played in the match and then take the sum for all players on the team. Then, in order to actually create the independent variable for the regression, we subtract the team shooting “value” for team B from the team shooting “value” for team A. This independent variable is known as the shooting ability differential.
The third, and final, independent variable is a proxy for the shot-stopping ability of the goalkeepers on team A and team B. For a goalkeeper on team A, I calculate the shot-stopping “value” of that player simply as the goalkeeper’s impact on the probability that a given shot results in a goal, known as shot-stopping impact. So, for a goalkeeper with a shot-stopping impact of -10%, meaning that shots taken against the goalkeeper are 10 percentage points less likely to result in a goal, their shot-stopping value would simply be -0.1, or -10%. Then, for each team during each match, in order to calculate the total shot-stopping value of a team, we multiply, for each goalkeeper that played for the given team, the player’s shot-stopping “value” by the player’s minutes played in the match and then sum the values. Finally, to create the independent variable, we subtract the team shot-stopping “value” for team B from the team shot-stopping “value” for team A. This independent variable is known as the shot-stopping ability differential.
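The shooting ability differential described above reduces to a few lines of arithmetic. The sketch below reproduces the 25% share and 10% finishing impact example; the function names and the toy line-ups are assumptions for illustration.

```python
def shooting_value(xg_share, finishing_impact):
    """Season shooting value: player xG share times finishing impact."""
    return xg_share * finishing_impact

def team_shooting_value(players):
    """Sum of shooting value times minutes played in the match.

    `players` is a list of (xg_share, finishing_impact, minutes_played) tuples.
    """
    return sum(shooting_value(share, impact) * minutes
               for share, impact, minutes in players)

def shooting_ability_differential(team_a, team_b):
    """Team A's match shooting value minus team B's."""
    return team_shooting_value(team_a) - team_shooting_value(team_b)

# The example from the text: 25% xG share and a 10% finishing impact.
print(shooting_value(0.25, 0.10))  # approximately 0.025

# Hypothetical line-ups with (xG share, finishing impact, minutes played).
team_a = [(0.25, 0.10, 90), (0.10, -0.05, 60)]
team_b = [(0.20, 0.05, 90)]
print(shooting_ability_differential(team_a, team_b))
```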
Without delving into the specific interpretations of the logistic regression coefficients, the important takeaway, which coincides with our priors, is that each of the regression coefficients is positive for each season in the dataset. In other words, having a positive xG differential, having better finishing than the opposition, and having better shot-stopping than the opposition increase a team’s probability of winning. Then, for each player and each season, we have the player’s impact on xG differential, which is the raw output of the Leviathan model, the player’s shooting ability “value,” and the player’s shot-stopping ability “value.” Given that each of these three estimates corresponds to an input of the logistic regression, we simply use the results of the fitted model to predict the impact of each individual player on the probability that their team wins a match. This metric is then multiplied by the player’s number of minutes played divided by 90 to arrive at an estimate of expected wins above average (xWAA).
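One possible reading of this step is sketched below. It assumes that a player’s impact on win probability is the fitted probability with the player’s three inputs minus the fitted probability with all inputs at zero (an average player). That reading, the function names, and the coefficient values are assumptions; the fitted coefficients are not reported in the text.

```python
import math

def win_probability(xg_diff, shooting_diff, stopping_diff, coefs):
    """Logistic model of a win given the three differentials."""
    b0, b1, b2, b3 = coefs
    z = b0 + b1 * xg_diff + b2 * shooting_diff + b3 * stopping_diff
    return 1.0 / (1.0 + math.exp(-z))

def xwaa(xg_impact, shooting_value, stopping_value, minutes, coefs):
    """Expected wins above average: per-90 win probability impact times minutes / 90."""
    baseline = win_probability(0.0, 0.0, 0.0, coefs)  # an average player
    with_player = win_probability(xg_impact, shooting_value, stopping_value, coefs)
    return (with_player - baseline) * minutes / 90

# Hypothetical coefficients (intercept, xG diff, shooting diff, shot-stopping diff).
coefs = (0.0, 1.1, 0.8, 0.8)
print(xwaa(0.10, 0.02, 0.0, minutes=1800, coefs=coefs))
```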
The next step is to convert from expected wins above average to expected wins above replacement. In order to do so, we must define the value of a “replacement” level player. In this case, I consider a “replacement” level player to be equivalent to a relegation level player. Because players on relegated teams are quite literally replaced by players from another team the following season, they can, on average, be replaced by players of a similar overall ability relatively easily. To quantify the xWAA value of a “replacement” level player, I sum the xWAA values for all of the players on relegation level teams across the top 5 European leagues and divide by the number of players. This gives us the average expected wins above average value for a player on a relegation level team. Then, I subtract that average relegation level xWAA value from the xWAA value of each individual player in the dataset. This new value is known as a player’s expected wins above replacement value, also called xWAR. Now, in order to transition from expected wins above replacement to expected points above replacement, or xPAR, we simply have to multiply xWAR by 3, given that 1 win is worth 3 points. The value of using xPAR over the raw Leviathan model isolated impact estimates is that the xPAR metric combines both a player’s play driving ability and their shooting or shot-stopping ability into a single value.
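The conversion from xWAA to xWAR and then to xPAR can be sketched as follows, assuming the xWAA values are already computed. The function names, player labels, and numbers are hypothetical.

```python
def replacement_level(xwaa_by_player, relegation_players):
    """Average xWAA of players on relegation level teams."""
    values = [xwaa_by_player[p] for p in relegation_players]
    return sum(values) / len(values)

def xwar(xwaa_by_player, relegation_players):
    """Expected wins above replacement: xWAA minus the replacement level."""
    rep = replacement_level(xwaa_by_player, relegation_players)
    return {p: v - rep for p, v in xwaa_by_player.items()}

def xpar(xwaa_by_player, relegation_players, points_per_win=3):
    """Expected points above replacement: xWAR times 3 points per win."""
    wins = xwar(xwaa_by_player, relegation_players)
    return {p: v * points_per_win for p, v in wins.items()}

# Hypothetical xWAA values; players C and D are on relegation level teams.
xwaa_values = {"A": 1.2, "B": 0.1, "C": -0.5, "D": -0.3}
points = xpar(xwaa_values, relegation_players=["C", "D"])
print(points)
```

Here the replacement level is -0.4 expected wins, so player A’s 1.2 xWAA becomes 1.6 xWAR and roughly 4.8 xPAR.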
The Leviathan model produces estimates for an individual player’s impact on chance creation and opposition chance suppression. The Leviathan model attempts to estimate how good or bad an individual player is. The Leviathan model estimates, while extremely informative on their own, can also be used to estimate the number of points that an individual player is worth over the course of a season.
None of this work could have been possible without the work of Micah Blake McCurdy. Ultimately, much of this work is a feeble attempt to recreate Micah’s Magnus model, which isolates the impact of ice hockey players, for another sport, in this case football. I am forever indebted to his work at HockeyViz. In addition, I have borrowed plenty of viz ideas and inspiration from the Evolving Wild twins.