Every player on the pitch, goalkeepers included, works in concert with teammates to generate shots for their team and to prevent the opposition from generating shots. But not all shots are created equal: a shot from the corner flag does not have the same probability of resulting in a goal as a shot from the penalty spot. This model, much like other “expected goal,” or “xG,” models, attempts to quantify the likelihood that a given shot results in a goal.

During a match, all of the players on a team work together to generate shots and to prevent the opposition from doing so. I assume, as is the basis of the Leviathan model, that all of the outfield players on the field equally affect both their team’s ability to generate shots and its ability to prevent the opposition from generating shots. I also assume that goalkeepers affect shot creation for both teams, though with a much smaller magnitude than the outfield players. Every action on the field leading to a given player’s decision to take a shot is, therefore, the product of the ability of all of the other players on the pitch. The probability that a given shot, which has been generated by the actions of all of the players on the pitch, results in a goal at the exact moment that the shooter decides to take the shot is the “expected goal probability,” or simply “xG.”

Beyond the xG of the shot itself, which is determined by shot factors such as shot type, shot location, and the last action prior to the shot, the individual shooter can affect the result by shooting accurately or by failing to hit the target entirely, an ability that I quantify as “shooting impact on goal probability.” Likewise, the goalkeeper can affect the probability that a shot results in a goal after it is taken, by having quick reflexes or by being woefully mispositioned, an ability that I quantify as “goalkeeping impact on goal probability.”

All players work to generate a shot, the shooter shoots the ball, and the goalie attempts to save the shot.

Using all of the shots taken in a given season, such as the 2014-15 EPL season, as inputs, I model the log-odds that a given shot results in a goal as a linear function of the covariates below and fit the model with a generalized ridge logistic regression. I use a design matrix \(X\) for all shots taken with a player’s foot and a separate design matrix \(Z\) for all shots taken with a player’s head, on the assumption that a player may be an above-average finisher with their feet but a below-average finisher with their head, or vice versa. In each matrix, every row is an individual shot, with the following columns:

- An indicator variable for the shot “situation,” such as direct free kick, open play, penalty, or corner.
- An indicator variable for the shooter.
- An indicator variable for the goalkeeper.
- An indicator variable for the “last action” that preceded the shot, such as blocked pass, rebound, cross, or through ball.
- An indicator variable for home or away.
- A set of geometric terms:
  - Consider the attacking half of the pitch as a hexagonal grid of cells. For each of the two overarching shot types, feet or head, I subset the grid to the cells that contain at least one shot of that type in the given season. Every shot is then assigned to its closest hex cell, and each hex for each of the two shot types corresponds to an indicator variable in the model. These geometric hexes incorporate shot location into the probability that a shot results in a goal.
  - To limit overfitting, given that there are hundreds of unique hexes and several of them may contain only a few shots in a given season, I use fusion penalties within the ridge logistic regression to penalize differences between the coefficients of adjacent hexes. These fusion penalties ensure that the impact of shot location on xG does not vary wildly between nearby points.
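The hex assignment and fusion penalty described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the author's implementation: the pitch dimensions, hex size, penalty weights `lam` and `mu`, the synthetic shots, and every function name are hypothetical, and the design matrix holds only an intercept and the hex indicators (the situation, shooter, goalkeeper, last-action, and home/away columns are left out). The penalty is \(\lambda \lVert\beta\rVert^2 + \mu \lVert D\beta\rVert^2\), where each row of \(D\) takes the difference between the coefficients of two adjacent hexes.

```python
import numpy as np

def hex_centers(x_max, y_max, size):
    """Centers of a pointy-top hex grid covering [0, x_max] x [0, y_max]."""
    dx, dy = np.sqrt(3) * size, 1.5 * size
    centers = []
    for row in range(int(y_max / dy) + 2):
        offset = dx / 2 if row % 2 else 0.0  # odd rows are shifted half a cell
        for col in range(int(x_max / dx) + 2):
            centers.append((col * dx + offset, row * dy))
    return np.array(centers)

def assign_to_hex(shots_xy, centers):
    """Index of the closest hex center for each shot."""
    d2 = ((shots_xy[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def fusion_matrix(centers, size):
    """One row per pair of adjacent hexes: +1 for one hex, -1 for the other."""
    dist = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    # Neighboring centers in this grid are exactly sqrt(3) * size apart.
    i, j = np.where(np.triu(dist < 1.01 * np.sqrt(3) * size, k=1))
    D = np.zeros((len(i), len(centers)))
    D[np.arange(len(i)), i] = 1.0
    D[np.arange(len(i)), j] = -1.0
    return D

def fit_penalized_logistic(X, y, P, n_iter=50):
    """Newton's method for: minimize -loglik(beta) + 0.5 * beta' P beta."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (p - y) + P @ beta
        hess = X.T @ (X * (p * (1 - p))[:, None]) + P
        beta -= np.linalg.solve(hess, grad)
    return beta

# Synthetic shots (hypothetical): goals are likelier close to the goal at (0, 34).
rng = np.random.default_rng(0)
shots_xy = rng.uniform([0, 0], [52, 68], size=(2000, 2))
dist_to_goal = np.linalg.norm(shots_xy - np.array([0.0, 34.0]), axis=1)
y = (rng.random(2000) < 1 / (1 + np.exp(-(0.5 - 0.12 * dist_to_goal)))).astype(float)

size = 4.0
all_centers = hex_centers(52, 68, size)
used, hex_idx = np.unique(assign_to_hex(shots_xy, all_centers), return_inverse=True)
centers = all_centers[used]  # keep only hexes that contain at least one shot

# Design matrix: intercept column, then one indicator column per used hex.
X = np.zeros((len(y), 1 + len(centers)))
X[:, 0] = 1.0
X[np.arange(len(y)), 1 + hex_idx] = 1.0

D = fusion_matrix(centers, size)
D_full = np.hstack([np.zeros((D.shape[0], 1)), D])  # the intercept is not fused
lam, mu = 1.0, 10.0
ridge = lam * np.eye(X.shape[1])
ridge[0, 0] = 0.0  # leave the intercept unpenalized

beta_ridge = fit_penalized_logistic(X, y, ridge)
beta_fused = fit_penalized_logistic(X, y, ridge + mu * D_full.T @ D_full)
```

Comparing `D_full @ beta_ridge` with `D_full @ beta_fused` shows the effect of the fusion term: the fused fit has smaller coefficient differences between neighboring hexes, so the estimated location effect changes more smoothly across the pitch.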

While the exact coefficients from a logistic regression can be difficult to interpret, their signs are informative and easier to understand. In the field hex chart above, for example, we can see that the likelihood that a shot results in a goal increases as the shooting location gets closer to goal. The sample of xG model covariates from the foot shooting model below shows how the situations and actions preceding a shot affect its expected goal probability. For instance, a foot shot resulting directly from a corner-kick is less likely to result in a goal than a foot shot from a set piece or from open play. Similarly, a foot shot resulting from a cross is much less likely to result in a goal than a foot shot resulting from a rebound or a through ball.

| Covariate | Impact on Goal Probability |
|---|---|
| Situation: Corner-kick | -58.7% |
| Situation: Open Play | 0.0% |
| Situation: Set Piece | -15.8% |
| Last Action: Cross | -46.9% |
| Last Action: Interception | -11.1% |
| Last Action: Rebound | 31.9% |
| Last Action: Through Ball | 39.9% |
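To illustrate why the sign of a logistic coefficient is easier to read than its size, the sketch below adds the same log-odds coefficient to several baseline probabilities. The coefficient and the baselines are hypothetical values chosen for illustration, and the function name is my own; the source does not state how the percentages in the table were derived, so this does not reproduce them.

```python
import math

def shifted_probability(baseline_p, coef):
    """Goal probability after adding a log-odds coefficient to a baseline probability."""
    logit = math.log(baseline_p / (1 - baseline_p))
    return 1 / (1 + math.exp(-(logit + coef)))

coef = -0.5  # hypothetical negative coefficient, e.g. for a "last action" indicator
for baseline in (0.05, 0.30, 0.70):
    shifted = shifted_probability(baseline, coef)
    print(f"baseline {baseline:.2f} -> {shifted:.3f} (change {shifted - baseline:+.3f})")
```

A negative coefficient always lowers the probability and a positive one always raises it, but the size of the change depends on the baseline, which is why the raw coefficient alone is hard to interpret.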

While the ridge logistic regressions on the design matrices \(X\) and \(Z\) produce estimates of both player shooting ability and goalkeeper shot-stopping ability, each estimate is confined to one shot type: foot finishing and foot shot stopping from \(X\), head finishing and head shot stopping from \(Z\). In other words, the xG regression model produces two different estimates of ability for each player, one for foot shots and one for headed shots. Rather than take an arbitrary average of each player’s two coefficients, I run an additional regression model. This model uses all of the shots in a given season, both headers and foot shots, as inputs, with the shot result, either goal or no goal, as the dependent variable. The covariates are as follows:

- An indicator variable for the shooter.
- An indicator variable for the goalkeeper.
- A variable for the “expected goal probability” of the shot characteristics as estimated by the previously explained expected goal model, but without any probability adjustment for the specific goalkeeper or shooter.

This model then yields estimates of the overarching “shooting impact on goal probability” and “goalkeeping impact on goal probability” for each outfield player and goalkeeper.
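A minimal sketch of this second-stage regression on synthetic data follows. Several choices here are assumptions rather than details from the text: the xG covariate enters on the logit scale, the shooter and goalkeeper indicators carry a ridge penalty of weight `lam` while the xG term is unpenalized, and all names and data are hypothetical.

```python
import numpy as np

def fit_ridge_logistic(X, y, penalty, n_iter=50):
    """Newton's method for logistic regression with a per-coefficient ridge penalty."""
    P = np.diag(penalty)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (p - y) + P @ beta
        hess = X.T @ (X * (p * (1 - p))[:, None]) + P
        beta -= np.linalg.solve(hess, grad)
    return beta

# Synthetic season (hypothetical): each shot has a shooter, a goalkeeper, and an xG.
rng = np.random.default_rng(1)
n_shots, n_shooters, n_keepers = 3000, 20, 10
shooter = rng.integers(0, n_shooters, n_shots)
keeper = rng.integers(0, n_keepers, n_shots)
xg = rng.uniform(0.02, 0.6, n_shots)  # first-stage xG without shooter or keeper effects
logit_xg = np.log(xg / (1 - xg))

# Simulated "true" impacts, used only to generate goal outcomes.
true_shooter = rng.normal(0, 0.3, n_shooters)
true_keeper = rng.normal(0, 0.3, n_keepers)
p_goal = 1 / (1 + np.exp(-(logit_xg + true_shooter[shooter] + true_keeper[keeper])))
goal = (rng.random(n_shots) < p_goal).astype(float)

# Columns: logit(xG), one indicator per shooter, one indicator per goalkeeper.
X = np.zeros((n_shots, 1 + n_shooters + n_keepers))
X[:, 0] = logit_xg
X[np.arange(n_shots), 1 + shooter] = 1.0
X[np.arange(n_shots), 1 + n_shooters + keeper] = 1.0

lam = 5.0
penalty = np.r_[0.0, np.full(n_shooters + n_keepers, lam)]  # xG term is unpenalized
beta = fit_ridge_logistic(X, goal, penalty)

shooter_impact = beta[1:1 + n_shooters]   # log-odds shift attributed to each shooter
keeper_impact = beta[1 + n_shooters:]     # log-odds shift attributed to each goalkeeper
```

Here a positive entry in `shooter_impact` means the shooter scores more often than the shot characteristics alone would suggest, while a negative entry in `keeper_impact` means the goalkeeper concedes less often than expected.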

In summary, this is an xG, or expected goal, model that quantifies the probability that a goal results from a shot, given the shot’s various characteristics. This model is used as the basis for determining both an outfield player’s “shooting impact on goal probability” and a goalkeeper’s “goalkeeping impact on goal probability.” Finally, the expected goal probability of a shot, given the shot’s characteristics but excluding the goalkeeper and shooter, can be used to quantify the quality of shots generated by a team, regardless of the goalkeeping and shooting ability of the team’s players.
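As a small illustration of that last use, the sketch below sums and averages shot-level xG per team. The team names and xG values are made up, and the function name is hypothetical.

```python
from collections import defaultdict

def team_shot_quality(shots):
    """Total xG, shot count, and mean xG per shot for each team.

    `shots` is an iterable of (team, xg) pairs, where xg excludes any
    shooter or goalkeeper adjustment.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for team, xg in shots:
        totals[team] += xg
        counts[team] += 1
    return {
        team: {"total_xg": totals[team],
               "shots": counts[team],
               "xg_per_shot": totals[team] / counts[team]}
        for team in totals
    }

# Hypothetical shots from one match.
shots = [("Team A", 0.08), ("Team A", 0.35), ("Team A", 0.05),
         ("Team B", 0.76), ("Team B", 0.04)]
summary = team_shot_quality(shots)
```

In this made-up match, Team A takes more shots but Team B creates more total xG, which is the kind of distinction that shot counts alone would hide.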