Introduction and data

The aim of the following analysis is to quantitatively evaluate finishing skill in football and to form an idea of who the best finishers are. By finishing skill we mean here the ability of a player to turn shots into goals. As simple as that. It should be noted that finishing, as defined above, is different from goalscoring, which is potentially a broader and more complicated concept. The best finisher isn’t necessarily the best goalscorer or the best forward. Let’s see what comes out!

Finding free, accessible football data is always a problem (it’s the main problem, actually …) but here and there something good appears. The data used in this analysis can be found here. The database includes club football data starting from the year 2001, covering club competitions in Europe (but not only) and gives information on various events/aspects related to a match. After initially examining the data and doing all the inevitable data wrangling/cleaning, I decided to keep the information about shots taken between the years 2006 and 2017 (12 years) by players in the 5 main European leagues (Spanish La Liga; Italian Serie A; English Premier League; German Bundesliga; and French Ligue 1). Data before 2006 didn’t appear to have the same accuracy and completeness, so I excluded them in order not to endanger the accuracy of the analysis.

In total we have 685,953 non-penalty shots taken by 14,267 distinct players. For each shot we know the player who took it, the outcome (goal/no goal) and the location from which the shot was taken (so we are able to calculate the distance from the center of the goal).
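
The distance calculation mentioned above can be sketched as follows. This is only an illustration: the coordinate convention (metres, with the center of the goal as the origin) and the function name are my assumptions, not something taken from the database.

```python
import math

def distance_to_goal_center(x_m: float, y_m: float) -> float:
    """Euclidean distance (in metres) from a shot location to the goal center.

    Assumed convention: the center of the goal is the origin, x_m is the
    distance from the goal line and y_m is the lateral offset from the
    center of the goal.
    """
    return math.hypot(x_m, y_m)

# A shot taken 11 m out and 3 m to the side of the goal center.
print(round(distance_to_goal_center(11.0, 3.0), 2))
```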

Fig-1

Fig-1 gives an initial view on the data we are considering, by showing the total number of shots and goals for all players. From an initial look, we could divide the players in Fig-1 into three “groups”:

    1. The first group is composed of Cristiano Ronaldo and Messi, who have taken many more shots and have, of course, scored more than any other player. Cristiano in particular has taken an incredible number of shots and it’s interesting to note that he has scored fewer goals than Messi, despite taking 1000+ more shots.
    2. The second group includes some of the best forwards of the last 10-15 years in Europe, who have scored between 200 and 300 goals.
    3. The third group includes all the remaining players, whose numbers are more smoothly distributed.

Our data set of 685,953 shots is very large and the time range of 12 years is considerable (12 years is the time frame spanned by 4 World Cups!!!) but, you know, data is never enough. One of the problems we face is that, although this time range includes most of the career of many players (e.g. it includes almost all of Messi’s career), it leaves out many years of football for a lot of players. For example, the data includes only 284 shots (56 goals) from Filippo Inzaghi and 484 shots (57 goals) from Alessandro Del Piero. The best years of their careers are excluded. Inevitably, the estimate of their finishing skill is expected to be affected by this and it should be treated with additional caution.

Methodology

This analysis is mainly based on the simple and primitive parameter of “conversion rate”, which is the ratio between goals and shots taken for each player. It obviously takes values in the 0.0 – 1.0 range (unless you can score two goals from one shot…Hi Leo!) and, in principle, the higher the conversion rate of a player, the better a finisher he is. The conversion rate for all players is shown in Fig-2. Overall, 70,085 of the 685,953 shots were scored, giving an average conversion rate of 0.102.

Fig-2

The conversion rate is an indicator of a player’s finishing ability but it can’t be directly used for our purpose. There are three main flaws related to the conversion rate. Discussion on these flaws and also proposed solutions/corrections are presented in the following paragraphs.

1 – Firstly, not all shots are the same, i.e. they have different probabilities of being turned into goals. Some shots are easy to score and some are more difficult. Also, different players tend to take different types of shots and, because of this, directly comparing their conversion rates wouldn’t be a fair way to judge their finishing ability.

One reasonable way to deal with this is to build/apply an expected goals model to our data and then to categorize shots based on the expected goal value, allowing us to take into account the difficulty of each shot. Unfortunately, we don’t know enough parameters to do this. Instead, to categorize shots, we will use the only parameter we have: the distance from the center of the goal. This is obviously not ideal, since the shot difficulty depends on many other variables, but considering that the distance from the goal is the most influential parameter in almost every expected goals model, it makes sense to use it. This will hopefully allow us to take into account shot difficulty.

I have divided all shots into 25 categories based on their distance from the center of the goal. Each category has a range of 15m and they overlap each other by 14m (more than 93%). Namely, these categories are:

    • Category 1: 0m to 15m from the center of the goal
    • Category 2: 1m to 16m from the center of the goal
    • Category 3: 2m to 17m from the center of the goal
    • …
    • Category 23: 22m to 37m from the center of the goal
    • Category 24: 23m to 38m from the center of the goal
    • Category 25: 24m or further from the center of the goal (the last category).

Although this categorization divides shots into discrete groups, the fact that consecutive categories overlap by more than 90 percent of their length and are offset by just 1m allows us to treat the division as a quasi-continuous one.
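
To make the overlapping categories concrete, here is a minimal sketch that lists every category a shot falls into. The function name and the half-open boundaries (lower bound included, upper bound excluded) are my assumptions; the category widths and offsets are the ones listed above.

```python
def shot_categories(distance_m: float, n_categories: int = 25,
                    width_m: float = 15.0, step_m: float = 1.0) -> list[int]:
    """Return the 1-based categories that contain a shot at distance_m.

    Category k covers [(k - 1) * step_m, (k - 1) * step_m + width_m);
    the last category is open-ended ("24m or further").
    """
    categories = []
    for k in range(1, n_categories + 1):
        lower = (k - 1) * step_m
        upper = float("inf") if k == n_categories else lower + width_m
        if lower <= distance_m < upper:
            categories.append(k)
    return categories

# A shot from 10 m belongs to every category whose window contains 10 m.
print(shot_categories(10.0))
```

Because the windows overlap, a single shot contributes to several categories, which is what makes the division behave almost like a sliding window over distance.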

To see if this procedure improves our analysis, let’s have a look at the following graph.

Fig-3

Fig-3 shows the distribution of shots according to the distance from the goal for a few players. On the left we have all shots, while on the right we have only the shots in the 0m to 15m range. As we can observe from the histograms on the left, each player has his own distribution and shooting pattern (some take more short shots and some take more long shots). When we restrict the shot range, we can still notice differences among players, but they are much smaller than when we plot the distribution of all shots. It’s not a perfect approach but it’s not a pointless one either. Replacing distance with expected goals, though, would considerably improve our analysis.

Using distance as a criterion to categorize shots is expected to make relatively more sense as we move away from the goal. Shots located near the goal can originate from various sources (e.g. cross, through ball, etc.), they can be taken either with the foot or the head and there are also special chances like a one-on-one with the goalkeeper, etc. All these variables influence the probability of the shot being turned into a goal. As we move away from the goal, the influence of all these parameters tends to decrease and the shot distance becomes even more the main predictor. I don’t have any numbers supporting this claim, so consider it just speculation.

2 – The second problem that prevents us from directly using the conversion rate is related to the high variation in the number of shots taken by each player. Let’s have a look at the histogram of the number of shots taken by each player (Fig-4). The vast majority of players have taken few shots. Actually, the median is around 6 shots/player, which is very low. Even though in the later sections of this analysis we do not consider players with fewer than 10 shots, some correction in this regard is still needed. This is because it’s tricky to compare different proportions (conversion rate is a proportion!). One player has scored 5 goals from 100 shots and another has scored 50 from 1000 shots. Who is better?

Fig-4

To account for the variation in the number of shots taken we use empirical Bayes estimation, a method that considerably improves the estimate of the conversion rate of all players. The idea is very simple: initially we model the conversion rates of our data set as a beta distribution, which is a very appropriate distribution for parameters restricted between 0 and 1. We consider it as a prior distribution and then we combine it with the individual data of each player (number of total shots and goals) to get an updated estimate of the conversion rate. As we’ll see later, this allows us to build distinct distributions that represent the conversion rate of each player.
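
The idea can be sketched in a few lines. Note the assumptions: the prior is fitted here with a simple method-of-moments estimate (a stand-in chosen for brevity, not necessarily the fitting procedure used for the figures), the list of rates is made up, and the function names are mine.

```python
from statistics import mean, pvariance

def fit_beta_prior(rates):
    """Method-of-moments estimate of Beta(alpha0, beta0) from observed rates."""
    m = mean(rates)
    v = pvariance(rates)
    common = m * (1 - m) / v - 1
    return m * common, (1 - m) * common

def shrunk_rate(goals, shots, alpha0, beta0):
    """Posterior mean of the conversion rate under a Beta(alpha0, beta0) prior."""
    return (goals + alpha0) / (shots + alpha0 + beta0)

# Hypothetical conversion rates of high-volume shooters, used only to fit the prior.
rates = [0.08, 0.10, 0.12, 0.09, 0.11, 0.14, 0.07, 0.10]
alpha0, beta0 = fit_beta_prior(rates)

# The two players from the question above: same raw rate, different sample sizes.
print(shrunk_rate(5, 100, alpha0, beta0))
print(shrunk_rate(50, 1000, alpha0, beta0))
```

Both players have a raw conversion rate of 0.05, but the estimate for the player with only 100 shots is pulled more strongly towards the prior mean, because 100 shots tell us less than 1000.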

3 – Thirdly, as we previously noted, the overall average conversion rate is around 0.10 but it doesn’t remain constant across the whole range of shots taken per player. Looking again at Fig-2, we notice that players with a lot of shots tend to have a higher conversion rate. We need to include this information in our analysis and for that we will use beta-binomial regression, a technique that basically incorporates the number of total shots in building the prior distribution.
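
A rough sketch of what “incorporating the number of shots in the prior” can look like is given below. It assumes one common parameterization, in which the prior mean is linear in log(shots); the coefficient values are hypothetical placeholders (in practice they would be fitted to the data by maximum likelihood) and the function names are mine.

```python
import math

def prior_for_player(shots, mu0, mu_shots, sigma0):
    """Beta prior parameters whose mean increases with log(shots).

    mu0, mu_shots and sigma0 are regression coefficients; the values used
    below are hypothetical, not fitted to the shot data.
    """
    mu = mu0 + mu_shots * math.log(shots)
    return mu / sigma0, (1 - mu) / sigma0

def posterior_mean(goals, shots, mu0=0.06, mu_shots=0.01, sigma0=0.005):
    """Shrunk conversion rate using the player-specific prior."""
    alpha0, beta0 = prior_for_player(shots, mu0, mu_shots, sigma0)
    return (alpha0 + goals) / (alpha0 + beta0 + shots)

# Same raw rate (0.05), but each player is shrunk towards a different prior mean.
print(posterior_mean(5, 100))
print(posterior_mean(50, 1000))
```

The only difference from plain empirical Bayes is that the prior is no longer one fixed distribution: a player with more shots is compared against a slightly higher expected conversion rate.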

The following section shows the application of the above described methodology on the 1st of the 25 categories in which we divided shots.

Partial application

Let’s go step-by-step and find out who are the best finishers for shots within the 1st category (shots in the 0m to 15m range).

Fig-5 shows shots and goals for each player while Fig-6 shows shots and the conversion rate, confirming the ascending trend of the conversion rate with the increase of shots taken.

Fig-5

Fig-6

In order to improve the estimate of the conversion rate for each player, we apply the empirical Bayes estimation and the beta-binomial regression methods. Fig-7 shows the histogram of the conversion rate (after we filter for players with 200+ shots, with the aim of having a more accurate representation of the conversion rate) and the fitted beta distribution, which we consider as a prior. It doesn’t look like we have a very good fit here, but it improves considerably as we move through the next shot categories.

This prior distribution and the data (shots and goals) of each player will allow us to build a distinct distribution of the conversion rate for each player. The properties of these distributions are what we will use to evaluate a player’s finishing ability.

Fig-7

The two graphs in Fig-8 show how the conversion rate of each player is transformed. What happened here is that all conversion rates moved towards the trend line, considerably narrowing the initial range of values. The role of the beta-binomial regression procedure here is that it makes it possible to shrink the conversion rates towards the trend line. Without it, the conversion rates would have shrunk towards the horizontal line of the overall average conversion rate.

Fig-8

Fig-9 is a combination of the two graphs of Fig-8. The vertical lines here show how much the initial conversion rates were transformed. As we can see, not all players are affected equally: players with a lot of shots (and also players close to the trend line) are only slightly affected, since the initial estimate of the conversion rate is much more representative for them.

Fig-9

We can now plot posterior distributions of the conversion rate for all the players in our database. We can use these distributions not only to judge a player’s finishing ability but also to compare and rank players based on this skill. For example, Fig-10 shows the posterior distributions of the conversion rate for three well-known players: Džeko, Icardi and Salah. The dashed curve represents the prior, which is basically the distribution of the average conversion rate of all players, i.e. that’s how an average finisher is expected to perform. If a player’s curve is to the right of the prior, it means that he is probably a better finisher than average, and the further to the right the better. On the other hand, players whose curve is positioned to the left of the prior (e.g. Džeko) are probably worse finishers than average. The height and width of the curve are related to the number of shots taken by a player. Džeko’s curve being higher and narrower means that he took more shots than Icardi and Salah.

Fig-10

One positive aspect of representing conversion rates as a distribution (instead of as a single value) is that it makes comparisons between players more meaningful. Looking at Salah’s and Icardi’s curves, we might infer that Salah is probably a better finisher than Icardi. Only probably though…everything is expressed in probabilistic terms. The two curves overlapping means that there is also a small probability that Icardi is a better finisher. We can calculate these probabilities by using the properties of each distribution. The probability that Icardi is a better finisher than Salah is just 2.8%, while the probability that Salah is a better finisher than Icardi is 97.2%.
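
One simple way to obtain such a probability is by simulation: draw many samples from the two posterior distributions and count how often one player comes out ahead. The sketch below uses made-up posterior parameters (they are not the fitted values for Salah or Icardi) and a function name of my own.

```python
import random

def prob_a_better(alpha_a, beta_a, alpha_b, beta_b, n_draws=100_000, seed=0):
    """Monte Carlo estimate of P(rate_A > rate_B) for two Beta posteriors."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(alpha_a, beta_a) > rng.betavariate(alpha_b, beta_b)
        for _ in range(n_draws)
    )
    return wins / n_draws

# Hypothetical posteriors: player A ~ Beta(60, 240), player B ~ Beta(40, 260).
print(prob_a_better(60, 240, 40, 260))
```

The two probabilities (A better than B, B better than A) always add up to 1, which is why the figures quoted above come in complementary pairs.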

Conversion rate distributions are very effective in visually comparing two or three players to each other but if we want to compare more players it gets difficult because curves will eventually overlap a lot. We could avoid this by building credible intervals. Credible intervals show the range of values within which a player’s conversion rate lies with a certain predefined probability (this predefined probability can be set to 50%, 90%, 95%, 99%, etc). Fig-11 shows the top 10 finishers for shots in the 0m to 15m range, specifying for each player the median conversion rate and the 90% credible interval. There are some surprising names on that list, but things start to become more “normal” as we move away from the goal.
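
A credible interval can be approximated by sampling from a player’s posterior and reading off the relevant percentiles. This is a minimal sketch with hypothetical posterior parameters and a function name of my own; an exact quantile function from a statistics library would do the same job.

```python
import random

def credible_interval(alpha, beta, level=0.90, n_draws=50_000, seed=0):
    """Approximate (lower, median, upper) of a Beta(alpha, beta) posterior."""
    rng = random.Random(seed)
    draws = sorted(rng.betavariate(alpha, beta) for _ in range(n_draws))
    lo_idx = int((1 - level) / 2 * n_draws)
    hi_idx = int((1 + level) / 2 * n_draws) - 1
    return draws[lo_idx], draws[n_draws // 2], draws[hi_idx]

# Hypothetical posterior for one player: Beta(60, 240).
low, median_rate, high = credible_interval(60, 240)
print(f"median {median_rate:.3f}, 90% interval [{low:.3f}, {high:.3f}]")
```

Players with fewer shots get wider intervals, which is exactly the uncertainty a single conversion rate hides.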

Fig-11

Full application and results

The previous section presented the application of our methodology on the 1st out of the 25 categories in which we divided shots. Now we will repeat the same procedure on all shots and, to avoid repetition, we won’t go into details again.

Let’s see how the number of shots and goals changes from one shot range to another. If we plot shots and goals for all players, separately for each of the categories, we get Fig-12. Looking at the graph, it’s quite nice how the trend line slope gradually decreases as we move away from the goal, representing the decrease in the conversion rate. We can also notice Messi and Cristiano Ronaldo fairly close to each other and separated from the rest of the players (those two dots in the upper part of each scatter), especially for categories that show shots close to the goal. As we move away from the goal, though, we notice that Messi’s shots are reduced until he eventually joins the rest of the players. Cristiano’s shots, on the other hand, remain almost constant as the shooting distance increases. It’s just incredible how many long-range shots he has taken…more than 1,300 shots at a distance of 24m or greater. Let’s see how this influences the estimate of his finishing ability.

Fig-12

So, for all the above shot ranges we applied the same procedure as in the detailed example we previously showed. As a result, for each player and for each shot range we get a distribution that characterizes their conversion rate (i.e. their finishing ability). Now let’s sum up the results!

Fig-13

Fig-13 shows the median conversion rates for all players and for all shot ranges. I have highlighted Messi, Higuaín and Ribéry, as the only players who hold 1st place in at least one of the categories. With the exception of a few categories, Messi’s median conversion rate line is completely isolated from all other players. To illustrate it better, let’s look at Fig-14 (which basically shows the same data as Fig-13 but visualized differently). We have separated the shot ranges, plotted the median conversion rates of all players as a histogram and then added the overall median and Messi’s median.

Fig-14

In order to see how some of the best finishers and some of the main attackers of the last 12 years rank according to their finishing ability, I built 3 graphs in which, for each of the shot ranges, players are ranked according to their median conversion rate (Fig-15, Fig-16 and Fig-17). Please note that, in these graphs, the vertical axis is in logarithmic scale, since even a single player’s rank can change by a few orders of magnitude (the number of players for each range is up to almost 5,000).

Fig-15

Fig-16

Fig-17

It’s just incredible how much Messi dominates this stat. A few other players manage to be good at some of the shot ranges but their rank inevitably and dramatically drops in other ranges. Higuaín and Lacazette are probably the players with the closest finishing ability to Messi, although their rank drops a lot in long-range shots (shots longer than 20m). Salah and Griezmann show a more or less similar pattern but with a worse ranking.

Other players show different trends. Del Piero and Dybala, for example, have a remarkably similar pattern, ranking very low in short-range shots and then gradually improving as we move away from the goal. Cristiano Ronaldo’s ranking looks somewhat similar, although his rank in short and mid-range shots isn’t as bad as Del Piero’s or Dybala’s, as he ranks between the 40th and 190th place (Del Piero and Dybala can rank as low as the 400th and 500th place). The three of them eventually join the top 10 finishers in long-range shots. Also, Ibrahimović’s rankings are not so good for short-range shots but for mid/long-range shots he improves spectacularly, becoming the 2nd best finisher after Messi.

Just for illustration, in Fig-18 I built a Messi vs Cristiano comparison of the posterior distributions of the conversion rate, for all shot ranges. The difference is so huge that their curves barely intersect. They become comparable in the long-range shots, where Messi’s curve is still notably to the right.

Fig-18

As we previously explained, we can actually calculate the probability that Messi is a better finisher than Cristiano Ronaldo (and vice versa) for each shot range. These values are shown in Fig-19. This may look harsh on Cristiano but that’s how the numbers add up. Both players have very narrow curves, due to the large number of shots they have taken, and that limits Cristiano’s chances.

Fig-19

Final ranking

So, since all the above comparisons are made for each shot range, it would be nice to have a single ranking for all shots. For this I plotted (Fig-20) each player’s rank for each shot range (grey dots) and then ranked them according to the median rank (red dots). It’s not such a rigorous method but it will do…we are doing this just for fun.
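
The median-rank idea is simple enough to sketch directly. The player names and ranks below are made up for illustration and the function name is mine.

```python
from statistics import median

def final_ranking(ranks_by_player):
    """Order players by the median of their per-range ranks (lower is better)."""
    return sorted(ranks_by_player, key=lambda p: median(ranks_by_player[p]))

# Hypothetical ranks across three shot ranges.
ranks = {
    "Player A": [1, 2, 1],
    "Player B": [5, 3, 40],
    "Player C": [2, 150, 3],
}
print(final_ranking(ranks))
```

Using the median rather than the mean keeps one terrible shot range (like Player C’s 150th place) from wrecking an otherwise strong profile.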

Fig-20

Fig-20 shows the top 20 finishers since 2006. Actually, before posting the article I asked my followers on Twitter who they thought was the best finisher and I got very good suggestions (if I compare them to our final list). Apparently finishing skill is perceivable and many people have a very good intuition in this regard.

Many people on Twitter suggested Messi as the best finisher, but a few of them noted that “Messi seems too obvious”. When I started this analysis my idea was that he would end up as a very good finisher but I was almost sure that 5-10 players would perform better than him. According to the analysis we conducted, he is the best finisher, by far. Now that I think about it, what kind of finisher scores 91 goals in one year?!

Some other frequent suggestions included Icardi, Agüero, Higuaín, Cavani. These are what we might call “the obvious good finishers”. Higuaín represents a very unfortunate case. One of the most clinical finishers of the 21st century and failing to deliver in those finals with Argentina…

P.S.

If a Top 20 list doesn’t make you happy, I have a Top 200 finishers list (Fig-21). It doesn’t really make sense to rank so many players but anyway, here it is. You should not read this list literally…it’s just useful to get a general idea.

Fig-21

It’s not a secret that, Messi aside, the current Argentina national team is not exceptional. There has always been some debate on this and the general consensus is that they have a lot of talent in attack while the rest of the team is “normal”. Since “normal” is not always easy to define, I tried to understand how strong this squad is compared to European football clubs. This might be helpful, considering that we watch club football much more often than international football and it’s easier to form a general idea of how good each club is.

I have used a methodology very similar to the one used in this article in the Financial Times, written by Murad Ahmed and John Burn-Murdoch. The idea is very simple and can be explained as follows:

  • Argentina’s squad strength is evaluated according to the quality of their players: the better the players, the better the squad (and vice versa).
  • Each player’s quality is estimated by two parameters:
    • quality of the club they play for: good players play for better clubs,
    • level of participation: good players play more minutes.

I have used the data for the 2017/18 season. The quality of the club is represented by the Soccer Power Index (SPI), which is a metric to rank football clubs developed by FiveThirtyEight, while minutes played by each player are from Transfermarkt.

The graph in Fig 1 shows minutes played in 2017/18 and the respective club’s SPI for all of Argentina’s national team players. In simple words, if a player plays for a club with a high SPI and has played a lot of minutes in 2017/18 (top-right players), then he increases the quality of the squad. As is quite obvious, most of this squad’s strength comes from the 4 forwards, who are all key players at their respective clubs.

To get an overall estimation for Argentina, I calculated a weighted average of their squad SPI, excluding Messi, and another average SPI excluding the 4 forwards (just to get an idea of how good the rest of the team is).
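
As an illustration of the calculation, here is a minimal sketch of a squad average in which each player’s club SPI is weighted by his minutes played. Weighting by minutes is my reading of the two parameters listed above, and the players, SPI values and function name are all hypothetical.

```python
def weighted_squad_spi(players, exclude=()):
    """Minutes-weighted average of club SPI, optionally excluding some players."""
    kept = [p for p in players if p["name"] not in exclude]
    total_minutes = sum(p["minutes"] for p in kept)
    return sum(p["club_spi"] * p["minutes"] for p in kept) / total_minutes

# Hypothetical squad: names, club SPI values and minutes are made up.
squad = [
    {"name": "Forward 1", "club_spi": 90.0, "minutes": 3000},
    {"name": "Midfielder 1", "club_spi": 70.0, "minutes": 2000},
    {"name": "Defender 1", "club_spi": 60.0, "minutes": 1000},
]
print(weighted_squad_spi(squad))
print(weighted_squad_spi(squad, exclude={"Forward 1"}))
```

The `exclude` argument mirrors the two variants described above: the average without Messi and the average without the 4 forwards.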

Fig 1

The bottom graph in Fig 1 compares Argentina’s squad average SPI (74.1) to European football clubs that lie in the 70-75 range. This gives an idea of how good this squad is, i.e. something in between Eibar and Marseille.

Note: FiveThirtyEight’s SPI ranking doesn’t include the Chinese Super League so, to evaluate the strength of Mascherano’s new team (Hebei China Fortune), I used another team rating metric and converted it to FiveThirtyEight’s SPI system.

Introduction

This article is an attempt to give quantitative and visual insights on the best playmakers in European club football, with focus on their vision, i.e. their ability to find/create passing lines through and behind the opposition defense and create goal scoring chances. The analysis is focused on the 2017/18 season and takes into account domestic games in European top 5 leagues as well as UEFA Champions League and UEFA Europa League games. The data used here are provided by Stratabet and since it’s the end of the season, I’d like to thank them for providing free access to detailed football data, which is something uncommon in the football world.

Forward keypasses

Obviously, not all goal-scoring chances are the same and have the same difficulty in being created. On the contrary, the observed difficulty range is particularly wide, and it includes chances created by a simple lateral pass in front of the goalkeeper and chances created by a through ball that penetrates the opposition defensive line. Of course, it’s very difficult to take into account all the parameters and variables that influence the degree of difficulty in creating a chance. In any case, insight about a player’s vision can be drawn even from simple data. Here we’ll focus on those keypasses (keypass is the pass that precedes the shot) that have a verticality of 10m or greater, i.e. those keypasses where the ball has advanced for 10m or more between the position from where the pass is made to the position from where the shot is taken. Let’s call these “forward keypasses”.
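
The definition can be written down in a couple of lines. In this sketch I assume that positions are given in metres along the length of the pitch, increasing towards the opposition goal; the coordinate convention and the function names are my assumptions.

```python
def verticality(pass_x_m: float, shot_x_m: float) -> float:
    """Ball advancement (in metres) towards the opposition goal."""
    return shot_x_m - pass_x_m

def is_forward_keypass(pass_x_m: float, shot_x_m: float,
                       threshold_m: float = 10.0) -> bool:
    """True if the keypass advanced the ball by at least threshold_m."""
    return verticality(pass_x_m, shot_x_m) >= threshold_m

# A pass played from 60 m that leads to a shot taken from 72 m up the pitch.
print(is_forward_keypass(60.0, 72.0))
```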

Fig 1

This is far from a perfect metric but I believe it gives useful information on a player’s vision and playmaking skills. Fig 1 shows the separation of forward keypasses from other keypasses, taking Manchester City’s midfielder David Silva as an example.

The above definition doesn’t necessarily guarantee that the keypass has penetrated a defensive line or that creating such a chance has a certain degree of difficulty, but that’s the best I can do for now. Data about the position of the opposition players would considerably increase the quality of this metric. Also, on many occasions, third passes (the pass previous to the keypass, also called “penultimate passes”) are much more important than keypasses but I don’t have location data for those, hence they are not included here.

Overall, we have 32,449 open play chances in the 2017/18 season, which, according to their verticality, are distributed as shown in Fig 2. As previously mentioned, our focus is on those chances/keypasses to the right of the 10m verticality line (12,343 forward keypasses).

Fig 2

These forward keypasses are distributed among 2,342 players, which means that basically every player sooner or later managed to create one. The average number of forward keypasses created by a player is just above 5 and the number of players who have repeatedly created such chances is very limited (Fig 2). Clearly, there aren’t hundreds of playmakers with unlimited vision out there. The list of players with most forward chances (see also the chart in Fig 3) is topped by Lionel Messi with 55, Christian Eriksen is 2nd with 49, and the top 5 is completed by De Bruyne (48), Cesc Fàbregas (47) and Dimitri Payet (45).

Fig 3

Verticality

The chart in Fig 4 shows a list of the top 20 players with most forward keypasses, ranked by the average verticality (vertical ball advancement). The list shows also the verticality of each of the forward keypasses these players have made and, as we can observe, there is a lot of variety among these players.

Fig 4

Besides reflecting a player’s abilities/limitations, this variety is also a consequence of a player’s position on the pitch and of team tactics. To illustrate, let’s have a look at forward chances created by Lionel Messi and the young French midfielder Tanguy Ndombele (Fig 5). Playing deeper and with fewer goal-scoring duties allows Ndombele to create deeper and more vertical chances. This difference can also be observed in the position of the rug lines on the left of each football pitch.

Fig 5

Horizontality

Besides the variation in verticality, a considerable variation in lateral ball movement (horizontality) is also observed among the players in our data set. The chart in Fig 6 shows the average vertical and horizontal ball movement for all the players, considering their forward keypasses only. The variation in both parameters is more visible here. Generally speaking, considerable horizontal ball movement is not a good indicator of a player’s vision, since some of the passes with high horizontality are basically crosses.

Fig 6

Fig 7 shows the forward keypasses made by two very distinct players: Luis Alberto, whose keypasses are dominated by verticality, and Douglas Costa, whose keypasses are dominated by horizontality. I suppose some of the chances created by Douglas Costa are classified as crosses and, without trying to downplay their importance, crosses are not what we are looking for here.

Fig 7

Summary chart

The chart in Fig 8 is a visual interpretation of the forward keypasses for each player (sequentially added), expressed in terms of verticality and horizontality. Each line here represents one player and it’s composed by joining his forward keypasses one after the other. Players with most forward keypasses appear on top and players with relatively high verticality appear on the left.

Fig 8

Present here are the usual suspects: top midfielders from top European clubs (Cesc, De Bruyne, Eriksen), some new entries (Ndombele, Milinkovic-Savic) and Messi. Of course, after scoring 600+ career goals and winning his 5th European Golden Shoe, it’s completely normal to also be the best playmaker in the world.

This article was written with the aid of StrataData, which is property of Stratagem Technologies. StrataData powers the StrataBet Sports Trading Platform, in addition to StrataBet Premium Recommendations.

How many times have you searched for ‘the best penalty takers of all time’ on Google and then clicked on one disappointing article after another? Yeah, thought so. Well, this is an article about the best penalty takers of all time, so let’s hope it is less disappointing.

The following analysis is based on the methodologies used in this series by David Robinson (@drob on Twitter) and in this article by @OMalytics.

Data

Finding reliable data about penalties is very difficult. The best you can get are very limited lists (usually top 10 lists) of top penalty scorers for national leagues or important competitions like the World Cup. What makes penalty data so hard to obtain is that, even though penalty goals are well documented and somewhat easier to track, data about missed penalties are very scarce. I initially thought of using data from Transfermarkt but it was pointed out to me on Twitter (and I then verified myself) that their data are wrong.

In any case, one day I abandoned English and decided to google ‘i migliori rigoristi di tutti i tempi’, which is ‘the best penalty takers of all time’ in Italian (it is nice that Italians have a specific word for the player who takes penalties: rigorista). The search was successful, as I found an impressive penalty database on probably the least expected website. There is this general blog Sdoppiamo Cupido (!!!), where two people, Angelo Vigorita and Federico Morano, worked for years (and are still working) to compile a list of the players with the most penalties taken in their career, specifying both scored and missed penalties. Impressive, really! The list can be found here. Other topics that this blog covers are: cold fusion, modeling, music and song lyrics (!!!).

If you navigate through this list and through the comments sections of related articles, you get an idea of the work invested in it. I took this list, completed it a bit with data about recent players (data about them are much easier to find) and that is the dataset I have used in the following. It includes the vast majority of the most frequent penalty takers in the world but it is most likely not exhaustive, since it is concentrated on relatively well-known players playing in Europe and South America. Maybe some excellent penalty taker played in some lesser-known league and we just do not know. So, consider this an article on the best penalty takers in the world among reasonably well-known players.

The authors of the database also emphasize that for some players it was impossible to find data about missed penalties. This includes players like Romario, Zico, Enzo Francescoli, Socrates, Puskas, etc. These players are excluded from the following analysis, unfortunately.

Data exploration

Let us initially have a look at the data we have. In total we have 12,649 penalties, taken by 484 players. Total and scored penalties for each player are shown in Fig-1.

Fig-1

The basis of the analysis is the conversion rate, which is a very basic parameter that shows the ratio between scored and total penalties for each player. It obviously takes values in the 0.0 – 1.0 range and normally, the higher the conversion rate of a player, the better a penalty taker he is. Out of 12,649 penalties in our database, 10,402 were scored, giving a conversion rate of 82.2%. The number of total penalties and the respective conversion rate for each player are shown in Fig-2.

Fig-2

Although the conversion rate is a valid indicator, we cannot use it directly to compare and evaluate penalty takers. This is because for the majority of the players we have scarce data, since they took a relatively small number of penalties (Fig-3), and their average conversion rate is not representative.

Fig-3

The obvious problem here is that it is very tricky to compare different proportions. E.g. who is better, a player who scores 9 penalties out of 10 or one who scores 36 out of 40? Also, how do we compare a player who has scored 10 out of 10 total penalties to one that has scored 98 out of 100? We can use the conversion rate for each player as a comparison criterion but we know that something is not right. We need to transform it a bit.
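To make the problem concrete, here is a minimal sketch that computes the raw conversion rate together with a simple binomial standard error for the records mentioned above. The standard error is only an illustrative measure of uncertainty chosen for this sketch; it is not part of the method used in this article.

```python
import math

def rate_and_standard_error(scored, taken):
    """Raw conversion rate and its simple binomial standard error."""
    rate = scored / taken
    standard_error = math.sqrt(rate * (1 - rate) / taken)
    return rate, standard_error

# The example records from the text: (scored, taken)
for scored, taken in [(9, 10), (36, 40), (10, 10), (98, 100)]:
    rate, se = rate_and_standard_error(scored, taken)
    print(f"{scored}/{taken}: rate={rate:.3f}, standard error={se:.3f}")
```

Two things stand out. 9/10 and 36/40 give the same rate but with different uncertainty, and 10/10 gives a standard error of exactly zero, which wrongly suggests we are certain that the player never misses.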

Transformations

In order to have a more representative metric for a player’s ability to score penalties, we need to transform (improve) the penalty conversion rate. This transformation includes two aspects:

  1. Firstly, we take into account that some players have taken considerably fewer penalties than others (Fig-3). We do this by using empirical Bayes estimation as a method to improve the average penalty conversion rate for each player. Initially we model the conversion rates of our dataset as a beta distribution, which we consider as a prior distribution (Fig-4), and then we combine this prior distribution with the individual data of each player (number of total and scored penalties) to get an updated estimate of the conversion rate.
  2. Secondly, we take into account the fact that better penalty takers take more penalties. This can also be observed in Fig-2, where the conversion rate tends to be higher for players with more total penalties. This is a problem because it makes us overestimate players with few penalties and underestimate players with a lot of total penalties. To address this issue, we will use beta-binomial regression, a technique that basically incorporates the number of total penalties in building the prior distribution.

Fig-4

Fig-5

The two above-described transformations are illustrated in Fig-5. From left to right we have:

  • 1st graph: initial estimation of conversion rates,
  • 2nd graph: conversion rates after combining prior distribution to each player’s data,
  • 3rd graph: conversion rates after taking into account the number of total penalties by each player.

From Fig-5 we can see that what we basically did was move all conversion rate estimates towards the average trend line, narrowing the initial range of conversion rates into a new one. As you may notice, not all players are influenced equally by this procedure. Players with a relatively large number of total penalties tend to be less affected, following the logic that for them the initial conversion rate is much more representative.
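The first transformation can be sketched in a few lines. This is a minimal illustration, not the exact procedure behind the figures: the player records are hypothetical, and the prior is fitted with a simple method-of-moments estimate (an assumption; a maximum-likelihood fit would be another option). The second transformation (beta-binomial regression) is not covered here.

```python
from statistics import mean, pvariance

# Hypothetical (scored, taken) records, NOT taken from the article's dataset.
players = {
    "Player A": (10, 10),
    "Player B": (98, 100),
    "Player C": (7, 15),
    "Player D": (36, 40),
    "Player E": (16, 20),
    "Player F": (25, 30),
}

def fit_beta_prior(records):
    """Method-of-moments fit of a Beta(alpha0, beta0) prior to raw rates."""
    rates = [scored / taken for scored, taken in records]
    m, v = mean(rates), pvariance(rates)
    # For a beta distribution: mean = a / (a + b), var = m (1 - m) / (a + b + 1)
    total = m * (1 - m) / v - 1
    return m * total, (1 - m) * total

def shrunk_rate(scored, taken, alpha0, beta0):
    """Posterior mean of the conversion rate after combining prior and data."""
    return (scored + alpha0) / (taken + alpha0 + beta0)

alpha0, beta0 = fit_beta_prior(list(players.values()))
prior_mean = alpha0 / (alpha0 + beta0)

for name, (scored, taken) in players.items():
    raw = scored / taken
    est = shrunk_rate(scored, taken, alpha0, beta0)
    print(f"{name}: raw={raw:.3f} -> shrunk={est:.3f} (prior mean {prior_mean:.3f})")
```

Every shrunk estimate lands between the player's raw rate and the prior mean, and the pull towards the prior mean is weaker for players with many penalties, which is the behaviour described above.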

We need to emphasize that this whole analysis is based on the assumption that all penalties are the same, i.e. they are taken under the same conditions and the probability of converting them into goals is the same. This is not true, of course, but it is an acceptable assumption to make. Some of the factors that influence the difficulty of a penalty are: the quality of the goalkeeper; game state and situation; psychological factors; weather conditions; etc. We are neglecting all these factors.

Results

The approach that we followed in the previous section allows us to build a probability distribution of the conversion rate for each player. These are called posterior distributions and are created by combining the prior distribution with the individual data (what we previously discussed). Let us take a few examples.

Fig-6 shows the conversion rate distribution for Roberto Baggio and Lionel Messi compared to the prior distribution of all players. What Fig-6 tells us is that Baggio is probably a better penalty taker than Messi (Baggio’s curve is on the right). It also tells us that Messi is a worse penalty taker than the average of the players included in the dataset (his curve is on the left of the dashed curve). Baggio’s curve being higher and narrower simply shows that he took more penalties (133) than Messi (107). Actually, Baggio is the player with the most total penalties in the dataset, followed by Cristiano Ronaldo (128) and Totti (113).

Fig-6

Fig-7 shows another example of comparing conversion rate distribution curves, featuring Matt Le Tissier, Diego Armando Maradona and Marek Hamšík.

Fig-7

These distribution curves enable us to compare penalty takers to each other but, sometimes, such comparisons are difficult to make visually, especially if we have more than 3 players (curves). One way to avoid this is by building credible intervals for each player. They show the range of values within which a player’s conversion rate lies with a certain predefined probability (this predefined probability can be set to 90%, 95%, 99%, etc). Fig-8 shows the median conversion rate and the 95% credible intervals for the top ten and bottom ten penalty takers (out of 484 players in our database).

Fig-8
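A credible interval of this kind can be approximated by sampling from a player's posterior beta distribution. The sketch below is only illustrative: the prior parameters ALPHA0 and BETA0 are placeholders invented for the example (the fitted prior is not reported in this article), so the printed numbers will not match Fig-8. The two records used are the ones quoted in the results.

```python
import random

# Placeholder prior parameters (assumption, NOT the article's fitted prior).
ALPHA0, BETA0 = 20.0, 5.0

def credible_interval(scored, taken, alpha0=ALPHA0, beta0=BETA0,
                      level=0.95, draws=20000, seed=0):
    """Monte Carlo estimate of (lower, median, upper) of the posterior."""
    rng = random.Random(seed)
    a = alpha0 + scored            # prior "successes" plus scored penalties
    b = beta0 + (taken - scored)   # prior "failures" plus missed penalties
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    lower = samples[int((1 - level) / 2 * draws)]
    upper = samples[int((1 + level) / 2 * draws) - 1]
    median = samples[draws // 2]
    return lower, median, upper

for name, scored, taken in [("Blanco", 71, 73), ("Hamšík", 7, 15)]:
    lower, median, upper = credible_interval(scored, taken)
    print(f"{name}: median={median:.3f}, 95% interval=({lower:.3f}, {upper:.3f})")
```

The interval is narrower for the player with more penalties, because his posterior distribution is more concentrated.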

According to this analysis and to the dataset we have used, Cuauhtémoc Blanco (71 scored out of 73 total penalties) is our best penalty taker. If you don’t know who he is, click here for some magic. Blanco is followed by Graham Alexander (77/83) and Matt Le Tissier (49/50). The three worst penalty takers are Marek Hamšík (7/15), Marino Perani (10/19) and Edin Džeko (7/14).

A more complete list of the top 100 penalty takers is shown in Fig-9. Maybe you can find your favorite player there.

Fig-9

Another way of using each player’s conversion rate probability curves is by calculating the probability that one player is a better penalty taker than another. For example, if we refer to Fig-6, we can calculate that there is an 87.1% probability that Baggio is a better penalty taker than Messi. Also, according to our results, we can say that Blanco is probably the best penalty taker in the world, but we cannot say that with absolute certainty. What we can say is that, from all the players we have considered and according to our methodology, Blanco has the highest probability of being better than the rest (around a 66% probability that he is a better penalty taker than Alexander, than Le Tissier, and so on).

As a conclusion, if my World Cup depended on one final penalty, I would let Cuauhtémoc Blanco take it.
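One simple way to estimate such a probability is to draw many samples from both players' posterior distributions and count how often one beats the other. This is a rough sketch under stated assumptions: the prior parameters are placeholders invented for the example, and the beta-binomial regression step is ignored, so the result is not expected to reproduce the percentages quoted above.

```python
import random

# Placeholder prior parameters (assumption, NOT the article's fitted prior).
ALPHA0, BETA0 = 20.0, 5.0

def posterior(scored, taken, alpha0=ALPHA0, beta0=BETA0):
    """Parameters (a, b) of the posterior beta distribution for one player."""
    return alpha0 + scored, beta0 + (taken - scored)

def prob_better(post_a, post_b, draws=50000, seed=1):
    """Monte Carlo estimate of P(rate of player A > rate of player B)."""
    rng = random.Random(seed)
    wins = sum(
        rng.betavariate(*post_a) > rng.betavariate(*post_b)
        for _ in range(draws)
    )
    return wins / draws

blanco = posterior(71, 73)      # record quoted in the text
alexander = posterior(77, 83)   # record quoted in the text
print(f"P(Blanco > Alexander) ~ {prob_better(blanco, alexander):.3f}")
```

A value close to 0.5 would mean that the data cannot really separate the two players; the further the value is from 0.5, the more confident the comparison.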

Over the past three seasons, during Lucho’s era, it was no surprise that one of the main characteristics of the Barça team was the reliance on the immense attacking power of the trio Messi, Suárez and Neymar. This had its advantages and disadvantages (of course). It led to seven trophies in the first two seasons and it probably was one of the main reasons why the team failed to deliver in the third season.

With the departure of Neymar, things were about to change. Valverde, Dembélé, Paulinho, Semedo, Deulofeu (and Coutinho in January) arrived and there was curiosity about how Barça’s attack would change: how the void created by Neymar’s departure would be filled and whether the “responsibilities” in attack could be distributed more evenly. Unfortunately, for various reasons, the team’s attacking options this season have been very limited. The main reasons are probably the two consecutive injuries to Dembélé, the injury to Alcácer (in one of his best moments), the bad form Suárez went through in the first months of the season and the inability of a few other players (Deulofeu, André Gomes, Denis Suárez) to provide an important and consistent contribution in attack. The consequence is that this season Barça’s attack is incredibly dependent on Messi, both in terms of scoring goals and creating goal-scoring opportunities. I built a few graphs to put this into perspective.

Fig. 1 shows a chance creation matrix for Barcelona for this season, with the aim of illustrating the combinations of players leading to a shot: the x axis shows the player who takes the shot; the y axis shows the player who provides the pass before the shot (i.e. the player who creates the chance). We move along these axes in order to see how often various players combine with each other. The number of combinations is shown by the size (and color) of the squares.

matrix

Fig. 1

Messi is the only Barça player who consistently creates chances for almost all of his team mates. You can see that by looking at the horizontal line along Messi’s name on the y axis. Most of the chances Messi creates obviously go to Suárez (red square) and Paulinho. No other Barça player has a similar distribution. Also, a few interesting patterns can be observed in Fig. 1:

  • there is a strong connection between Sergi Roberto and Suárez (the same for Dembélé and Suárez);
  • Busquets likes to create chances mostly for Messi (his signature breaking-the-lines passes);
  • in contrast to Alba, Digne has never created a chance for Messi (or Suárez);
  • the vast majority of shots (vertical lines) are concentrated of course on Messi, Suárez and Paulinho. The contribution of other players is very small.
  • the last horizontal line shows chances not directly related to a teammate pass (after rebounds, interceptions, etc).
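A matrix like this boils down to counting (passer, shooter) pairs. Here is a minimal sketch with a handful of made-up shot events; the event format and the "No pass" label are assumptions for illustration and do not reflect the actual structure of the data behind Fig. 1.

```python
from collections import Counter

# Hypothetical shot events as (passer, shooter). The passer is None when the
# shot did not follow a teammate's pass (rebound, interception, etc).
shots = [
    ("Messi", "Suárez"), ("Messi", "Suárez"), ("Messi", "Paulinho"),
    ("Busquets", "Messi"), ("Sergi Roberto", "Suárez"),
    ("Alba", "Messi"), (None, "Messi"),
]

# Each cell of the matrix: how many times a passer set up a given shooter.
matrix = Counter((passer or "No pass", shooter) for passer, shooter in shots)

def chances_created_by(passer):
    """One horizontal line of the matrix: shooters served by this passer."""
    return {s: n for (p, s), n in matrix.items() if p == passer}

def shots_taken_by(shooter):
    """One vertical line of the matrix: who supplied this shooter."""
    return {p: n for (p, s), n in matrix.items() if s == shooter}

print(chances_created_by("Messi"))
print(shots_taken_by("Messi"))
```

Reading a row gives the distribution of chances a player creates, while reading a column gives the distribution of passes that lead to a player's shots.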

The graph in Fig.2 shows how shots are distributed within each La Liga team. Here the x axis indicates the portion of a team’s shots taken by each player. There are various situations here, with teams where shots are relatively uniformly distributed (e.g. Malaga) and other teams where this is not the case (e.g. Barcelona, Espanyol, etc). At first glance, it may look like Real Madrid are in a similar situation to Barcelona, with Cristiano Ronaldo taking many more shots than his teammates, but unlike Barcelona (where, apart from Messi and Suárez, Paulinho is the only player with considerable input) there are many players at Real Madrid who each take between 5% and 10% of the team’s shots.

shots

Fig. 2

The graph in Fig.3 is similar to the one in Fig.2 but instead of shots it shows the distribution of chances created. Here the situation gets more dramatic for Barça, with Messi creating almost 25% of the team’s chances, at least double the share of any other Barça player. With the exception of Las Palmas’ Jonathan Viera (who is now in China), no other team relies on a single player, in terms of chances created, as much as Barça relies on Messi. Real Madrid and Atlético Madrid have a considerably smoother distribution.

chances

Fig. 3
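The shares plotted in Fig. 2 and Fig. 3 are simple per-team proportions. A minimal sketch, using made-up events and hypothetical player names, of how each player's share of a team total could be computed (the same function works for shots taken or for chances created):

```python
from collections import Counter, defaultdict

# Hypothetical events as (team, player); one entry per shot (or per chance).
events = [
    ("Team X", "Player 1"), ("Team X", "Player 1"), ("Team X", "Player 1"),
    ("Team X", "Player 2"),
    ("Team Y", "Player 3"), ("Team Y", "Player 4"),
    ("Team Y", "Player 5"), ("Team Y", "Player 6"),
]

def player_shares(events):
    """Share of each team's total attributed to each of its players."""
    per_team = defaultdict(Counter)
    for team, player in events:
        per_team[team][player] += 1
    return {
        team: {player: n / sum(counts.values()) for player, n in counts.items()}
        for team, counts in per_team.items()
    }

shares = player_shares(events)
for team, team_shares in shares.items():
    top_player, top_share = max(team_shares.items(), key=lambda kv: kv[1])
    print(f"{team}: top share {top_share:.0%} ({top_player})")
```

In this toy example Team X depends heavily on one player, while Team Y has a perfectly uniform distribution.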

It’s not typical for the same player to dominate a team in both shots taken and chances created, particularly by a considerable margin over his teammates, as in Barça’s case with Messi. The consequences (either positive or negative) of this over-reliance on Messi are difficult to foresee but most Barça fans are not very optimistic (Surprise!!). It will be interesting to see whether Valverde has enough time to address this “issue”, considering that we are entering the final phase of the season and, as he said, there is little room for experiment.

This article was written with the aid of StrataData, which is property of Stratagem Technologies. StrataData powers the StrataBet Sports Trading Platform, in addition to StrataBet Premium Recommendations.

I analyzed Messi’s shots with Barcelona, trying to see what (if anything) is different from one season to the other, mainly focusing on the last three seasons. A map of his shots (penalties excluded) for each of the 2014/15, 2015/16 and 2016/17 seasons is shown in the following image:

1

Figure 1 – Map of Messi’s shots with Barcelona (2014/15 to 2016/17)

At first glance, these maps look very similar to each other (as they probably should… we are talking about the same player) and the preferred shooting zones are basically the same. If we take a closer look at the distribution of shots within these zones, we start to notice differences between one season and another. Maybe the main difference is the density of frontal shots and shots made within the penalty area, which seems to be higher in the 2014/15 season compared to 2015/16 and 2016/17. To see what these differences are I built histograms of how Messi’s shots are distributed according to the distance from the goal (Figure 2).

2

Figure 2 – Distribution of Messi’s shots with Barcelona according to the distance from the goal (2014/15 to 2016/17).

These histograms may be helpful in comparing shots from one season to the other. We can notice, for example, that the percentage of shots taken within the distance of 10m (or 15m) is considerably higher in 2014/15 than in 2016/17. The reverse is true for long-range shots.

The data presented above can serve as an indicator of the quality of the shots taken and of how often Messi manages to penetrate defenses (by means of dribbling and combinations with team mates) and take short-range shots, which of course have a much higher probability of being converted into goals. For instance, Messi converts 42.5% of his shots taken within 10m of the goal, 29.2% of those taken within 15m and only 9.5% of those taken beyond 15m.
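Conversion rates by distance, like the ones quoted above, can be computed from a list of shots with a distance and an outcome. The shots in this sketch are invented for illustration only, so the resulting percentages have nothing to do with Messi's real numbers.

```python
# Hypothetical shots as (distance from goal in metres, goal scored?).
shots = [
    (5.0, True), (8.0, False), (9.5, True), (12.0, False), (14.0, True),
    (16.0, False), (18.0, False), (22.0, False), (25.0, True), (30.0, False),
]

def conversion_within(shots, max_distance):
    """Share of shots taken from max_distance or closer that became goals."""
    outcomes = [goal for distance, goal in shots if distance <= max_distance]
    return sum(outcomes) / len(outcomes)

def conversion_beyond(shots, min_distance):
    """Share of shots taken from farther than min_distance that became goals."""
    outcomes = [goal for distance, goal in shots if distance > min_distance]
    return sum(outcomes) / len(outcomes)

print(f"within 10m: {conversion_within(shots, 10):.1%}")
print(f"within 15m: {conversion_within(shots, 15):.1%}")
print(f"beyond 15m: {conversion_beyond(shots, 15):.1%}")
```

Note that "within 15m" includes the shots taken within 10m, which is how the figures in the text are read here.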

Now, talking about comparing shooting ranges, I built some shooting-range distribution curves (If you think of a better name…please suggest :D) which I think are clearer than histograms, especially when you want to compare three or more series of data (i.e. three or more players, various seasons of a certain player or other combinations).

3

Figure 3 – Shooting-range distribution curves for Messi, Cristiano, Suárez and Benzema.

These curves are a visual representation of how a player’s shots are distributed according to the distance from the goal. The horizontal axis shows the distance from the goal, while the vertical axis shows the cumulative percentage of shots taken within that distance. Curves in the upper-left part of the graph area (Figure 3) indicate players that take relatively more short-range shots (e.g. Benzema, Suárez) and curves in the lower-right part of the graph area indicate players that take relatively more long-range shots (e.g. Cristiano). Messi’s curve lies somewhere between Suárez and Cristiano, indicating that he takes fewer long-range shots than Cristiano but also fewer short-range shots than Suárez.
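In statistical terms, such a curve is an empirical cumulative distribution of shot distances. A minimal sketch, with two invented players and an arbitrary 5m grid (both are assumptions made only for illustration):

```python
from bisect import bisect_right

def shooting_range_curve(distances, grid):
    """Cumulative percentage of shots taken within each distance of the grid."""
    ordered = sorted(distances)
    return [100.0 * bisect_right(ordered, d) / len(ordered) for d in grid]

# Hypothetical shot distances in metres for two made-up players.
short_range_player = [4, 6, 7, 9, 11, 12, 15, 18]
long_range_player = [8, 12, 16, 20, 22, 25, 28, 32]

grid = [5, 10, 15, 20, 25, 30, 35]
short_curve = shooting_range_curve(short_range_player, grid)
long_curve = shooting_range_curve(long_range_player, grid)

for d, s, l in zip(grid, short_curve, long_curve):
    print(f"within {d:>2}m: short-range player {s:5.1f}%, long-range player {l:5.1f}%")
```

The curve of the short-range shooter sits above the other one at every distance, which corresponds to the upper-left part of the graph.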

Messi’s shooting-range distribution curves for the last three seasons and for his career are shown in the following image:

4

Figure 4 – Shooting-range distribution curves for Messi (2014/15 to 2016/17 and career).

As we can see, in the 2014/15 season Messi took more short-range shots, not only compared to 2015/16 and 2016/17 but also compared to his career average curve. The differences between seasons become easier to see if we look at part of the above graph separately. For example, Figures 5 and 6 show Messi’s shooting-range distribution curves for distances up to 10m and 15m, respectively.

5

Figure 5 – Shooting-range distribution curves for Messi (2014/15 to 2016/17 and career) up to 10m.

6

Figure 6 – Shooting-range distribution curves for Messi (2014/15 to 2016/17 and career) up to 15m.

The difference highlighted above between the seasons considered is probably one of the various factors that prevented Barcelona from repeating the outcomes of 2014/15. It is clear that creating the possibility to take a lot of short-range shots doesn’t depend solely on the player who is shooting (his ability to dribble, his positioning) but also on the help he gets from his team mates, the opponent’s approach, etc. Hence, although they provide some good information, shooting-range distribution curves should be interpreted cautiously in order to avoid reaching premature conclusions.

I prepared an evaluation of the most well-rounded (i.e. complete) attacking players in the European top 5 leagues for the 2016/17 season. The approach is the same as the one I used in this piece about Messi. Please refer to that text, as I am going to explain just the main points here.

I have considered all those players with 10+ league goals in the 2016/17 season, 115 players in total. These players are compared to each other in terms of four aspects of attacking play: goals scored, successful dribbles, chances created, and chances created by through balls. The main idea behind this is that well-rounded attackers are those players who perform relatively well in all of the aspects considered.

The range of values that these 115 players have is as below:

range-of-values

Fig 1 – Range of values in goals, dribbles, chances created and chances created by through balls for the 115 players with 10+ league goals in 2016/17 (top 5 leagues).

As in the article cited above, I used the percentile rank in order to “bring these four metrics at the same scale”. The higher the percentile rank, the better (and vice-versa). Below I am posting graphs of various players, so we can get an idea of how they compare against each other and against all the 115 players considered.
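For readers who want to reproduce the idea, here is a minimal sketch of a percentile rank computed per metric. The exact definition used for the graphs is not stated here, so this sketch assumes a common one (percentage of players with a strictly lower value, plus half of the ties), and the player numbers are invented.

```python
def percentile_rank(values, x):
    """Percentage of values below x, counting ties as half (assumed definition)."""
    below = sum(v < x for v in values)
    ties = sum(v == x for v in values)
    return 100.0 * (below + 0.5 * ties) / len(values)

# Hypothetical season totals for four made-up players.
players = {
    "Player A": {"goals": 30, "dribbles": 120, "chances": 80, "through_balls": 20},
    "Player B": {"goals": 25, "dribbles": 40, "chances": 35, "through_balls": 2},
    "Player C": {"goals": 12, "dribbles": 90, "chances": 70, "through_balls": 9},
    "Player D": {"goals": 10, "dribbles": 15, "chances": 20, "through_balls": 1},
}

def profile(name):
    """Percentile rank of one player in each metric, on a common 0-100 scale."""
    return {
        metric: percentile_rank([p[metric] for p in players.values()], value)
        for metric, value in players[name].items()
    }

print(profile("Player A"))
print(profile("Player B"))
```

A well-rounded attacker, in the sense used here, is one whose profile is high in all four metrics rather than in just one or two.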

messi-vs-all

Fig 2 – Messi is (obviously) the most complete attacker for 2016/17 (look at his numbers!).

messi-and-other-top-4

Fig 3 – Alexis Sánchez, Dries Mertens and Keita Baldé are the closest players to Messi.

messi-neymar-cristiano

Fig 4 – Neymar and Cristiano Ronaldo compared to Messi. It looks like if you add Cristiano’s goal-scoring numbers to Neymar’s dribbling and play-making numbers, you get someone close to Messi.

msn

Fig 5 – Messi, Suárez and Neymar for 2016/17.

ney-dybala-coutinho

Fig 6 – Neymar, Dybala and Coutinho seem to have some very similar numbers.

morata-kane-lewa-werner

Fig 7 – Morata, Lewandowski, Kane and Werner.

aguero-lukaku-costa

Fig 8 – Agüero, Lukaku and Diego Costa.

mbappe-lacazette-boudebouz

Fig 9 – Lacazette, Mbappé and Boudebouz.