Should tennis players strive to serve fewer points?

Last month Craig O’Shannessy published an article about how the top players on the ATP Tour play more points served by their opponents than on their own serve. The article claims that the best and most experienced players are more efficient with their serve and developing players should strive to emulate that.

A conversation with Damien Saunder around how a coach should react to this article, and this sort of advice in general, got me thinking.

I have a lot of respect for O’Shannessy (no relation) in general, and he is regarded as the ATP’s stats guru with a terrific track record of popularising the available data. But this article displays the very common fallacy of mistaking effect for cause, something that leads many coaches to chase wild geese.

Let’s establish the facts first, from some basic mathematics. A top player generally wins tennis matches because he wins more points than his opponent (modulo the nesting of points within games and sets). As players alternate serve, this is equivalent* to saying that he wins a higher percentage of points on his own serve than his opponent does in the opponent’s service games.

The other thing that elite men’s tennis has is a serve advantage. Players tend to win about 64% of points on their own serve against opponents of similar strength, which leads to about 81% of service games won, broadly compatible with the assumption that points are independent and identically distributed (IID†). This varies by surface and individual style.
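
The step from 64% of points to roughly 81% of games can be checked with a short calculation. What follows is a minimal sketch, assuming every point on serve is won independently with the same probability; the function name `game_win_prob` is made up for illustration.

```python
def game_win_prob(p: float) -> float:
    """Probability that the server wins a standard (advantage) game,
    assuming each point is won independently with probability p."""
    q = 1.0 - p
    # Server wins before deuce: to love, to 15, or to 30.
    win_before_deuce = p**4 * (1 + 4 * q + 10 * q**2)
    # Game reaches deuce (three points each), then the server is the
    # first to get two points clear.
    reach_deuce = 20 * p**3 * q**3
    win_from_deuce = p**2 / (1 - 2 * p * q)
    return win_before_deuce + reach_deuce * win_from_deuce


print(round(game_win_prob(0.64), 3))  # 0.813
```

At exactly 50% of points the same function returns 50% of games, so the whole gap between 64% and 81% comes from the scoring system amplifying a per-point edge.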

This means that inferior players win a percentage of points on serve that is closer to 50% than their opponents do. This naturally leads to more close games. And closer games under the rules of tennis have more points, as they go to deuce, a situation where either player needs to win by two points. Thus, the inferior player has to serve more points. No magic mental games required.
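
The link between a serve percentage nearer 50% and longer service games can be made concrete. This is a rough sketch under the same IID assumption; `expected_points_per_game` is an illustrative name and the three serve percentages are hypothetical inputs, not figures from the article.

```python
def expected_points_per_game(p: float) -> float:
    """Expected number of points in an advantage game when the server
    wins each point independently with probability p."""
    q = 1.0 - p
    # Games that finish before deuce, weighted by their length.
    to_love = 4 * (p**4 + q**4)
    to_15 = 5 * 4 * (p**4 * q + q**4 * p)
    to_30 = 6 * 10 * (p**4 * q**2 + q**4 * p**2)
    # Games that reach deuce: six points played, then pairs of points
    # until one player wins both points of a pair.
    reach_deuce = 20 * p**3 * q**3
    length_via_deuce = 6 + 2 / (1 - 2 * p * q)
    return to_love + to_15 + to_30 + reach_deuce * length_via_deuce


for p in (0.68, 0.64, 0.60):  # hypothetical serve win percentages
    print(p, round(expected_points_per_game(p), 2))
```

Under these assumptions a 64% server averages about 6.25 points per service game, against 6.75 for a player who wins only half of his service points, and the expected length keeps rising as the percentage falls towards 50%.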

Have a look at the outliers in the article. Wawrinka is low because his ranking is inflated by the U.S. Open win and his overall ratio of points won is not as good as the others in the Top 10. The young players mentioned in the article, like Kyrgios and Pouille, are also low because the 20-month sample period includes a time when they were even younger and not of top-20 quality. Federer is high because his injury has prevented him from earning ranking points; when he was on the court he was elite. In other words, there is a tight correlation between the basic percentage of points won and this new statistic.

Imagine a different sport, like volleyball, where the team on serve has a distinct disadvantage at elite level. If the scoring system were like tennis, you would see the best teams play more points on their own serve, simply because those would be the more competitive games. It has nothing to do with trying to keep the pressure on their opponents; it’s just a result of the scoring system.

The lesson I would take from this case study is to consciously distance yourself from analysing minute variations in outcomes. It can get to be like reading tea leaves. You cannot coach an outcome, only adaptive processes that produce the outcomes you want more often than not. Don’t try to coach “make your opponent’s service games longer” as a KPI; glance at it as an imperfect indicator of a better player.


* It’s arguable that I’m defining the statistic out of existence here, so let’s look at an extreme example. Imagine a typical close match of 150 points, where Player A wins 63% of points on serve compared to Player B’s 60% on his serve. If they had served 75 points each (50%/50%), A would have won 47 points on serve and 30 on return. That’s 77 points to 73. Now suppose B had played longer service games, let’s say 84 points on B’s serve to 66 on A’s serve. That’s a massive 56%/44% split of service activity, well beyond the bounds of the data O’Shannessy showed. It’s like one player averaging 7 points per service game (plenty of deuces) compared to 5½ per service game (wins to 15 or 30) for the other. It’s almost physically impossible to get that discrepancy with this mix of service point win percentages. Yet Player A would still have won about 75 points (41.6 on serve plus 33.6 on return), a reduction of only two. The point is: a basic stochastic process with the pair of service win percentages as its only input explains all the variation in outcomes.

† While the IID assumption makes for an easy modelling process, with enough data we see that players don’t follow it exactly throughout matches and there is more autocorrelation than in a truly random process. That effect, a combination of mental and physical performance, is a topic for another post.
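
The arithmetic in the first footnote (*) can be reproduced in a few lines. This sketch uses the footnote’s own 63% and 60% serve figures and its 150-point match; the function and variable names are illustrative only.

```python
def points_won_by_a(points_on_a_serve, points_on_b_serve,
                    a_serve_win=0.63, b_serve_win=0.60):
    """Expected points won by Player A, given how many points are
    played on each player's serve and each player's serve win rate."""
    won_on_serve = a_serve_win * points_on_a_serve
    won_on_return = (1 - b_serve_win) * points_on_b_serve
    return won_on_serve + won_on_return


even_split = points_won_by_a(75, 75)    # 75 points on each serve
skewed_split = points_won_by_a(66, 84)  # B serves 84 of the 150 points
print(round(even_split, 2), round(skewed_split, 2))  # 77.25 75.18
```

Even with the split of service points pushed to 56%/44%, A’s expected total moves by about two points, which is the footnote’s argument in numeric form.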

Statistician vs Analyst Conversation

A lot of people want to get into the sports analytics industry, but it’s a long row to hoe from a traditional training in statistics to being a productive member of a sporting club. Employment paths for statisticians and data scientists traditionally cover careers like finance, medical research, and marketing. Sports data is different: a lot of it comes from adversarial situations with continual adjustment of environment. A successful path involves a complex network of players, coaches, sports scientists, opponents, plans and counter-plans.

Here’s the type of conversation that I hear between beginner sports statisticians (S) and experienced analysts / coaches (C). We all have to learn about the importance of context.

S: We’ve had a pretty good season, but our pass completion rate is in the bottom 20% of teams. I’ve done a regression and if we just improved that stat by 2% we would be the best team in the league.
C: Let me have a look at that data. Being a good team means that we play less in our defensive half, where it’s easier to complete a pass. If you adjust for that, I bet we look better.
S next day: OK, that made some difference. But when I isolate just passes in our defensive zone, we’re still below average. In midfield we’re well below average for passes that find a target. We have to fix this!
C: But we encourage our players to take risks. As long as they are making good decisions about the type of pass that might lose possession, we come out ahead despite the raw success ratio being low. Have a look at whether our completed midfield passes lead to more attacks.
S next week: It took a while but I filtered down to just our successful midfield passes. We’re still only a touch above average using a metric of goals per chain from a completed midfield pass.
C: Did you correct for expected goals?
S: Huh?
C: We’re getting to a smaller sample if you’re looking at just goals. Get a more reliable measure of attacking quality by looking at the expected number of goals from those opportunities.
S next fortnight: YOU WERE WRONG OLD MAN! I adapted an Expected Goals formula for our data and we get about the number we expected. We MUST complete more passes coming through midfield to set up goals.
C: What did you do with the turnover data?
S: We already know we’re turning over too many passes, stop changing the subject.
C: I mean, what happens to the ball when we don’t complete the pass? It goes into dispute, or the opponent gets clean possession. Have a look at those chains of play.
S mutters under breath
S next month: Hey I’ve got something interesting. Did you know that when we lose the ball passing forward in midfield, our opponents hardly ever score on the counter-attack? Our equity* from those plays is the best in the league.
C: Yeah, makes sense. We’ve designed our offensive structure with men covering the most productive routes out of defence, and we train them to anticipate the turnover. We don’t over-commit to speculative attacks.
S: Why didn’t you just say that two months ago? Oh wait … how do I categorise defensive structures from our crappy tracking data?
C: Now you’re thinking like an analyst, not just a statistician.

* Equity = net expected score from the situation. Adopted from backgammon theory.
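
One way to read that definition in code is as expected goals for minus expected goals against, averaged over similar situations. This is only a sketch of the idea: the `equity` function and the chain values below are hypothetical, not the statistician’s real data or method.

```python
def equity(chains):
    """Mean net expected score over a set of similar situations.

    Each chain is a pair (xg_for, xg_against): the expected goals we
    create and the expected goals we concede in the play that follows."""
    if not chains:
        raise ValueError("need at least one chain of play")
    net = [xg_for - xg_against for xg_for, xg_against in chains]
    return sum(net) / len(net)


# Hypothetical chains of play following a lost forward pass in midfield.
after_turnover = [(0.00, 0.05), (0.12, 0.00), (0.00, 0.00), (0.03, 0.02)]
print(round(equity(after_turnover), 3))
```

A positive value means the situation is worth more to us than to the opponent on average, which is the sense in which a “failed” pass can still carry good equity.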

DFL-Δ3 JV is hiring

The Deutsche Fußball Liga (DFL / German Football League) has entered into a joint venture with sports data producer deltatre to service the German professional soccer industry. The JV, Sportec Solutions GmbH, headquartered in Köln (Cologne), is now hiring, and the job descriptions give some idea of how much of a landmark this enterprise could be. See the positions under the Sportec heading.

I’ve had the opportunity to speak with Dr Daniel Link at conferences over the past few years. Daniel is a serious researcher (in that very German way) who is also responsible for writing and maintaining the documents that describe how data should be recorded in soccer. These are official documents of the DFL that each data provider must follow in order to provide consistent definitions of Zweikämpfe (duels / one-on-one contests), Torversuche (goal attempts), and everything else you want to observe about performance in the sport.

It is no accident that Germany won its fourth World Cup in Brazil 2014. The culture of analysing football in Germany, based on evidence and theories of the game that are both well-tested and innovative, would be foreign to most other national sporting organisations. In some professional clubs here, I am seeing the same type of culture leading to success. It’s not so much the data methodology or analytical techniques that matter; it’s about being able to question practices and approach the answers with confidence that the majority of coaches are open to having a dialogue about evidence and its meanings.

If you’re interested in moving to Köln, I would highly recommend these opportunities. Although it would be a distinct advantage to speak or at least read German, it is not required.
Here’s a translation of some of the key points in each job:

Director Operations & IT
In your kit-bag, you have a degree in computer science or engineering and operational expert knowledge in sports media or a club environment
Head of IT Entwicklung (Development)
Manage the IT Development department using agile methods; be the “champion” of IT issues; experience in real-time databases
Manager Tracking
Responsible for the management of “tracking” for the DFL, i.e. collection, validation, and processing of position tracking data
(Senior) Produkt Manager
Both customer-facing, and responsible for design and specification of products and projects
IT Operations Manager
This one is less about football specifically: manage the entire IT infrastructure including live delivery

I’d also encourage other sporting leagues to take their data responsibilities in-house the way the DFL has, and resource it properly. The data analytics community is watching!