tournament design – A Simulation by Athletes

This post was sparked by a tweet from Omar Chaudhuri, regarding changes to Austrian soccer in 2018/19:

Another league going for the split season format, where it’s preferable to finish 7^th over 6^th. This has to stop. https://t.co/o8dmcPyp24

— Omar Chaudhuri (@OmarChaudhuri) December 2, 2016

Österreich Fußball Bundesliga 2018/19

I thought I should test this assertion, and found it is true for a range of common scenarios, although the trade-offs are considerable. The results of 40 million league simulations are below.

Background: What do professional leagues want?

We are seeing a growing trend of sporting bodies adjusting their schedules to maximise broadcasting revenue. From the NRL leaving its fixture floating so Channel 9 can put the best teams in the prime timeslots, to MLB introducing wildcard playoffs, to cricket ensuring India and Pakistan are always in the same group, there is a substantial buck to be made by tweaking fixtures. Marketing managers are paid extremely well to dream up ideas that spread games across more eyes and grow the sale price for rights.

In many cases, this aligns with the fan who sees more televised sport and the excitement of his/her team being in contention for longer. But deviating from the pure formats of round-robin* or straight knockout brings the risk of what economists call perverse incentives.

We have already had many situations where both teams would be happy to settle for a draw to qualify for the next stage of a tournament at the expense of a third. Even worse is when either team would achieve a better outcome by losing a particular match. This year, Graham Kendell and Liam Lenten published a paper with a collection of farcical cases called When Sports Rules Go Awry in the European Journal of Operational Research. As new formats are brought into national leagues, experts must test them as thoroughly as Durex for any holes that lead to unwanted outcomes. Stress-testing fixtures for potential unwanted outcomes

Let’s be clear: despite exhortations to play your best, if the tournament structure favours a loss it is rational to aim for it. It just feels and looks terrible, so people get disqualified for it after the fact. It behoves sporting bodies to carefully design their tournament formats with this in mind. How do you blame the competitors when it is the governance that failed?

The new Österreich Fußball-Bundesliga format

In 2018/19, the top tier of the ÖFBL (Austrian Football League) grows from ten to twelve teams, and in an effort to increase interest for mid-ranked teams the governing body has decided to split them after a double round-robin (Rounds 1-22). Points from those 22 matches will be halved. The top six will then play amongst each other twice, while the bottom six do the same (Rounds 23-32). The twist is that those bottom six remain in contention for a Europa League playoff berth, and all the money that could bring. Even better for the ÖFBL, this creates three end-of-season playoff matches that should attract eyeballs and euros.

The first problem is that Austria is a moderately weak league (ranked 16^th in Europe), recently dominated by Red Bull Salzburg (seven of the past ten titles) and the two large Vienna clubs (Austria Wien & Rapid Wien). On current allocations, the league receives just one Champions League ticket, and two† for the lesser Europa League (nominally one into the Second Qualifying Round, and one into the First Qualifying Round).

Under the new system, the team that finishes “seventh” — i.e. best of the lower half (Q1) — will host the team that finishes fourth (M4) in a single playoff. The winner of that will play home & away against the third-placed team (M3) for the last Euro spot.

The ÖFBL did receive outside help to construct and test its new system from Hypercube, based in the Netherlands, who have also consulted to other national leagues.

Benefits

Teams in the bottom half are playing for something until later in the season
More matches between two good teams
More projected revenue from attendance and rights (I assume)
As a whole, there is still a sensible payoff structure with respect to the strength of each team, compared to round-robin
Halving the points means that later games matter more, freshening the incentive
The single playoff hosted by the best of the bottom half will often be a 50/50 game, as the home ground advantage cancels out the skill advantage of the fourth-placed team

This is not repechage

Repechage is a very appealing concept from a tournament design point of view. If you have an initial short classification stage, sometimes good competitors will fail to win through just by luck. Giving them a (tough) route to remain in contention for the trophy can be an excellent way to maintain interest while keeping the payoffs fair and incentives clear. I’ll revisit my 2010 World Cup proposal some time soon, as an example.

I would not call the ÖFBL proposal repechage. The teams have a full double round-robin, which will usually provide enough resolution to the order. The number of cases where team 7 is actually better than team 5 (or 4) starts to diminish. Why give them a leg up?

On the flipside, team 7 is now excluded from winning the title or indeed finishing in the top three. A top team that somehow found itself 7^th after two round-robins would have made the top three 40% of the time after the addition of a third full round-robin.

Methodology

To simulate the plan, I had to generate plausible team strengths and compare the ÖFBL schedule with the alternative. The current ten-team league has a quadruple round-robin of 36 matches, impossible with 12 teams. So I went for a triple round-robin over 33 rounds as the comparison, alternating the extra home game from simulation to simulation. I did not halve the first-stage points in this scenario.

Team strengths were drawn from a Skew Normal Distribution, which can mimic a situation where you have the upper tail of a natural (Normal) distribution with a mask (Normal CDF) indicating that lesser teams have been relegated or otherwise sit in a lower division. I used the observed average home (1.58) and away (1.30) goals to fit the distribution, and checked that the simulated results (via Poisson random variates, Mersenne Twister algorithm) passed the sniff test for how often the top teams won.

Each set of random team strengths was used for 1000 simulated leagues, then a new bunch was drawn. In each set, the team with the highest strength was labelled A, the next highest B, all the way down to the weakest team L. 20,000 different sets were used for each type of fixture, meaning 20,000,000 full simulations of each type.

What went wrong?

This is the plainest way to see the thing that makes people uneasy: team 7 has a higher chance of reaching a playoff than team 6. In fact, it’s nearly as high as the chances of team 5. Post-season (as the Americans like to call it) is a clear marker of success, and the team that misses the elite group cut is better placed to achieve it.

But let’s consider the value of these playoff positions in terms of the likelihood of reaching the money rounds in Europe. Q1 will play M4, the winner of that playing M3 for the Europa League position. That position will be the lowest ranked of the Austrian Europa teams, and play in the 1^st or 2^nd Qualifying Round, needing a few wins over European rivals of similar strength to make the richer group stages. Meanwhile, M2 is in the 2^nd Qualifying Round at least. Of course, the champion M1 has a Champions League chance, worth even more.

Let’s make simple metric that says a Champions League berth is worth 8 points, double that of the Europa League: 4 points. Reaching the ÖFBL two-legged playoff is then worth 2 points, and the Q1 vs M4 match-up is worth 1 point to each competitor. The next graph shows the average value for a team in each position after Round 22.

The design looks to have achieved what it set out to do: a smooth decrease in the value of the position after Round 22. The team that has dominated the first two-thirds of the season is going to have to work harder as its earned points are halved, and there is more chance of the bottom eight moving into contention as planned.

Looking at teams labelled A because their intrinsic strength was the highest of the 12 teams, they win the trophy 59% of the time under the ÖFBL model and 60% under 3RR. In fact after 20,000,000 league simulations of each type, the profile of each labelled team finishing in each position is so similar as to make no difference. The Meistergruppe double round-robin does a decent job of bringing the cream to the top. There are quirks: for instance, the sixth-strongest team F‘s most likely final position is not M6 (14.6%), but the positions on either side: M5 (14.7%) or Q1 (14.7%). Nothing too serious.

Now, like the Durex factory workers, we’re going to do some stress-testing. Does this new model work well for every team and scenario, or just on average?

	6^th after Round 22		7^th after Round 22
	Playoff or better	Avg Metric	Playoff or better	Avg Metric
Σ	31%	0.63	41%	0.41
A	74%	2.60	75%	0.75
B	65%	1.73	69%	0.69
C	55%	1.21	64%	0.64
D	44%	0.87	59%	0.59
E	34%	0.64	54%	0.54
F	27%	0.486	49%	0.490
G	22%	0.39	43%	0.43
H	19%	0.32	34%	0.34
I	16%	0.26	27%	0.27
J	13%	0.21	21%	0.21
K	10%	0.16	16%	0.16
L	8%	0.11	11%	0.11

As we established earlier, the average 6^th-placed team has less chance (31%) of making the Austrian playoffs than the average 7^th-placed team (41%), but the net value of their playoff position will be greater (0.63 vs 0.41). For strong teams, there are clear benefits in finishing in the top half and competing for the top three. The trouble comes with teams F, G, H, & I: in each case, not only is the chance of a playoff nearly doubled by dropping into the bottom half, but the simple metric we calculated shows they have a slight benefit in doing so.

In other words, for the teams that are most likely to be on the cusp of qualification, it is better to play ten matches against weak opponents and hope for Q1 than play ten matches against strong opponents attempting to land in M4 or higher. If the team is a long way behind the top four on points, it has extra incentive to tank. The only clear reason for aiming for 6 over 7 would be if a strong team or two has accidentally underperformed into the bottom half.

We predict teams with sufficient mathematical nous will be confronted with this scenario some time in the next few seasons, and the ÖFBL’s governance will come under heavy fire. Nice try, it just needs a tweak before that happens.

* Round-robin can still have opponents with differing enthusiasm for a win, especially with potential draft order at stake. It’s possible that an equivalent of the Condorcet Paradox or even Arrow’s Impossibility Theorem exists in fixturing: if there are competing priorities, there may be no system that eliminates perverse incentives in some circumstances, and all we can do is lessen them.

† There are three Europa League slots, but one is given to the Austrian Cup winner. Also, if Austria climbs to 15^th in the UEFA coefficient, it will gain a second Champions League berth. This makes qualification from fifth (M5) possible, which would imply 6 is preferable to 7 in all cases.

A full 17-match Round Robin. A guaranteed return Derby match in Round 18. Then four matches to complete the schedule, carefully chosen to achieve a proper handicap. What’s not to like?

What Does 17-D-4 Do Better than the Alternatives?

Alternative	17-D-4
Current Fixture
Return matches based on last year’s ladder	Return matches based on this year’s ladder (actual strength)
Clubs grouped by six, #7 gets much easier draw than #6	Toughness of draw scales with the strength of the team, no arbitrary blocks
Everyone gets a bye the week before the finals to discourage resting of star players	Everyone gets a bye in the last five weeks before finals; the best teams get it later
Prime timeslots late in the season contain mismatches	Schedule them when you know who is actually good
Proposed 6-6-6
Top six after 17 weeks guaranteed a home final	Every position open until the last match
Local rivalries not played twice	All rivalries played in the special Derby Round
The last five rounds have to be fixtured in a hurry	There is an extra week during Derby Round, so you can take a couple of days to get it right
Home-Away balance not achievable unless all blocks have exactly three teams who have played 9 Home / 8 Away	Every team gets 9 Home / 9 Away in the first 18, then 2 Home / 2 Away selected from unplayed return legs

And What Won’t it Solve?

Tanking: For that, use a more targeted points-based draft system
Some matches being more important than others: Any finals system is vulnerable to this. In the AFL, there are sharp divisions between 8 & 9, and between 4 & 5 that have huge rewards.
Some teams being crap: Mismatches will happen. There will be fewer under 17-D-4 because teams play their return matches against teams of similar strength

2016 Example

Let’s pretend that each team has played a single round-robin. In Round 18 (or Round 19 with the current counting for the early bye), each team plays its local rival. If they don’t have one, we’ll make them up for now. Clubs would have some say in this.

We assume that a good estimate of a team’s strength is the number of games it has won out of those 17. Looking at 2016’s ladder after 17 matches:

Team	Won	Points	Target
Hawthorn	14	56	203
GWS	12	48	191
Sydney	12	48	191
Geelong	12	48	191
WC Eagles	12	48	191
Adelaide	12	48	191
W Bulldogs	12	48	191
North Melb	11	44	185
St Kilda	9	36	173
Port Adel	8	32	167
Melbourne	7	28	161
Collingwood	7	28	161
Richmond	7	28	161
Carlton	6	24	155
Gold Coast	6	24	155
Fremantle	3	12	137
Bris Lions	2	8	131
Essendon	1	4	125

That Target is the combined number of points that we want the team’s last five opponents to sum to — including their Derby rival. The average team has scored 34 points, so the average five-week schedule comes to 170 points. I’ve used a scaling factor of 1.5 to give stronger teams stronger opponents, but that number is malleable. Note: a perfectly fair draw would give the bottom teams a higher Target than the top teams, to account for self-reference. That would be a scaling factor of -1.0.

I’ve written a tree search program that finds close fits to the target totals, choosing matches from the return legs that have not yet been played. In the fixture below, every team is within 13 points of the target opponent strength (or an average of 2.6 per opponent, less than one win). For instance, Hawthorn’s opponents would be Geelong (already fixtured, 48) + West Coast (48) + Sydney (48) + Carlton (24) + St Kilda (36) for a total of 204, compared with a target of 203.

Team	Target	Actual	Rival	Other Opponents
Haw	203	204	Geel	WCE, Syd, Carl, St.K
GWS	191	192	Syd	Ess, N.M., W.B., WCE
Syd	191	204	GWS	Geel, G.C., Melb, Haw
Geel	191	192	Haw	B.L., P.A., Syd, Adel
WCE	191	192	Freo	W.B., GWS, Rich, Haw
Adel	191	180	P.A.	Geel, Rich, Coll, N.M.
W.B.	191	188	St.K	P.A., GWS, G.C., WCE
N.M.	185	188	Melb	Adel, Coll, St.K, GWS
St.K	173	164	W.B.	Haw, N.M., Freo, Ess
P.A.	167	164	Adel	B.L., Freo, Geel, W.B.
Melb	161	168	N.M.	Carl, Syd, G.C., Rich
Coll	161	156	Carl	Adel, Rich, Freo, N.M.
Rich	161	156	Ess	Melb, WCE, Coll, Adel
Carl	155	144	Coll	Haw, G.C., B.L., Melb
G.C.	155	156	B.L.	Melb, W.B., Carl, Syd
Freo	137	148	WCE	Coll, St.K, P.A., Ess
B.L.	131	132	G.C.	Carl, Ess, P.A., Geel
Ess	125	132	Rich	Freo, St.K, GWS, B.L.

We can then shuffle these 36 matches across five weeks, giving the top four a bye in the last round, and the next four a bye in the second-last round.

Syd v Geel, WCE v GWS, G.C. v W.B., Carl v Haw, N.M. v Adel, P.A. v B.L., Freo v St.K, Rich v Melb (Ess, Coll byes)
GWS v N.M., WCE v W.B., Haw v Syd, Geel v P.A., Adel v Rich, Ess v St.K, Freo v Coll (B.L., G.C., Carl, Melb byes)
W.B. v GWS, Haw v WCE, Syd v G.C., B.L. v Ess, Adel v Geel, N.M. v Coll, Melb v Carl (Freo, P.A., St.K, Rich byes)
St.K v Haw, Coll v Rich, GWS v Ess, Melb v Syd, Carl v G.C., P.A. v Freo, Geel v B.L. (Adel, WCE, W.B., N.M. byes)
Rich v WCE, W.B. v P.A., Coll v Adel, St.K v N.M., G.C. v Melb, B.L. v Carl, Ess v Freo (Haw, GWS, Geel, Syd byes)

I noticed after I’d done this that I’d forgotten to enforce the constraint of Fremantle and West Coast not both playing at home in the same week, so I’ll leave that as an exercise for the reader.

Feedback welcome here and on Twitter.

Tag: tournament design

Schedules are getting better for broadcasters, at the expense of governance