From Rankings to Tiers: NFL Team Clustering

Applying Cut Imbalance Clustering to NFL Team Performance Analysis

2015–2025 Season Data | nflverse Play-by-Play Statistics

Research Methodology

Why Tiers Instead of Rankings?

Traditional rankings assign a unique position to each item, but this level of granularity often overstates our ability to distinguish between similar items. When comparing NFL teams, is the difference between the 8th and 9th ranked team meaningful, or are they essentially equivalent?

Key Insight: Tiering provides a more honest representation of comparative relationships. Teams within the same tier are considered statistically indistinguishable, while teams in different tiers show clear performance gaps.

This approach is particularly valuable in sports analytics where noise, variance, and sample size limitations make precise ordinal rankings unreliable. A tier-based system acknowledges uncertainty while still providing actionable insights about team quality.

Cut Imbalance Clustering

Cut imbalance clustering is a method for partitioning items into hierarchical tiers based on pairwise preference data. The core idea comes from analyzing the "cuts" between adjacent tiers in a partition.

The Intuition

Consider a partition of teams into tiers. For any two tiers, we can examine all pairwise comparisons between teams across those tiers. If teams are correctly grouped:

  • Within a tier: Comparisons should be roughly balanced (teams are similar)
  • Across tiers: Comparisons should be imbalanced (higher tier teams dominate)

Cut Imbalance Definition: The cut imbalance between two tiers measures the net preference of the higher tier over the lower tier. A high cut imbalance indicates a clear separation between tiers.

Cumulative Cut Imbalance

For a partition into K tiers, the cumulative cut imbalance sums the imbalances across all pairs of tiers. This serves as our objective function—we seek the partition that maximizes cumulative cut imbalance.

CI = Σ_{k<l} Σ_{i ∈ Ck, j ∈ Cl} (wij - wji)

Where wij is the normalized preference of team i over team j, and Ck denotes the set of teams in tier k (with lower indices being higher/better tiers).
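As a concrete check of the definition, the cumulative cut imbalance of a given partition can be computed directly. The 4-team preference matrix below is a made-up toy example, not real NFL data:

```python
import numpy as np

def cumulative_cut_imbalance(W, tiers):
    """CI: sum of (wij - wji) over all pairs (i, j) with i in a higher
    tier than j.

    W     : n x n normalized preference matrix with W[i, j] + W[j, i] == 1
    tiers : list of lists of team indices; tiers[0] is the best tier
    """
    ci = 0.0
    for k in range(len(tiers)):
        for l in range(k + 1, len(tiers)):
            for i in tiers[k]:
                for j in tiers[l]:
                    ci += W[i, j] - W[j, i]
    return ci

# Toy example: team 0 dominates, teams 1 and 2 are even, team 3 trails.
W = np.array([
    [0.5, 0.8, 0.8, 0.9],
    [0.2, 0.5, 0.5, 0.7],
    [0.2, 0.5, 0.5, 0.7],
    [0.1, 0.3, 0.3, 0.5],
])
print(cumulative_cut_imbalance(W, [[0], [1, 2], [3]]))  # ≈ 2.8
```

Swapping the top and bottom tiers flips the sign of every cut term, so a reversed partition scores negatively; the objective rewards partitions whose tier ordering matches the dominance structure.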

Preference Matrix Construction

The foundation of our analysis is a preference matrix W where entry wij represents how strongly team i is preferred over team j based on statistical comparisons.

Pairwise Stat Comparisons

For each pair of teams, we compare them across 10 key statistics (offense and defense are compared separately, each with its own 10 statistics). The raw preference count is simply the number of stats on which team i outperforms team j. We then normalize so that wij + wji = 1 for all pairs.

Offensive Statistics (10): Points Scored, Total Yards, Passing Yards, Rushing Yards, Yards Per Play, Completion %, Total TDs, Turnovers (lower), 3rd Down %, Red Zone TD %

Defensive Statistics (10): Points Allowed, Yards Allowed, Pass Yds Allowed, Rush Yds Allowed, Yds/Play Allowed, TDs Allowed, Turnovers Forced, Sacks, 3rd Down % Allowed, Red Zone % Allowed

For "lower is better" statistics (such as Turnovers on offense and Points Allowed on defense), the comparison direction is reversed.

Normalization

Raw preference counts are normalized to ensure wij + wji = 1:

wij = (raw count i beats j) / (raw count i beats j + raw count j beats i)

This normalization ensures the preference matrix captures relative strength. A value of wij = 0.7 means team i beats team j on 70% of their head-to-head stat comparisons.
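A minimal sketch of the construction in code. The stat values and the `lower_is_better` flags are illustrative placeholders, not real nflverse numbers, and two details not specified above are assumptions: ties on a single stat count for neither team, and the diagonal (and any all-tied pair) defaults to 0.5:

```python
import numpy as np

# Hypothetical season averages for 3 teams over 2 stats
# (points per game, turnovers); the real analysis uses 10 stats per side.
stats = np.array([
    [30.1, 10.0],   # team 0
    [24.5, 14.0],   # team 1
    [17.2, 21.0],   # team 2
])
lower_is_better = [False, True]   # turnovers: fewer is better

def preference_matrix(stats, lower_is_better):
    n, m = stats.shape
    raw = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            for s in range(m):
                a, b = stats[i, s], stats[j, s]
                # reverse the comparison for "lower is better" stats
                if (a < b) if lower_is_better[s] else (a > b):
                    raw[i, j] += 1
    # Normalize so that W[i, j] + W[j, i] == 1; diagonal and all-tied
    # pairs default to 0.5 (an assumption, not from the text).
    W = np.full((n, n), 0.5)
    for i in range(n):
        for j in range(i + 1, n):
            total = raw[i, j] + raw[j, i]
            if total > 0:
                W[i, j] = raw[i, j] / total
                W[j, i] = raw[j, i] / total
    return raw, W

raw, W = preference_matrix(stats, lower_is_better)
print(W)
```

Here team 0 wins both stat comparisons against each opponent, so the corresponding entries are 1.0; every row/column pair sums to 1 by construction.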

The PNoRanking Formulation

A key contribution of this research is the PNoRanking (Partition with No Ranking) formulation, which finds optimal tiers without requiring an initial ranking of items.

Why This Matters: Traditional tiering methods first rank items, then partition that ranking. PNoRanking jointly determines both the tier assignments and the implicit tier ordering, avoiding bias from a potentially suboptimal initial ranking.

Mixed Integer Linear Program (MILP)

The optimal clustering can be found exactly via a MILP formulation:

maximize    (1/2) · Σ_{k<l} Σ_{i≠j} (wij - wji) · zijkl

subject to:
    Σ_k xik = 1                     ∀ i           (each team in exactly one tier)
    zijkl ≤ xik                     ∀ i, j, k, l  (linearization)
    zijkl ≤ xjl                     ∀ i, j, k, l
    zijkl ≥ xik + xjl - 1           ∀ i, j, k, l
    xik ∈ {0,1},   zijkl ∈ {0,1}

Where xik = 1 if team i is assigned to tier k, and zijkl = 1 if team i is in tier k AND team j is in tier l.
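On a handful of teams, the MILP optimum can be reproduced by exhaustive enumeration of tier assignments. The sketch below is that brute-force check, not the MILP itself (it is exponential in the number of teams, so it only serves as a sanity check on tiny instances); the preference matrix is a made-up toy example:

```python
import itertools

def best_partition_brute_force(W, K):
    """Enumerate every assignment of n teams to K ordered tiers and keep
    the one maximizing cumulative cut imbalance (tiers may end up empty)."""
    n = len(W)
    best_ci, best_tiers = float("-inf"), None
    for assign in itertools.product(range(K), repeat=n):
        ci = sum(W[i][j] - W[j][i]
                 for i in range(n) for j in range(n)
                 if assign[i] < assign[j])
        if ci > best_ci:
            best_ci = ci
            best_tiers = [[i for i in range(n) if assign[i] == k]
                          for k in range(K)]
    return best_ci, best_tiers

# Toy 4-team matrix: team 0 dominates, 1 and 2 are even, team 3 trails.
W = [[0.5, 0.8, 0.8, 0.9],
     [0.2, 0.5, 0.5, 0.7],
     [0.2, 0.5, 0.5, 0.7],
     [0.1, 0.3, 0.3, 0.5]]
print(best_partition_brute_force(W, K=3))  # tiers [[0], [1, 2], [3]]
```

The optimum captures every positive pairwise margin: team 0 alone on top, the two statistically even teams sharing the middle tier, and team 3 alone at the bottom.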

Coupled Probability Updating (Heuristic)

For larger instances where the MILP becomes computationally expensive, we use a Coupled Probability Updating algorithm that iteratively refines soft cluster assignments using simulated annealing:

  1. Initialize uniform probability distributions over tiers for each team
  2. Compute contribution scores based on expected cut imbalance gains
  3. Update probabilities via softmax with temperature annealing
  4. Apply momentum to smooth updates and avoid oscillation
  5. Periodically extract integer solutions and track the best
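The five steps above can be sketched as follows. This is a condensed illustration under assumed hyperparameters (the temperature schedule, momentum value, and extraction interval are placeholders, not the paper's settings), and the 4-team matrix is toy data:

```python
import numpy as np

def coupled_probability_updating(W, K, iters=200, temp=1.0, cooling=0.98,
                                 momentum=0.7, seed=0):
    rng = np.random.default_rng(seed)
    n = len(W)
    M = np.asarray(W) - np.asarray(W).T          # margins: M[i, j] = wij - wji
    # 1. near-uniform soft tier assignments, lightly perturbed
    P = np.full((n, K), 1.0 / K) + rng.uniform(0, 1e-3, (n, K))
    P /= P.sum(axis=1, keepdims=True)
    best_ci, best_assign = float("-inf"), None
    for it in range(iters):
        # 2. expected CI gain of placing team i in tier k, given everyone
        #    else's current soft assignment
        cum = np.cumsum(P, axis=1)
        below, above = cum - P, 1.0 - cum        # P(tier < k), P(tier > k)
        score = M @ (above - below)
        # 3. softmax with temperature annealing (max-shifted for stability)
        s = score / temp
        s -= s.max(axis=1, keepdims=True)
        target = np.exp(s)
        target /= target.sum(axis=1, keepdims=True)
        # 4. momentum-smoothed update to damp oscillation
        P = momentum * P + (1.0 - momentum) * target
        temp = max(temp * cooling, 1e-3)
        # 5. periodically round to an integer solution and track the best
        if it % 10 == 0 or it == iters - 1:
            assign = P.argmax(axis=1)
            ci = sum(M[i, j] for i in range(n) for j in range(n)
                     if assign[i] < assign[j])
            if ci > best_ci:
                best_ci, best_assign = ci, assign.copy()
    return best_assign, best_ci

W = [[0.5, 0.8, 0.8, 0.9],
     [0.2, 0.5, 0.5, 0.7],
     [0.2, 0.5, 0.5, 0.7],
     [0.1, 0.3, 0.3, 0.5]]
assign, ci = coupled_probability_updating(W, K=3)
print(assign, ci)
```

As a heuristic it carries no optimality guarantee, which is why the extracted integer solutions are tracked and the best one kept.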

Selecting the Number of Tiers (K)

A key question in any clustering problem is: how many clusters (tiers) should we use? The choice of K significantly impacts the results.

Too Few Tiers

If K is too small, we force dissimilar teams into the same tier, reducing within-tier balance and leaving cut imbalance "on the table."

Too Many Tiers

If K is too large, tiers become too granular. We may separate teams that are statistically indistinguishable, and tiers may be nearly empty.

In This NFL Example

For this demonstration, we use a simplified approach: we test several values of K (from 2 to 6), solve the clustering problem for each, and select the K that maximizes cumulative cut imbalance. This grid search approach is practical for quick analyses and works well when the range of reasonable K values is small.
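A sketch of that grid search, with brute-force enumeration standing in for the real solver (fine for a toy 4-team matrix, intractable at NFL scale). One detail is an assumption: since the optimal CI can never decrease as K grows, ties are broken toward the smallest K here:

```python
import itertools

def solve_exact(W, K):
    """Exact solve by enumeration; stands in for the MILP on tiny inputs."""
    n = len(W)
    best = (float("-inf"), None)
    for assign in itertools.product(range(K), repeat=n):
        ci = sum(W[i][j] - W[j][i] for i in range(n) for j in range(n)
                 if assign[i] < assign[j])
        if ci > best[0]:
            best = (ci, assign)
    return best

def grid_search_k(W, k_min=2, k_max=6, tol=1e-9):
    """Solve for each K and return the smallest K whose optimal cumulative
    cut imbalance is within tol of the best found."""
    results = [(K, *solve_exact(W, K)) for K in range(k_min, k_max + 1)]
    best_ci = max(ci for _, ci, _ in results)
    return next((K, ci, assign) for K, ci, assign in results
                if ci >= best_ci - tol)

# Toy 4-team matrix: team 0 dominates, 1 and 2 are even, team 3 trails.
W = [[0.5, 0.8, 0.8, 0.9],
     [0.2, 0.5, 0.5, 0.7],
     [0.2, 0.5, 0.5, 0.7],
     [0.1, 0.3, 0.3, 0.5]]
K, ci, assign = grid_search_k(W)
print(K, ci, assign)
```

On this toy matrix, K = 2 leaves imbalance on the table, while K = 4 and beyond add nothing over K = 3 because the only remaining split would separate the two statistically even teams.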

For NFL seasons, we use K = 5 tiers for both offense and defense to stratify team performance.

In the Research Paper: The full paper presents a more rigorous approach that optimizes over all possible numbers of tiers simultaneously. Rather than fixing K and solving, we formulate an extended model where K itself becomes a decision variable. This allows the algorithm to automatically discover the optimal number of tiers without requiring a grid search, and provides theoretical guarantees about the solution quality.

Interpreting the Results

The resulting tiers should be interpreted as follows:

  • Teams in the same tier are statistically similar—pairwise comparisons between them are relatively balanced
  • Teams in different tiers show clear separation—higher tier teams consistently outperform lower tier teams across the stat comparisons
  • Tier ordering is determined by dominance—Tier 1 teams dominate all other tiers, Tier 2 teams dominate Tiers 3+, etc.

Note: Offense and defense are tiered separately because a team's offensive strength may differ significantly from its defensive strength.

Season Tier Results

Teams clustered into tiers based on offensive and defensive performance across the season.

Offensive Tiers

Defensive Tiers

Tier Progression

How each team's tier has changed across seasons (2015–2025).

Optimal Tier Discovery

Rather than fixing K=5, this formulation sets K to an upper bound and lets the optimizer discover the natural number of tiers. Empty tiers are removed after solving.

How it works: By setting K to an upper bound (10), the MILP is free to leave tiers empty. The solver maximizes cumulative cut imbalance, and the number of non-empty tiers in the solution is the optimal K. This avoids the need for a grid search over K values.
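The same idea in miniature, with brute-force enumeration standing in for the MILP and a made-up 4-team matrix; K is set to the upper bound of 10 and the discovered number of tiers is read off the solution:

```python
import itertools

def discover_tiers(W, k_upper=10):
    """Solve with K at an upper bound; empty tiers are dropped afterwards,
    and the number of non-empty tiers is the discovered optimal K."""
    n = len(W)
    best_ci, best_assign = float("-inf"), None
    for assign in itertools.product(range(k_upper), repeat=n):
        ci = sum(W[i][j] - W[j][i] for i in range(n) for j in range(n)
                 if assign[i] < assign[j])
        if ci > best_ci:
            best_ci, best_assign = ci, assign
    used = sorted(set(best_assign))                  # non-empty tiers only
    tiers = [[i for i in range(n) if best_assign[i] == k] for k in used]
    return len(tiers), tiers, best_ci

W = [[0.5, 0.8, 0.8, 0.9],
     [0.2, 0.5, 0.5, 0.7],
     [0.2, 0.5, 0.5, 0.7],
     [0.1, 0.3, 0.3, 0.5]]
print(discover_tiers(W))  # settles on 3 non-empty tiers
```

Even with 10 tiers available, the optimizer uses only three: splitting the two even teams adds nothing to the objective, so the extra tiers stay empty.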

Optimal K by Season

Number of tiers discovered by the optimizer for each season

Optimal Offensive Tiers

Optimal Defensive Tiers

Preference Matrices

Explore the raw pairwise preference data underlying the tier assignments.

Offensive Preference Matrix

Entry (i, j) = number of offensive stats where row team i outperforms column team j (out of 10)

Defensive Preference Matrix

Entry (i, j) = number of defensive stats where row team i outperforms column team j (out of 10)