Research Methodology
Why Tiers Instead of Rankings?
Traditional rankings assign a unique position to each item, but this level of granularity often overstates our ability to distinguish between similar items. When comparing NFL teams, is the difference between the 8th- and 9th-ranked teams meaningful, or are the two essentially equivalent?
Key Insight: Tiering provides a more honest representation of comparative relationships. Teams within the same tier are considered statistically indistinguishable, while teams in different tiers show clear performance gaps.
This approach is particularly valuable in sports analytics where noise, variance, and sample size limitations make precise ordinal rankings unreliable. A tier-based system acknowledges uncertainty while still providing actionable insights about team quality.
Cut Imbalance Clustering
Cut imbalance clustering is a method for partitioning items into hierarchical tiers based on pairwise preference data. The core idea comes from analyzing the "cuts" between adjacent tiers in a partition.
The Intuition
Consider a partition of teams into tiers. For any two tiers, we can examine all pairwise comparisons between teams across those tiers. If teams are correctly grouped:
- Within a tier: Comparisons should be roughly balanced (teams are similar)
- Across tiers: Comparisons should be imbalanced (higher tier teams dominate)
Cut Imbalance Definition: The cut imbalance between two tiers measures the net preference of the higher tier over the lower tier. A high cut imbalance indicates a clear separation between tiers.
Cumulative Cut Imbalance
For a partition into K tiers, the cumulative cut imbalance sums the imbalances across all pairs of tiers. This serves as our objective function—we seek the partition that maximizes cumulative cut imbalance:

Σk<l Σi∈Ck Σj∈Cl (wij - wji)

Where wij is the normalized preference of team i over team j, and Ck denotes the set of teams in tier k (with lower indices being higher/better tiers).
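In code, the objective can be computed directly from the preference matrix. A minimal sketch (the names `cumulative_cut_imbalance`, `W`, and `tiers` are ours, not the paper's); `tiers` lists the teams in each tier, ordered best to worst:

```python
import numpy as np

def cumulative_cut_imbalance(W, tiers):
    """Sum of net preferences (wij - wji) over every pair of teams that
    lands in different tiers, with tier k listed before tier l (k < l)."""
    total = 0.0
    for k in range(len(tiers)):
        for l in range(k + 1, len(tiers)):
            for i in tiers[k]:        # team in the higher (better) tier
                for j in tiers[l]:    # team in the lower tier
                    total += W[i, j] - W[j, i]
    return total
```

Since W is normalized so that wij + wji = 1, each term equals 2wij - 1: positive exactly when the higher-tier team wins the majority of stat comparisons.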
Preference Matrix Construction
The foundation of our analysis is a preference matrix W where entry wij represents how strongly team i is preferred over team j based on statistical comparisons.
Pairwise Stat Comparisons
For each pair of teams, we compare them across 10 key statistics (offense and defense each use their own set of 10 stats and are tiered separately). The raw preference count is simply the number of stats where team i outperforms team j. We then normalize so that wij + wji = 1 for all pairs.
Offensive Statistics (10)
Defensive Statistics (10)
Statistics where lower is better have their comparison direction reversed when counting wins.
Normalization
Raw preference counts are normalized to ensure wij + wji = 1:

wij = cij / (cij + cji)

where cij is the raw count of stats on which team i outperforms team j. This normalization ensures the preference matrix captures relative strength. A value of wij = 0.7 means team i beats team j on 70% of their head-to-head stat comparisons.
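The construction can be sketched as follows, assuming a stats table with one row per team and a per-stat flag for comparison direction. The tie-handling rule (splitting ties 0.5/0.5 so the normalization identity always holds) is our assumption; the source does not specify how ties are treated:

```python
import numpy as np

def preference_matrix(stats, higher_is_better):
    """Build W from a (n_teams, n_stats) table of per-team statistics.
    higher_is_better[s] is False for "lower is better" stats, whose
    comparison direction is reversed. Ties are split 0.5/0.5 so that
    W[i, j] + W[j, i] = 1 always holds (the tie rule is our assumption)."""
    n, m = stats.shape
    W = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            wins = 0.0
            for s in range(m):
                a, b = stats[i, s], stats[j, s]
                if not higher_is_better[s]:
                    a, b = b, a          # reverse direction for lower-is-better
                if a > b:
                    wins += 1.0
                elif a == b:
                    wins += 0.5
            W[i, j] = wins / m           # normalized: W[i, j] + W[j, i] = 1
    return W
```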
The PNoRanking Formulation
A key contribution of this research is the PNoRanking (Partition with No Ranking) formulation, which finds optimal tiers without requiring an initial ranking of items.
Why This Matters: Traditional tiering methods first rank items, then partition that ranking. PNoRanking jointly determines both the tier assignments and the implicit tier ordering, avoiding bias from a potentially suboptimal initial ranking.
Mixed Integer Linear Program (MILP)
The optimal clustering can be found exactly via a MILP formulation:
maximize: (1/2) * Σk<l Σi≠j (wij - wji) * zijkl
subject to:
Σk xik = 1 ∀ i (each team in exactly one tier)
zijkl ≤ xik ∀ i,j,k,l (linearization)
zijkl ≤ xjl ∀ i,j,k,l
zijkl ≥ xik + xjl - 1 ∀ i,j,k,l
xik ∈ {0,1}
zijkl ∈ {0,1}
Where xik = 1 if team i is assigned to tier k, and zijkl = 1 if team i is in tier k AND team j is in tier l.
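The formulation above can be written down almost verbatim with an off-the-shelf MILP library. A sketch using PuLP with its bundled CBC solver (practical only for small instances, since the number of z variables grows as n²K²; the function name is ours):

```python
import pulp

def solve_tiers_milp(W, K):
    """Solve the cut-imbalance MILP; returns a tier index per team,
    with tier 0 being the top tier."""
    n = len(W)
    prob = pulp.LpProblem("cut_imbalance", pulp.LpMaximize)
    # x[i, k] = 1 iff team i is assigned to tier k
    x = pulp.LpVariable.dicts(
        "x", [(i, k) for i in range(n) for k in range(K)], cat="Binary")
    # z[i, j, k, l] = 1 iff team i is in tier k and team j is in tier l, k < l
    idx = [(i, j, k, l) for i in range(n) for j in range(n) if i != j
           for k in range(K) for l in range(k + 1, K)]
    z = pulp.LpVariable.dicts("z", idx, cat="Binary")
    # objective: cumulative cut imbalance (the 1/2 matches the formulation
    # above; it rescales the objective without changing the optimum)
    prob += 0.5 * pulp.lpSum((W[i][j] - W[j][i]) * z[i, j, k, l]
                             for (i, j, k, l) in idx)
    for i in range(n):                       # each team in exactly one tier
        prob += pulp.lpSum(x[i, k] for k in range(K)) == 1
    for (i, j, k, l) in idx:                 # linearization of x[i,k] * x[j,l]
        prob += z[i, j, k, l] <= x[i, k]
        prob += z[i, j, k, l] <= x[j, l]
        prob += z[i, j, k, l] >= x[i, k] + x[j, l] - 1
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [max(range(K), key=lambda k: x[i, k].value()) for i in range(n)]
```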
Coupled Probability Updating (Heuristic)
For larger instances where the MILP becomes computationally expensive, we use a Coupled Probability Updating algorithm that iteratively refines soft cluster assignments using simulated annealing:
- Initialize uniform probability distributions over tiers for each team
- Compute contribution scores based on expected cut imbalance gains
- Update probabilities via softmax with temperature annealing
- Apply momentum to smooth updates and avoid oscillation
- Periodically extract integer solutions and track the best
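The steps above can be sketched as follows. This is a simplified reconstruction: the contribution-score formula, temperature schedule, momentum placement, extraction period, and all parameter names and values are illustrative choices of ours, not the paper's exact algorithm:

```python
import numpy as np

def coupled_probability_updating(W, K, iters=400, t0=1.0, t_min=0.05,
                                 momentum=0.7, seed=0):
    """Anneal soft tier-membership probabilities P[i, k] toward an integer
    assignment with high cumulative cut imbalance."""
    rng = np.random.default_rng(seed)
    n = len(W)
    D = np.asarray(W) - np.asarray(W).T      # net preference: D[i, j] = wij - wji
    # initialize near-uniform distributions over tiers for each team
    P = np.full((n, K), 1.0 / K) + 0.01 * rng.random((n, K))
    P /= P.sum(axis=1, keepdims=True)
    smoothed = np.zeros((n, K))
    best_val, best_assign = -np.inf, None

    def cci(assign):                         # cumulative cut imbalance
        return sum(D[i, j] for i in range(n) for j in range(n)
                   if assign[i] < assign[j])

    for it in range(iters):
        temp = max(t_min, t0 * (1.0 - it / iters))   # temperature anneal
        # expected gain of placing team i in tier k: i sits above teams
        # likely to land in later tiers, below those in earlier tiers
        scores = np.stack([
            D @ (P[:, k + 1:].sum(axis=1) - P[:, :k].sum(axis=1))
            for k in range(K)], axis=1)
        # momentum smooths the scores to avoid oscillation
        smoothed = momentum * smoothed + (1.0 - momentum) * scores
        # softmax update at the current temperature
        e = np.exp((smoothed - smoothed.max(axis=1, keepdims=True)) / temp)
        P = e / e.sum(axis=1, keepdims=True)
        # periodically round to an integer solution and keep the best
        if it % 25 == 0 or it == iters - 1:
            assign = P.argmax(axis=1)
            val = cci(assign)
            if val > best_val:
                best_val, best_assign = val, assign
    return best_assign, best_val
```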
Selecting the Number of Tiers (K)
A key question in any clustering problem is: how many clusters (tiers) should we use? The choice of K significantly impacts the results.
Too Few Tiers
If K is too small, we force dissimilar teams into the same tier, reducing within-tier balance and leaving cut imbalance "on the table."
Too Many Tiers
If K is too large, tiers become too granular. We may separate teams that are statistically indistinguishable, and tiers may be nearly empty.
In This NFL Example
For this demonstration, we use a simplified approach: we test several values of K (from 2 to 6), solve the clustering problem for each, and select the K that maximizes cumulative cut imbalance. This grid search approach is practical for quick analyses and works well when the range of reasonable K values is small.
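The grid search can be sketched as follows. The inner solver here is an exhaustive search, usable only for toy instances (in practice the MILP or the heuristic would fill that role), and both function names are ours:

```python
from itertools import product

def best_value(W, K):
    """Best cumulative cut imbalance over all assignments of teams to at
    most K ordered tiers (exhaustive search; only feasible for tiny n)."""
    n = len(W)
    return max(
        sum(W[i][j] - W[j][i]
            for i in range(n) for j in range(n) if a[i] < a[j])
        for a in product(range(K), repeat=n)
    )

def select_K(W, k_min=2, k_max=6):
    """Grid search over K; ties break toward the smallest K, since extra
    (possibly empty) tiers can never lower the optimum."""
    scores = {K: best_value(W, K) for K in range(k_min, k_max + 1)}
    return max(scores, key=scores.get), scores
```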
For the NFL seasons analyzed here, we use K = 5 tiers for both offense and defense to stratify team performance.
In the Research Paper: The full paper presents a more rigorous approach that optimizes over all possible numbers of tiers simultaneously. Rather than fixing K and solving, we formulate an extended model where K itself becomes a decision variable. This allows the algorithm to automatically discover the optimal number of tiers without requiring a grid search, and provides theoretical guarantees about the solution quality.
Interpreting the Results
The resulting tiers should be interpreted as follows:
- Teams in the same tier are statistically similar—pairwise comparisons between them are relatively balanced
- Teams in different tiers show clear separation—higher tier teams consistently outperform lower tier teams across the stat comparisons
- Tier ordering is determined by dominance—Tier 1 teams dominate all other tiers, Tier 2 teams dominate Tiers 3+, etc.
Note: Offense and defense are tiered separately because a team's offensive strength may differ significantly from its defensive strength.
Season Tier Results
Teams clustered into tiers based on offensive and defensive performance across the season.
Offensive Tiers
Defensive Tiers
Tier Progression
How each team's tier has changed across seasons (2015–2025).
Optimal Tier Discovery
Rather than fixing K=5, this formulation sets K to an upper bound and lets the optimizer discover the natural number of tiers. Empty tiers are removed after solving.
How it works: By setting K to an upper bound (10), the MILP is free to leave tiers empty. The solver maximizes cumulative cut imbalance, and the number of non-empty tiers in the solution is the optimal K. This avoids the need for a grid search over K values.
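Removing the empty tiers after solving is a small relabeling step; a sketch (the function name is ours):

```python
def compact_tiers(assignment):
    """Drop empty tiers from a solution obtained with K set to an upper
    bound: relabel the used tier indices as 0, 1, ... in order. The number
    of distinct labels is the discovered optimal K."""
    used = sorted(set(assignment))
    relabel = {k: r for r, k in enumerate(used)}
    return [relabel[k] for k in assignment], len(used)
```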
Optimal K by Season
Number of tiers discovered by the optimizer for each season
Optimal Offensive Tiers
Optimal Defensive Tiers
Preference Matrices
Explore the raw pairwise preference data underlying the tier assignments.
Offensive Preference Matrix
Entry (i, j) = number of offensive stats where row team i outperforms column team j (out of 10)
Defensive Preference Matrix
Entry (i, j) = number of defensive stats where row team i outperforms column team j (out of 10)