The teams were unfair. Everyone could see it. One side had the three best players; the other side had everyone else. The match was decided before kickoff.
Fair team balancing is the hardest problem in recreational football. You have 14-20 players of varying abilities, and you need to split them into two teams that will produce a competitive match. Get it wrong, and people stop showing up.
I spent more time on team balancing than any other feature in Capo. Fifteen iterations with the AI before it worked correctly. It was the most challenging vibe coding task of the entire project.
Why It's Hard
The naive approach is to sort players by ability and alternate picks: best player to Team A, second-best to Team B, third-best to Team A, and so on. This produces reasonably balanced teams when the player count is even.
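The alternating pick above fits in a few lines (a sketch; the `Player` shape mirrors the fitness code later in the post):

```typescript
interface Player { name: string; rating: number; }

// Sort by rating descending, then deal players out alternately: A, B, A, B, ...
function alternatePick(players: Player[]): [Player[], Player[]] {
  const sorted = [...players].sort((a, b) => b.rating - a.rating);
  const teamA: Player[] = [];
  const teamB: Player[] = [];
  sorted.forEach((p, i) => (i % 2 === 0 ? teamA : teamB).push(p));
  return [teamA, teamB];
}
```

Note that strict alternation systematically favours Team A, which gets the 1st, 3rd, and 5th best players; it is a baseline, not a solution.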
But football teams aren't just collections of ability points. You need defenders, midfielders, attackers. A team of seven strikers will lose to a balanced team, regardless of individual ratings. Position balance matters as much as overall ability.
And the teams aren't always even. Thursday night football might have 15 players — teams of 8 vs 7. The smaller team needs to be slightly stronger to compensate for the numerical disadvantage.
Brute force doesn't scale. With 16 players, there are over 12,000 ways to split them into two teams of 8. With 18 players, over 48,000 combinations. With 20 players, over 184,000. Evaluating every possibility takes too long for a real-time application.
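Those counts are binomial coefficients (with Team A and Team B treated as distinct), and can be checked directly:

```typescript
// Number of ways to choose k players for a labelled team out of n.
function choose(n: number, k: number): number {
  let result = 1;
  for (let i = 1; i <= k; i++) result = (result * (n - k + i)) / i;
  return Math.round(result);
}

choose(16, 8);  // 12,870 splits for 16 players
choose(18, 9);  // 48,620 for 18
choose(20, 10); // 184,756 for 20
```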
The Genetic Algorithm Approach
Genetic algorithms solve optimization problems by mimicking evolution. You start with a population of random solutions, evaluate their fitness, select the best ones, combine them to create new solutions, and repeat until you converge on a good answer.
For team balancing:
- Population: Generate 100 random team assignments
- Fitness: Score each assignment based on balance criteria
- Selection: Keep the best 20 assignments
- Crossover: Combine pairs of good assignments to create new ones
- Mutation: Randomly swap a few players between teams
- Repeat: Run for 50-100 generations
The algorithm explores the solution space efficiently, finding good team assignments without evaluating every possibility.
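The steps above can be sketched end to end. This is a minimal sketch, not Capo's actual code: the population size of 100 and elite count of 20 come from the list, while `randomAssignment`, `crossover`, and `mutate` are illustrative implementations chosen to preserve team sizes.

```typescript
interface Player { name: string; rating: number; }
type Assignment = boolean[]; // true = Team A, false = Team B, indexed by player

const rand = (n: number) => Math.floor(Math.random() * n);

// Random split with equal (or off-by-one) team sizes.
function randomAssignment(n: number): Assignment {
  const a = Array.from({ length: n }, (_, i) => i < Math.ceil(n / 2));
  for (let i = n - 1; i > 0; i--) {
    const j = rand(i + 1);
    [a[i], a[j]] = [a[j], a[i]];
  }
  return a;
}

// Mutation: swap one player from each side, preserving team sizes.
function mutate(a: Assignment): Assignment {
  const b = [...a];
  const inA = b.flatMap((v, i) => (v ? [i] : []));
  const inB = b.flatMap((v, i) => (v ? [] : [i]));
  const i = inA[rand(inA.length)], j = inB[rand(inB.length)];
  [b[i], b[j]] = [b[j], b[i]];
  return b;
}

// Crossover: inherit each slot from a random parent, then repair team sizes.
function crossover(p1: Assignment, p2: Assignment): Assignment {
  const child = p1.map((v, i) => (Math.random() < 0.5 ? v : p2[i]));
  const target = p1.filter(Boolean).length;
  let count = child.filter(Boolean).length;
  while (count !== target) {
    const i = rand(child.length);
    if (count > target && child[i]) { child[i] = false; count--; }
    else if (count < target && !child[i]) { child[i] = true; count++; }
  }
  return child;
}

function evolve(
  players: Player[],
  fitness: (a: Assignment) => number,
  generations = 100,
): Assignment {
  let population = Array.from({ length: 100 }, () => randomAssignment(players.length));
  for (let gen = 0; gen < generations; gen++) {
    population.sort((a, b) => fitness(b) - fitness(a));
    const elite = population.slice(0, 20); // selection: keep the best 20
    const children = Array.from({ length: 80 }, () =>
      mutate(crossover(elite[rand(20)], elite[rand(20)]))); // crossover + mutation
    population = [...elite, ...children];
  }
  return population.sort((a, b) => fitness(b) - fitness(a))[0];
}
```

Keeping the elite in every generation means the best assignment found so far is never lost, so fitness only improves as the generations run.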
The Fitness Function
The fitness function determines what "balanced" means. Getting it right took most of the 15 iterations.
The first version only considered total ability:
```typescript
function fitness(teamA: Player[], teamB: Player[]): number {
  const abilityA = sum(teamA.map(p => p.rating));
  const abilityB = sum(teamB.map(p => p.rating));
  return -Math.abs(abilityA - abilityB); // Closer to zero is better
}
```

This produced teams with equal total ratings but terrible position balance. All the defenders on one team, all the attackers on the other.
The second version added position weighting:
```typescript
function fitness(teamA: Player[], teamB: Player[]): number {
  const abilityDiff = Math.abs(totalRating(teamA) - totalRating(teamB));
  const defenderDiff = Math.abs(countDefenders(teamA) - countDefenders(teamB));
  const attackerDiff = Math.abs(countAttackers(teamA) - countAttackers(teamB));
  return -(abilityDiff + defenderDiff * 10 + attackerDiff * 10);
}
```

Better, but the weights were wrong. The algorithm would sacrifice significant ability balance to achieve perfect position balance.
The final version uses weighted components tuned through testing:
```typescript
function fitness(teamA: Player[], teamB: Player[]): number {
  const abilityScore = -Math.abs(totalRating(teamA) - totalRating(teamB)) * 1.0;
  const positionScore = -positionImbalance(teamA, teamB) * 0.3;
  const formScore = -Math.abs(recentForm(teamA) - recentForm(teamB)) * 0.5;
  return abilityScore + positionScore + formScore;
}
```

The Rating System
Player ratings needed to reflect actual performance, not just assigned ability levels. I implemented EWMA (Exponentially Weighted Moving Average) ratings based on match results.
```typescript
// After each match, update player ratings
newRating = alpha * matchPerformance + (1 - alpha) * oldRating;
```

The alpha parameter controls how quickly ratings respond to recent performance. The system uses a 2-year half-life — a player's rating reflects their performance over roughly the last two years, with recent matches weighted more heavily.
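The half-life pins down alpha. After h updates, an old result's weight has decayed to (1 − alpha)^h; setting that equal to 0.5 and solving gives alpha. The one-match-per-week cadence below is an illustrative assumption, not a figure from Capo:

```typescript
// Solve (1 - alpha)^h = 0.5 for alpha, where h is the half-life in matches.
function alphaForHalfLife(halfLifeMatches: number): number {
  return 1 - Math.pow(0.5, 1 / halfLifeMatches);
}

// Assuming roughly one match per week, a 2-year half-life is ~104 matches:
const alpha = alphaForHalfLife(104); // ≈ 0.0066
```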
New players present a challenge. With no match history, their rating is uncertain. The algorithm uses Bayesian shrinkage: new players start at the league average and gradually move toward their true ability as data accumulates.
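Shrinkage of this kind can be sketched as a weighted blend between the league average and the player's own history. This is a sketch, not the app's code; the `priorMatches` prior strength is a hypothetical tuning knob:

```typescript
// Blend a player's observed average with the league average.
// With few matches the prior dominates; with many, the data dominates.
function shrunkRating(
  observedAvg: number,
  matchesPlayed: number,
  leagueAvg: number,
  priorMatches = 10, // hypothetical prior strength
): number {
  const w = matchesPlayed / (matchesPlayed + priorMatches);
  return w * observedAvg + (1 - w) * leagueAvg;
}

shrunkRating(9, 0, 6);  // → 6   (no data: pure league average)
shrunkRating(9, 10, 6); // → 7.5 (halfway there after 10 matches)
```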
The Iterations
The genetic algorithm saga:
Attempt 1: "Build a genetic algorithm for team balancing"
Result: Code that didn't converge. The fitness function was wrong.
Attempt 5: "The teams are still unbalanced, the algorithm prefers one team"
Result: AI identified that crossover was biased toward Team A.
Attempt 10: "Now it's too slow — 30 seconds per balance"
Result: AI optimized population size and generation count.
Attempt 15: "Perfect balance but ignores position requirements"
Result: AI added position-weighted fitness components.
Each iteration revealed edge cases. What if all players have identical ratings? What if positions are unbalanced (10 attackers, 2 defenders)? What if the team sizes are uneven?
The AI's ability to handle these incrementally — each fix building on the last — was remarkable. By the end, the balancing code was 395 lines of evolved, battle-tested logic.
The Visualization
Users need to understand why teams were balanced a certain way. I added a TornadoChart visualization showing the balance factors:
```
          Team A                      Team B
████████████████   Rating    ████████████████
      ██████████   Form      ██████████
        ████████   Defense   ████████
```

The chart shows that teams are balanced across multiple dimensions, not just total ability. Users can see that even if one team looks stronger on paper, the algorithm has accounted for recent form and position balance.
What the AI Learned
The genetic algorithm was the hardest vibe coding challenge because it required understanding why things weren't working, not just what to change.
When the algorithm didn't converge, the AI couldn't diagnose the problem from the code alone. I had to describe the symptoms ("teams are always unbalanced in the same direction"), and the AI would hypothesize causes ("crossover might be biased").
This back-and-forth — human observation, AI hypothesis, human testing, AI refinement — took multiple sessions across several days. The AI couldn't have designed this system from a single prompt. But it could iterate toward a working solution with human guidance.
The Broader Pattern
I talk more about the overall approach in How I Actually Vibe Code. The genetic algorithm illustrates both the power and limits of AI-assisted development.
The performance optimizations ensure the balancing results display instantly. The algorithm runs in milliseconds; React Query caches the results.
The booking system uses the balanced teams as input. Once teams are generated, players can book spots on their assigned team.
The Outcome
The balancing algorithm now handles:
- Even and uneven team sizes
- Position requirements (minimum defenders, etc.)
- Recent form weighting
- New player uncertainty
- Multiple balancing modes (by ability, by performance, random)
Matches are competitive. The 7-0 blowouts are rare. Players trust the system because they can see the balance visualization and understand the reasoning.
395 lines of code. 15 iterations. The hardest feature I built with AI — and the one that makes the biggest difference to the actual football.