On July 28, 2024, Shohei Ohtani stepped to the plate in the fifth inning against the Guardians and launched a two-run homer that pushed his season WAR past 6.0 before August. He wasn't pitching. He was doing it with his bat alone, as a designated hitter, something the stat's architects never quite imagined when they started building this number in the early 2000s. That single data point tells you everything about why wins above replacement became the most referenced, most argued-about, and most misunderstood metric in baseball.
But where did it actually come from? Who decided that one number could capture a player's total value? And why does everyone, from front offices to bar arguments, keep circling back to it?
I wanted to trace that story. So I did.
Where Did Wins Above Replacement Come From?
The idea behind wins above replacement didn't arrive fully formed. It was assembled, piece by piece, by people who were frustrated with how baseball evaluated talent.
The intellectual roots go back to Bill James and the sabermetric revolution of the 1980s. James didn't invent WAR, but he built the philosophical scaffolding: the belief that traditional stats like batting average, RBIs, and pitcher wins were noisy, context-dependent, and often rewarded the wrong players. His Baseball Abstracts asked a simple question that turned out to be radical. What if we measured what a player actually contributed to winning?
The concept of a "replacement level" player, the theoretical baseline that makes WAR work, crystallized in the late 1990s and early 2000s. Keith Woolner, writing for Baseball Prospectus, was among the first to formalize the idea. His argument was elegant: don't compare a player to the league average. Compare him to the freely available, minor-league-callup, waiver-wire guy who would take his roster spot if he disappeared.
That shift in baseline changed everything.
By the mid-2000s, two parallel versions of WAR emerged. FanGraphs developed fWAR, using FIP for pitchers and a different defensive framework. Baseball Reference built bWAR (also called rWAR), leaning on runs allowed and a different fielding methodology. The fact that two respected sources couldn't agree on a single formula tells you something important about the stat. It's not a measurement like speed or distance. It's a model. A very good model, but a model.
Then Moneyball happened.
The Moneyball Myth (and the Real Story)
Here's something that surprises people: Michael Lewis's Moneyball, published in 2003, barely mentions WAR. The book focused on OBP (on-base percentage) as the market inefficiency Billy Beane exploited. But Moneyball did something more important for WAR's future. It gave mainstream audiences permission to take advanced baseball stats seriously.

After the book (and especially after the 2011 Brad Pitt film), front offices that had been quietly using sabermetric tools could now do so openly. The question shifted from "should we use advanced metrics?" to "which advanced metric captures the most?" The WAR baseball stat became the answer, not because it was perfect, but because it was comprehensive. One number. Offense, defense, baserunning, positional adjustment, all rolled together.
By 2013, MLB Network was displaying WAR on screen during broadcasts. By 2018, it was showing up in MVP debates on ESPN.
The stat had gone from niche sabermetric tool to mainstream vocabulary in roughly a decade.
We track WAR across all 30 team digests every day at Small Ball, and the thing that strikes me after years of doing this isn't how useful the number is. It's how often it becomes the only number people look at. That's a problem the stat's creators never intended.
So What Does Wins Above Replacement Actually Measure?
At its core, wins above replacement attempts to answer one question: how many additional wins did this player provide compared to a freely available replacement-level player?
A player with a WAR of 5.0 theoretically contributed five more wins to his team than a replacement-level player would have in the same playing time. A player with a WAR of 0.0 is, by definition, replacement level. A negative WAR means the player was actively worse than the guy you could grab off the waiver wire tomorrow.
That's the plain English version. The math underneath is more involved, but it's not impenetrable.
How Is WAR Calculated? Breaking Down the Formula
There's no single clean formula for WAR the way there is for, say, OPS or slugging percentage. It's a composite. But here's the general framework for position players (using fWAR):
WAR = (Batting Runs + Baserunning Runs + Fielding Runs + Positional Adjustment + League Adjustment) / Runs Per Win
Each component is measured in runs above average, then the whole thing gets converted into wins using a runs-per-win multiplier (typically around 10 runs = 1 win, though this fluctuates slightly by season).
Let's walk through a simplified example. Say a shortstop produces:
- +20 batting runs above average
- +3 baserunning runs
- +5 fielding runs from range and positioning
- +7 positional-adjustment runs (shortstop gets a boost because it's a demanding position)
- +1 league-adjustment run
Total: 36 runs above average. Divide by ~10 runs per win. That's roughly 3.6 WAR before the replacement-level adjustment gets layered in (which adds about 20 runs over a full season to convert from "above average" to "above replacement"). Final WAR: approximately 5.6.
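The arithmetic above can be sketched in a few lines. This is a simplified illustration, not either site's actual implementation: the component values are the hypothetical shortstop from the example, and the runs-per-win divisor and replacement-runs figure are the rough season-level constants the article cites, not exact published values.

```python
RUNS_PER_WIN = 10.0      # roughly 10 runs = 1 win (fluctuates by season)
REPLACEMENT_RUNS = 20.0  # ~20 runs/season converts "above average" to "above replacement"

def position_player_war(batting, baserunning, fielding,
                        positional_adj, league_adj):
    """Convert runs-above-average components into wins above replacement."""
    runs_above_average = (batting + baserunning + fielding
                          + positional_adj + league_adj)
    runs_above_replacement = runs_above_average + REPLACEMENT_RUNS
    return runs_above_replacement / RUNS_PER_WIN

# The shortstop from the worked example:
war = position_player_war(batting=20, baserunning=3, fielding=5,
                          positional_adj=7, league_adj=1)
print(round(war, 1))  # 5.6
```

Note that the real systems apply park factors, league run environments, and playing-time scaling before any of this; the sketch only captures the final conversion step.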
For pitchers, fWAR uses FIP (Fielding Independent Pitching) instead of ERA, stripping out the defense behind the pitcher. bWAR uses RA9 (runs allowed per nine innings), which keeps the defense in. This is the single biggest reason the two versions sometimes disagree wildly on a pitcher's value.
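To make the pitcher side concrete, here is the standard published FIP formula as a short sketch. The 13/3/2 weights are the formula's actual coefficients; the league constant (set so league FIP matches league ERA) varies slightly each season, so the ~3.10 used here is an assumption, and the stat line is hypothetical.

```python
FIP_CONSTANT = 3.10  # recalculated yearly; ~3.10 is an assumed value

def fip(hr, bb, hbp, k, ip):
    """Fielding Independent Pitching: only the outcomes a pitcher controls
    directly (homers, walks, hit batters, strikeouts), per inning pitched."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + FIP_CONSTANT

# Hypothetical starter: 180 IP, 18 HR, 45 BB, 6 HBP, 200 K
print(round(fip(hr=18, bb=45, hbp=6, k=200, ip=180), 2))  # 3.03
```

Because balls in play never enter the formula, a pitcher in front of a bad defense can post a FIP well below his ERA, which is exactly why fWAR and bWAR can diverge so sharply for the same season.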
WAR Benchmarks Everyone Should Know
Here's the tier system that's become standard shorthand across the sport:
| WAR (Season) | Classification | 2024 Example |
|---|---|---|
| 8.0+ | MVP-caliber | Aaron Judge (10.6 bWAR) |
| 5.0 – 7.9 | All-Star | Bobby Witt Jr. (7.5 bWAR) |
| 3.0 – 4.9 | Above-average starter | Gunnar Henderson (5.1 bWAR) |
| 1.0 – 2.9 | Solid contributor | Various |
| 0.0 – 0.9 | Replacement-level to marginal | Bench players, spot starters |
| Below 0.0 | Below replacement | Players who probably shouldn't be on a roster |
A 2.0 WAR season is a perfectly useful player. Teams need those guys. The obsession with elite WAR totals can make fans undervalue the 2-win contributor who plays 150 games and doesn't embarrass himself on either side of the ball.
Also, pitcher WAR tends to be lower than position player WAR for the simple reason that pitchers don't bat (the universal DH ended that in 2022), don't run the bases, and starters only take the mound every fifth day. A 4.0 WAR season for a starting pitcher is genuinely excellent.
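The tier table above amounts to a simple lookup, sketched here with the table's own cutoffs (this is shorthand that has become conventional, not an official MLB scale):

```python
def war_tier(war):
    """Map a season WAR total to the shorthand tiers from the table."""
    if war >= 8.0:
        return "MVP-caliber"
    if war >= 5.0:
        return "All-Star"
    if war >= 3.0:
        return "Above-average starter"
    if war >= 1.0:
        return "Solid contributor"
    if war >= 0.0:
        return "Replacement-level to marginal"
    return "Below replacement"

print(war_tier(10.6))  # MVP-caliber
print(war_tier(2.0))   # Solid contributor
```

The boundaries are soft in practice; a 4.9 and a 5.1 season are effectively the same player, which is worth remembering whenever the tiers get treated as hard lines.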
The Part Nobody Agrees On: Defense and Replacement Level
Here's where I think the conversation around wins above replacement gets genuinely interesting, and genuinely messy.
The offensive component of WAR is solid. We can measure plate appearances, outcomes, park factors, and league context with high confidence. Baserunning is a little fuzzier but still grounded in observable events (stolen bases, extra bases taken, outs on the bases).
Defense is where the model starts to strain.
Defensive metrics, whether you're using UZR (Ultimate Zone Rating), DRS (Defensive Runs Saved), or OAA (Outs Above Average), require large sample sizes to stabilize. We're talking three full seasons before most defensive metrics become reliable. Yet WAR uses single-season defensive values to calculate a single-season number. A Gold Glove-caliber shortstop who has a fluky bad defensive year by the metrics could see his WAR drop by 2 full wins, even if scouts and coaches saw the same elite defender all year.
The replacement-level baseline is another source of tension. FanGraphs and Baseball Reference define replacement level slightly differently. The concept is the same (a freely available, AAAA-type player), but the calibration differs enough that you'll sometimes see a full win of disagreement between fWAR and bWAR for the same player in the same season.
This isn't a flaw, exactly. It's an honest reflection of the fact that WAR is a model with assumptions baked in, not a direct measurement.
But it means the difference between a 5.8 WAR season and a 6.3 WAR season is basically noise. Anyone who uses a tenth of a win to settle an MVP argument is flat-out misusing the tool.
What WAR Changed (and What It Didn't)
Before wins above replacement existed, player evaluation was a patchwork of counting stats, scouting reports, and vibes. A first baseman who hit .300 with 30 homers looked identical to a shortstop who hit .280 with 25 homers, even though the shortstop was almost certainly more valuable when you accounted for defensive position and context.
WAR gave the sport a common language for value. It allowed front offices to identify undervalued players, to see that a 4-WAR catcher was rarer and more valuable than a 4-WAR left fielder, to put a dollar figure on marginal wins. The entire structure of modern free agency, from the $/WAR calculations teams use to set contract offers, flows from this framework.
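The $/WAR logic that paragraph describes is, at its crudest, one multiplication. This sketch uses an assumed market rate of roughly $8M per marginal win; the actual figure shifts with each free-agent market and teams layer aging curves and injury risk on top of it.

```python
DOLLARS_PER_WIN = 8_000_000  # assumed market price of one marginal win

def surplus_value(projected_war, salary):
    """Value a projected season against what the player is actually paid."""
    return projected_war * DOLLARS_PER_WIN - salary

# A projected 4-WAR player earning $20M produces positive surplus:
print(surplus_value(4.0, 20_000_000))  # 12000000.0
```

Positive surplus is what front offices are hunting for; a negative number means the contract pays for more wins than the projection expects.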
But WAR also flattened something.
It turned every player into a single number on a spreadsheet, and that number, by design, strips away context. When a player was hot. Whether the team was in a pennant race or playing out the string in September. How the clubhouse responded to a slump. Whether a guy's 3.5 WAR came in steady, reliable daily production or in two scorching months sandwiched around an injury.
For understanding how baseball statistics work in general, WAR is an essential piece of the puzzle. But it's not the whole picture, and the sport spent about a decade pretending it was.
The Thing the Number Can't Feel
Here's what I keep coming back to after years of writing these daily digests for Small Ball.
The 2023 Arizona Diamondbacks finished the regular season with a team WAR that didn't scream "World Series contender." They were good, not great, by the accumulation model. Then they caught fire in October, swept the Dodgers in the NLDS, and played in the Fall Classic.
The 2024 Kansas City Royals returned to the postseason after years of irrelevance, powered by a collective energy that showed up in the standings before it showed up in the WAR leaderboards.
Stats flatten time. A 4.0 WAR in April through September is the same number whether the team was surging or collapsing around that player. But anyone who watched those Diamondbacks in October, or those Royals in August, knows that momentum is real. Streaks are real. The way a team carries itself after a walk-off win versus after getting swept is real, and it compounds in ways that a cumulative seasonal stat can't register.
That's the gap Vibe Check was built to fill. Not to replace WAR. Not to argue that feelings matter more than data. But to capture the dimension of team performance that lives in the spaces between traditional metrics: momentum, streaks, collective energy, the difference between a team that's 45-40 and rising versus 45-40 and falling apart.
WAR changed how we value individual players. Vibe Check is our attempt to change how we feel teams, in real time, across a 162-game season that's too long and too weird for any single number to summarize.
We send a Vibe Check score for your team every morning, alongside the traditional stats that matter most. Pick your team, get the daily digest, and see what the numbers look like when momentum is part of the equation. Sign up for your free team newsletter here.
