There's a growing problem: media outlets increasingly cite prediction market prices as fact. A CNN chyron reading "markets give Ukraine ceasefire 18% odds" treats the number as authoritative. But from which platform? Measured when? And could someone with a few thousand dollars have moved it minutes before the screenshot? (See Hall, 2025, "When predictions become news.")
Bellwether addresses this with two features: a robust composite price and a manipulation resistance score for every market we track. But we didn't start there.
Where We Started: Does Liquidity Predict Accuracy?
The natural first instinct was to use liquidity as a quality signal. A market with a 1-cent spread and $50,000 of depth feels more trustworthy than one with a 10-cent spread and $500 of depth. If that intuition held, you could just weight prices by liquidity and report the deepest markets with the most confidence.
We tested this using order book data from roughly 3,000 resolved markets across Polymarket and Kalshi. The answer is no. Markets with tighter spreads are not systematically more accurate than markets with wider spreads. Depth shows even less signal. We tested this multiple ways (cross-sectional, time series, conditional on price range) and the null held everywhere.
This has precedent. Tetlock (2008) found the same pattern in TradeSports: liquidity doesn't improve calibration and may actually worsen forecasting resolution. The mechanism is that naïve limit order traders subsidize informed flow without contributing information. More liquidity means more noise traders alongside informed ones, leaving the signal-to-noise ratio roughly constant. Clinton and Huang (2025) showed that PredictIt, with an $850 position cap, outperformed the far more liquid Polymarket in forecasting the 2024 election. Forecast quality depends on trader diversity, not capital depth.
So liquidity doesn't predict whether a price is right. But it does predict whether a price is resistant to being pushed. A thin market can still be accurate: the handful of informed traders who show up may price the event correctly. But that same thin market can be moved by anyone with a few thousand dollars. The problem isn't that thin markets are wrong. It's that you can't tell whether they're wrong or whether someone pushed them.
That distinction between accuracy and integrity is what led us to the two features below.
Why Not Just Use the Spot Price?
The simplest approach is to take the current price from Polymarket or Kalshi and report it. This has two problems.
First, the spot price is a single point in time. It reflects the last trade or the current midpoint, which can be moved temporarily by one large order. A manipulator buys aggressively, the price spikes, a journalist screenshots it, and the damage is done even if the price reverts minutes later.
Second, many political events are traded on both platforms, and the prices don't always agree. If Polymarket says 58% and Kalshi says 35%, which number goes on air? Picking one is arbitrary. Averaging them ignores that one might have ten times the trading activity of the other.
The Bellwether Price: Volume-Weighted Average Price
We compute a volume-weighted average price (VWAP) across all transactions on all platforms over a rolling window.
VWAP is not new. Duffie and Dworczak (2021) proved it is the optimal linear unbiased benchmark when agents have incentives to manipulate, using the same framework developed for LIBOR reform after the rate-fixing scandals. The intuition: every transaction is weighted by its dollar size, so small manipulative trades are diluted by the weight of normal trading activity.
How It Works
Take every trade that occurred in the window, regardless of which platform it happened on. Each trade i has a price p_i (between 0 and 1, representing a probability) and a size q_i (number of contracts). The VWAP is:

$$\text{VWAP} = \frac{\sum_i p_i \, q_i}{\sum_i q_i}$$
That's it. If 40,000 contracts traded on Polymarket at prices around 58¢ and 5,000 contracts traded on Kalshi at prices around 35¢, the Polymarket activity dominates the average. Not because we chose Polymarket, but because more money moved there. The market with more trading activity naturally gets more weight.
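To make the arithmetic concrete, here is a minimal Python sketch of the pooled calculation using the example figures above. The `Trade` class and `pooled_vwap` function are hypothetical names chosen for illustration, not Bellwether's actual code, and the sketch collapses each platform's activity into one trade at its approximate price.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Trade:
    platform: str  # e.g. "polymarket" or "kalshi" (illustrative labels)
    price: float   # between 0 and 1, read as a probability
    size: float    # number of contracts


def pooled_vwap(trades):
    """Volume-weighted average price over one pooled transaction stream."""
    total_size = sum(t.size for t in trades)
    if total_size <= 0:
        raise ValueError("no traded volume in window")
    return sum(t.price * t.size for t in trades) / total_size


# The example from the text, simplified to one trade per platform.
trades = [
    Trade("polymarket", 0.58, 40_000),
    Trade("kalshi", 0.35, 5_000),
]
vwap = pooled_vwap(trades)
print(f"{vwap:.4f}")  # 0.5544
```

The pooled figure lands near 55¢, much closer to the high-volume platform than the 46.5¢ a simple average of the two prices would give.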
Cross-Platform Pooling
For markets that exist on both Polymarket and Kalshi, we pool all transactions into a single VWAP. This is the core of the approach: the two platforms become one combined transaction stream. Both are binary contracts that pay $1 if the event occurs and $0 otherwise, so prices are directly comparable.
This does two things. It increases the number of transactions in the benchmark, which raises the cost of distortion. And it means manipulation on one platform is diluted by honest trading on the other. A manipulator would need to sustain large trades across both exchanges simultaneously to move the Bellwether Price.
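The dilution effect can be written out in one line. As an illustrative simplification (our notation, not taken from the cited papers), suppose honest trading in the window totals V contracts at an average price p_h, and a manipulator adds m contracts at price p_m. The VWAP then shifts by:

$$\Delta = \frac{V p_h + m\,p_m}{V + m} - p_h = \frac{m\,(p_m - p_h)}{V + m}$$

With made-up numbers: a manipulator trading m = 2,000 contracts at 35¢ above the honest price moves a single-platform VWAP built on V = 5,000 contracts by 700 / 7,000 = 10¢. Against a pooled stream of V = 45,000 contracts, the same trades move it by 700 / 47,000, or about 1.5¢. This assumes the manipulator can actually execute all m contracts at p_m, which a thin book may not allow.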
For markets that exist on only one platform, the VWAP uses that platform's transactions alone. The card labels this explicitly: 6h VWAP across platforms vs. 6h VWAP · single platform.
The Fallback Cascade
Not every market trades constantly. A major US election market might see thousands of trades per hour. A Colombian presidential election market might see three trades in a day. The VWAP needs enough transactions to be meaningful, so we use an expanding window:
Tier 1: 6-hour VWAP. The default. If there are sufficient trades (10+) in the last 6 hours, use this window. It balances responsiveness to new information against manipulation resistance. Label: 6h VWAP.
Tier 2: 12-hour, then 24-hour VWAP. If fewer than 10 trades occurred in the last 6 hours, expand to 12 hours. Still insufficient? Expand to 24 hours. The wider window captures more transactions but is slower to reflect genuine price changes. A manipulation that happened 18 hours ago is still in the average. Label: 12h VWAP or 24h VWAP.
Tier 3: Stale. No trades in 24 hours. Display the last available VWAP with a timestamp. Label: Last VWAP (stale). The number is there for reference but it carries almost no informational weight.
The method label updates dynamically on every card. A viewer always sees how the number was computed, because a 31% from 6-hour VWAP across 50,000 trades and a 31% from a stale VWAP are completely different claims.
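The cascade can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the production logic: `select_window` is a hypothetical name, and the text above does not say what happens when a market has between 1 and 9 trades in the last 24 hours, so this sketch treats that case as stale.

```python
from datetime import datetime, timedelta, timezone

MIN_TRADES = 10             # "sufficient trades (10+)" from the text
WINDOWS_HOURS = (6, 12, 24)  # expanding windows, narrowest first


def select_window(trade_times, now):
    """Return (label, window_hours) for the narrowest window with enough trades."""
    for hours in WINDOWS_HOURS:
        cutoff = now - timedelta(hours=hours)
        n_trades = sum(1 for t in trade_times if cutoff <= t <= now)
        if n_trades >= MIN_TRADES:
            return f"{hours}h VWAP", hours
    # Assumption: fewer than MIN_TRADES even in 24h falls through to stale.
    return "Last VWAP (stale)", None


now = datetime(2025, 1, 1, 12, 0, tzinfo=timezone.utc)
busy = [now - timedelta(minutes=10 * k) for k in range(12)]     # 12 trades in ~2h
slow = [now - timedelta(hours=2 * k - 1) for k in range(1, 13)]  # 1h, 3h, ..., 23h ago
thin = [now - timedelta(hours=h) for h in (2, 9, 20)]            # three trades in a day

print(select_window(busy, now)[0])  # 6h VWAP
print(select_window(slow, now)[0])  # 24h VWAP
print(select_window(thin, now)[0])  # Last VWAP (stale)
```

Returning the label alongside the window keeps the displayed method and the computed number in sync, which is the point of labeling every card.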
The Robustness Score: Cost to Manipulate
The VWAP protects the Bellwether Price from manipulation after the fact. The robustness score measures how vulnerable each underlying platform's spot price is in the first place.
For each platform trading a market, we simulate market orders of increasing size against the live order book, walking through resting limit orders to compute the dollar cost required to move the spot price by 5 cents. The market's overall robustness is the minimum cost across platforms, because a manipulator only needs to move the cheapest one to generate a misleading headline.
The robustness score is based purely on order book depth. It answers one question: "How much would it cost to push this price?" It does not factor in how recently the market traded. Staleness and manipulability are different risks. A market with a $500K-deep order book that hasn't traded in 8 hours isn't fragile: the book is thick and nobody can cheaply push it, but the last price discovery was a while ago. That's a different concern than a market that traded 30 seconds ago but costs $800 to move.
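Here is a rough Python sketch of the order book walk. Several details are assumptions made for illustration: the function names are hypothetical, prices are held as integer cents to avoid floating-point drift, only the upward direction (buying through resting asks) is shown, and "moved by 5 cents" is taken to mean that every ask priced below the best ask plus 5¢ has been consumed. Instead of stepping through market orders of increasing size, it sums the cost of those levels directly, which gives the same total under these assumptions. The book data is invented.

```python
def cost_to_move_up(asks, move_cents=5):
    """Dollar cost of buying every resting ask priced below best_ask + move_cents.

    asks: list of (price_cents, contracts) tuples, in any order.
    """
    if not asks:
        return 0.0  # an empty book offers no resistance at all
    levels = sorted(asks)
    target = levels[0][0] + move_cents
    spent_cents = sum(price * size for price, size in levels if price < target)
    return spent_cents / 100


def market_robustness(books):
    """Minimum cost across platforms: the cheapest book to push sets the score."""
    return min(cost_to_move_up(asks) for asks in books.values())


# Invented order books, one per platform.
books = {
    "polymarket": [(58, 20_000), (59, 35_000), (61, 50_000), (63, 80_000)],
    "kalshi": [(35, 1_000), (37, 800), (41, 5_000)],
}

for name, asks in books.items():
    print(name, cost_to_move_up(asks))
print("robustness:", market_robustness(books))  # robustness: 646.0
```

In this made-up case the deeper book costs $62,750 to push, but the thin one costs only $646, so the market as a whole scores $646.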
Reportability Labels
Each card displays a reportability label based on order book depth:
- Reportable (≥$100K to move 5¢): Would require institutional capital to distort.
- Caution ($10K–$100K): Movable by a motivated individual.
- Fragile (<$10K): Trivially manipulable.
The method label on each card (6h VWAP, 24h VWAP, Stale) shows the freshness of the price independently. A deep market on a 24-hour window is still Reportable if the book is thick. A thin market on a 6-hour window gets Fragile even though recent trades exist. The two dimensions are separate: depth determines the label, staleness is shown in the method.
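The thresholds above map onto a small function. In this sketch `reportability_label` is a hypothetical name, and placing exactly $10,000 in Caution is our reading of the "<$10K" boundary. The method label is passed through untouched to show that freshness never changes the depth-based label.

```python
def reportability_label(cost_to_move_usd):
    """Map the dollar cost of a 5-cent move to a reportability label."""
    if cost_to_move_usd >= 100_000:
        return "Reportable"
    if cost_to_move_usd >= 10_000:
        return "Caution"
    return "Fragile"


def card_labels(cost_to_move_usd, method_label):
    """Depth decides the reportability label; the method label is independent."""
    return reportability_label(cost_to_move_usd), method_label


print(card_labels(500_000, "24h VWAP"))  # ('Reportable', '24h VWAP')
print(card_labels(800, "6h VWAP"))       # ('Fragile', '6h VWAP')
```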
Why Both Features Matter
The two defenses are complementary.
The VWAP absorbs manipulation after it happens. A burst of distorted trades gets diluted across hours of normal activity. The robustness score warns you before it happens, telling you how cheap it would be to try.
A market can have high robustness but a wide VWAP window, meaning the book is deep but nobody's traded recently. The price is hard to push but may be stale. Or a market can have a tight VWAP window but low robustness, meaning trades are flowing but the book is thin. Fresh price discovery, but vulnerable to a sudden shove. The reportability label and the method label together tell the full story.
References
Clinton, J. and Huang, J. (2025). Comparing prediction market forecasts of the 2024 US presidential election.
Duffie, D. and Dworczak, P. (2021). Robust benchmark design. Journal of Financial Economics, 142(2), 775–802.
Hall, A.B. (2025). When predictions become news.
Tetlock, P.C. (2008). Liquidity and prediction market informational efficiency. Working Paper.