a16z: What is the key to prediction markets' explosive growth?
BlockBeats
01-24 15:00
To address the challenges of adjudicating prediction markets, a16z proposes locking AI models onto a blockchain, replacing human arbitration with a manipulation-resistant "digital judge," and driving the market from "rule by man" to large-scale "algorithmic rule of law."

Author: BlockBeats

Original title: How AI judges can scale prediction markets: The case for locking LLMs into the blockchain to resolve the hardest contracts
Original author: Andrew Hall, a16z
Original translation by: Jia Huan, ChainCatcher


Last year, a market predicting the result of Venezuela's presidential election saw more than $6 million in trading volume. But when the vote count closed, the market faced an impossible situation: the government declared Nicolás Maduro the winner, while the opposition and international observers accused the government of fraud. Should the market resolve according to "official information" (a Maduro victory) or "the consensus of credible reporting" (an opposition victory)?


In the Venezuelan case, the accusations escalated: critics first condemned the platform for disregarding its own rules and "stealing" user funds, then denounced the resolution mechanism for wielding absolute power in a political contest, acting as judge, jury, and executioner all at once, and finally claimed the process had been severely manipulated.


This is not an isolated incident. It's one of the biggest bottlenecks I believe prediction markets face in scaling: contract adjudication.


The stakes here are extremely high. If rulings are handled well, people will trust your market and be willing to trade in it, and prices will become a socially meaningful signal. If rulings are handled poorly, trading becomes frustrating and unpredictable: participants leave, liquidity dries up, and prices stop reflecting accurate forecasts of outcomes. Instead, prices begin to reflect a murky mixture of the actual probability of the event and traders' beliefs about how a flawed resolution mechanism will rule.


The Venezuelan controversy was relatively high-profile, but subtler failures occur regularly across platforms:


• The Ukraine map manipulation case. This shows how an adversary can profit by gaming the resolution mechanism directly. A contract on territorial control stipulated that resolution would be based on a specific online map. Someone allegedly modified that map to influence the contract's outcome. When your "source of truth" can be manipulated, your market can be manipulated too.


• The government shutdown contract. This shows how the choice of resolution source can produce inaccurate, or at least unpredictable, results. The resolution rules stipulated that the market would pay out based on the shutdown end date displayed on the Office of Personnel Management (OPM) website. President Trump signed the appropriations bill on November 12th, but for unknown reasons the OPM website was not updated until November 13th. Traders who correctly predicted the shutdown would end on the 12th lost their bets because of a website administrator's delay.


• The Zelensky suit market. This sparked concerns about conflicts of interest. The contract asked whether Ukrainian President Zelensky would wear a suit to certain events, a seemingly trivial question that attracted over $200 million in bets. When Zelensky attended the NATO summit dressed in what the BBC, the New York Post, and other outlets described as a suit, the market initially resolved "yes." But UMA token holders objected to the result, and the resolution was subsequently reversed to "no."


In this article, I explore how cleverly combining large language models (LLMs) and cryptography can help us build a resolution method for prediction markets that scales while remaining hard to manipulate, accurate, fully transparent, and credibly neutral.


The problem is not limited to prediction markets


Similar issues have long plagued financial markets. For years, the International Swaps and Derivatives Association (ISDA) has grappled with adjudication challenges in the credit default swap (CDS) market, where contracts pay out if a company or country defaults on its debt. ISDA's 2024 review was remarkably candid about these difficulties. Its determinations committee, composed of major market participants, votes on whether a credit event has occurred. But the process has been criticized for its lack of transparency, potential conflicts of interest, and inconsistent results, much like UMA's process.


The fundamental problem is the same: when huge sums of money depend on judgments about ambiguous situations, every resolution mechanism becomes a target for gaming, and every point of ambiguity becomes a potential flashpoint.


The four pillars of an ideal adjudication scheme


Any viable solution needs to achieve several key properties simultaneously:


1. Manipulation resistance

If adversaries can influence rulings, for example by editing Wikipedia, planting fake news, bribing oracles, or exploiting loopholes, the market becomes a game of manipulation rather than prediction.


2. Reasonable accuracy

The mechanism must make the right decisions most of the time. Perfect accuracy is impossible in a world full of real ambiguity, but systematic errors or obvious mistakes can destroy credibility.


3. Transparency in advance

Traders need to understand exactly how the decision will be made before placing a bet. Changing the rules midway violates the fundamental contract between the platform and the participants.


4. Credible neutrality

Participants need to trust that the mechanism is impartial to any particular trader or outcome. This is why it is so problematic to allow people holding large amounts of UMA to decide on the contracts they have bet on: even if they act impartially, the appearance of a conflict of interest undermines trust.


Human review panels can satisfy some of these properties, but they struggle to achieve others at scale, especially manipulation resistance and credible neutrality. Token-based voting systems like UMA have their own well-documented issues with whale dominance and conflicts of interest.


This is where AI comes in.


The case for the LLM judge


This is a proposal that has gained attention within the prediction market community: use a large language model as the judge, locking the specific model and prompt on the blockchain when the contract is created.


The basic architecture works as follows: when a contract is created, the market creator must specify not only the resolution criteria in natural language, but also the exact LLM (identified by a timestamped model version) and the exact prompt that will be used to determine the result.


This specification is cryptographically committed to the blockchain. Before trading, participants can examine the complete resolution mechanism, knowing exactly which AI model will adjudicate the outcome, what prompt it will receive, and which information sources it will have access to.


If they don't like the setup, they won't trade.


At resolution time, the committed LLM runs with the committed prompt, accesses the specified information sources, and produces a judgment. That output determines who gets paid.
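The commit-and-verify cycle described above can be illustrated off-chain with a few lines of Python. This is a minimal sketch, not the article's actual protocol: the function names, the JSON serialization, and the example model identifier are all hypothetical, and a real system would store the hash in a smart contract rather than a local variable.

```python
import hashlib
import json

def commit_resolution_spec(model_id: str, prompt: str, sources: list[str]) -> str:
    """Serialize the resolution spec deterministically and hash it.
    This hash is what would be stored on-chain at contract creation."""
    spec = json.dumps(
        {"model": model_id, "prompt": prompt, "sources": sorted(sources)},
        sort_keys=True,
    )
    return hashlib.sha256(spec.encode()).hexdigest()

def verify_resolution_spec(onchain_hash: str, model_id: str,
                           prompt: str, sources: list[str]) -> bool:
    """At resolution time, anyone can recompute the hash and check that
    the judge actually being run matches what was committed."""
    return commit_resolution_spec(model_id, prompt, sources) == onchain_hash

# Commit at creation, verify at resolution (illustrative values only).
h = commit_resolution_spec(
    "llm-2025-06-01",  # hypothetical timestamped model version
    "Did the US government shutdown end on November 12?",
    ["opm.gov", "apnews.com", "reuters.com"],
)
assert verify_resolution_spec(h, "llm-2025-06-01",
    "Did the US government shutdown end on November 12?",
    ["opm.gov", "apnews.com", "reuters.com"])
```

Because the hash covers the model version, the prompt, and the source list, any tampering with the spec after creation produces a mismatch that traders can detect before accepting the payout.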


This method simultaneously addresses several key constraints:


• Highly (though not absolutely) resistant to manipulation. Unlike a Wikipedia page or a small news site, you cannot easily edit the output of a mainstream LLM. The model's weights are fixed at commitment time. To manipulate the decision, an adversary would need to corrupt the information sources the model relies on, or poison the model's training data long in advance. Compared to bribing an oracle or editing a map, both attacks are costly and highly uncertain.


• Reasonable accuracy. With reasoning models improving rapidly and handling an impressive range of intellectual tasks, particularly when they can browse the web for new information, LLM judges should be able to adjudicate many markets accurately; experiments to measure their accuracy are underway.


• Built-in transparency. The entire resolution process is visible and auditable before anyone places a bet. No mid-stream rule changes, no discretionary judgments, no behind-the-scenes negotiations. You know exactly what you are signing up for.


• Much stronger credible neutrality. An LLM has no economic interest in the outcome. It cannot be bribed. It holds no UMA tokens. Its biases, whatever they may be, are properties of the model itself, not of ad hoc decisions made by interested parties.


AI's limitations, and how to defend against them


• Models make mistakes. An LLM might misread a news article, hallucinate facts, or apply resolution criteria inconsistently. But as long as traders know which model they are betting on, they can factor these flaws into the price. Experienced traders will account for a particular model's known tendency to resolve ambiguous cases a certain way. A model doesn't need to be perfect; it needs to be predictable.


• Manipulation is not impossible. If the prompt specifies particular news sources, adversaries may try to plant stories in them. Such attacks are expensive against mainstream media but can be feasible against smaller outlets, another form of the map-editing problem. Prompt design is crucial here: a resolution mechanism that relies on diverse, redundant sources is far more robust than one with a single point of failure.
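The redundancy idea can be sketched as a simple quorum rule: run the judge once per source and pay out only when enough independent sources agree. This is an illustrative Python sketch, not a specification from the article; the function name and quorum policy are assumptions, and the per-source verdicts stand in for separate LLM passes.

```python
from collections import Counter
from typing import Optional

def resolve_with_redundant_sources(verdicts: dict[str, str],
                                   quorum: int) -> Optional[str]:
    """Aggregate per-source verdicts (e.g. one LLM pass per named source).
    Resolve only if at least `quorum` sources agree on the same outcome;
    otherwise return None, signalling a fall-back or escalation path."""
    if not verdicts:
        return None
    outcome, count = Counter(verdicts.values()).most_common(1)[0]
    return outcome if count >= quorum else None

# A single poisoned source cannot flip the result on its own.
verdicts = {"apnews.com": "yes", "reuters.com": "yes",
            "small-blog.example": "no"}
print(resolve_with_redundant_sources(verdicts, quorum=2))  # yes
```

The design choice here is that disagreement below quorum yields no automatic payout at all, which trades resolution speed for manipulation resistance: an attacker must now compromise a majority of sources rather than one.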


• Poisoning attacks are theoretically possible. A well-resourced adversary might try to influence an LLM's training data to bias its future judgments. But this requires acting long before the contract is created, with uncertain returns and enormous costs, far higher than the cost of bribing a committee member.


• A proliferation of LLM judges can create coordination problems. If different market creators pair different prompts with different LLMs, liquidity fragments: traders cannot easily compare contracts or aggregate information across markets. Standardization is valuable, but so is letting the market discover which LLM-and-prompt combinations work best. The right answer is probably a design that allows experimentation while giving the community mechanisms to converge on well-tested defaults over time.


Four suggestions for builders


In short, AI-based adjudication essentially swaps one set of problems (human bias, conflicts of interest, lack of transparency) for another (model limitations, prompt-engineering challenges, information-source vulnerabilities), and the latter set may well be easier to manage. So how do we move forward? Platforms should:


1. Experiment:

Test LLM resolution on lower-risk contracts to build a track record. Which models perform best? Which prompt structures are most robust? What failure modes show up in practice?


2. Standardization:

As best practices emerge, the community should work toward standardized LLM-and-prompt combinations as defaults. This doesn't preclude innovation, but it helps concentrate liquidity in well-understood markets.


3. Build transparent tools:

For example, design an interface that lets traders easily review the complete resolution mechanism (model, prompt, and information sources) before trading. Resolution rules should not be buried in the fine print.


4. Conduct continuous governance:

Even with AI judges, humans still need to be responsible for setting the top-level rules: which models to trust, how to handle situations where models give obviously wrong answers, and when to update default settings. The goal is not to completely remove humans from the process, but to shift humans from ad-hoc, case-by-case judgments to systemic rule-making.
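Suggestion 1 above, experimentation, amounts to backtesting candidate judges against contracts whose outcomes are already known. A minimal harness could look like the following sketch; the toy judge, the example questions, and the scoring function are all hypothetical stand-ins (a real harness would call a pinned LLM version for each question).

```python
def backtest_judge(judge, resolved_contracts) -> float:
    """Score a candidate judge against already-resolved contracts.
    `judge` is any callable mapping a contract question to 'yes'/'no';
    `resolved_contracts` is a list of (question, true_outcome) pairs.
    Returns the fraction of contracts the judge resolves correctly."""
    correct = sum(1 for q, truth in resolved_contracts if judge(q) == truth)
    return correct / len(resolved_contracts)

# Toy stand-in judge for illustration only.
toy_judge = lambda q: "yes" if "shutdown" in q else "no"
history = [
    ("Did the shutdown end on November 12?", "yes"),
    ("Did Zelensky wear a suit to the NATO summit?", "yes"),
]
print(backtest_judge(toy_judge, history))  # 0.5
```

Running several model-and-prompt combinations through the same history makes their accuracy and failure modes comparable, which is exactly the track record the standardization step needs.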


Prediction markets possess extraordinary potential to help us understand a noisy, complex world. But this potential hinges on trust, and trust depends on fair contract decisions. We have already seen the consequences of failed decision-making mechanisms: confusion, anger, and traders leaving the market. I have witnessed people angrily abandon prediction markets altogether—vowing never to use their former favorite platforms again—after feeling cheated by an outcome that seemed to contradict the spirit of their bets. This represents a missed opportunity to unlock the benefits and broader applications of prediction markets.


LLM judges are not perfect. But combined with cryptography, they offer the transparency, neutrality, and manipulation resistance that have long eluded human systems. In a world where prediction markets are scaling faster than our governance mechanisms, that may be exactly what we need.


Original link

