🎯 Why Poker Was the Ultimate AI Challenge
Chess and Go fell to AI first because they are perfect information games — both players see the entire board. Poker is fundamentally different: it is an imperfect information game in which each player's hole cards are hidden.
This makes poker far harder for AI in several ways:
- Hidden state: You cannot observe the opponent's cards
- Deception: Bluffing and mixing strategies are optimal, not just calculation
- Multi-street decisions: Actions on earlier streets affect later ones
- Massive game tree: Heads-up NLHE has roughly \(10^{160}\) decision nodes
In chess, Deep Blue could search the game tree directly because every position was fully visible. In poker, the best strategy involves intentionally randomizing your actions — bluffing sometimes, value-betting other times — to prevent opponents from exploiting you. This randomization requirement is what makes poker so difficult for deterministic algorithms.
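To make the randomization requirement concrete, here is a sketch of the classic river bluffing indifference from poker game theory. This is a standard textbook result, not something derived in this article, and the pot and bet sizes are purely illustrative:

```python
def optimal_bluff_fraction(pot: float, bet: float) -> float:
    """Fraction of the bettor's betting range that should be bluffs,
    chosen so the caller is indifferent between calling and folding."""
    return bet / (pot + 2 * bet)

def optimal_call_frequency(pot: float, bet: float) -> float:
    """How often the caller must call so the bettor's bluffs break even."""
    return pot / (pot + bet)

pot, bet = 100.0, 100.0                 # a pot-sized bet
x = optimal_bluff_fraction(pot, bet)    # 1/3 of bets should be bluffs
c = optimal_call_frequency(pot, bet)    # call half the time

# Caller's EV of calling: wins pot+bet against a bluff, loses bet to value.
ev_call = x * (pot + bet) - (1 - x) * bet
print(round(x, 4), round(c, 4), round(ev_call, 6))  # 0.3333 0.5 0.0
```

Because the caller's expected value of calling is exactly zero, no deterministic strategy can do better than this mix, which is precisely why fixed, non-randomized play is exploitable.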
📅 Timeline: AI vs Poker
Late 1990s–early 2000s: University of Alberta's Computer Poker Research Group builds early bots using hand-crafted rules and simple opponent modeling. These systems can beat beginners but are easily exploited by experienced players.
Early-to-mid 2000s: Researchers begin applying game-theoretic equilibrium concepts to poker. Early programs approximate Nash equilibrium for simplified games (limit poker, fewer streets).
2007: Martin Zinkevich and colleagues at Alberta publish Counterfactual Regret Minimization (CFR). This algorithm changes everything — it can solve large imperfect-information games by iteratively reducing regret at each decision point. CFR becomes the foundation of virtually all subsequent poker AI.
2015: Carnegie Mellon's Claudico plays 80,000 hands of heads-up NLHE against four top professionals. The pros win, but by a margin that falls short of statistical significance. Claudico nevertheless demonstrates unconventional strategies, such as large overbets and unusual bet sizing, that surprise the human pros.
2017: Libratus (CMU) beats four of the world's best heads-up NLHE specialists by 14.7 big blinds per 100 hands over 120,000 hands — a statistically significant margin. Libratus uses a three-part system: a blueprint strategy, sub-game solving, and self-improvement during the match. It is considered the first AI to defeat top pros in this format.
2019: Facebook AI Research and CMU release Pluribus. It defeats top professionals in six-player NLHE — previously considered far harder than heads-up because of the multi-player dynamics. Pluribus achieves this with far less computing power than earlier systems, using Monte Carlo CFR (MCCFR) and depth-limited search.
Mid-2010s onward: Solvers based on CFR (PioSOLVER, GTO+, Simple Postflop) become commercially available. Professional and serious recreational players now study AI-derived strategies. GTO play and solver outputs become standard in high-stakes poker preparation.
🤖 How CFR Works — The Core Algorithm
Counterfactual Regret Minimization (CFR) is the breakthrough algorithm that powers all modern poker AI. Here is the intuition:
The Regret Concept
At any decision point, "regret" measures how much better you would have done if you had chosen a different action. CFR iteratively adjusts your strategy to reduce regret on every action.
$$R^T(a) = \sum_{t=1}^{T} \left( v(a, \sigma^t_{-i}) - v(\sigma^t) \right)$$

Where \(R^T(a)\) is the cumulative regret for action \(a\) after \(T\) iterations, \(v(a, \sigma^t_{-i})\) is the counterfactual value of always taking action \(a\) against the opponent's strategy at iteration \(t\), and \(v(\sigma^t)\) is the value of the current strategy.
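The regret sum above can be sketched in a few lines. The per-iteration values below are invented for illustration; in a real solver they would come from traversing the game tree:

```python
# Accumulate R^T(a) = sum over t of [ v(a, sigma_-i^t) - v(sigma^t) ].
# All values here are illustrative, not from a real poker game.
action_values_per_iter = [   # v(a, sigma_-i^t) for each action, per iteration
    {"fold": 0.0, "call": 1.0, "raise": 2.0},
    {"fold": 0.0, "call": 2.0, "raise": -1.0},
]
strategy_values = [1.0, 0.5]  # v(sigma^t): value of the mixed strategy played

cum_regret = {"fold": 0.0, "call": 0.0, "raise": 0.0}
for action_values, v_sigma in zip(action_values_per_iter, strategy_values):
    for action, v_a in action_values.items():
        cum_regret[action] += v_a - v_sigma  # "how much better was a?"

print(cum_regret)  # {'fold': -1.5, 'call': 1.5, 'raise': -0.5}
```

Here only "call" accumulates positive regret, meaning the strategy would have done better by calling more often — exactly the signal the update rule below exploits.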
The Update Rule
At each iteration, actions with positive regret are played more often, proportional to their regret:
$$\sigma^{T+1}(a) = \frac{\max(R^T(a), 0)}{\sum_{a'} \max(R^T(a'), 0)}$$

(When no action has positive regret, the strategy falls back to uniform, since the denominator would otherwise be zero.) After many iterations, the average strategy converges to a Nash equilibrium — a strategy profile from which no player can improve by unilaterally deviating.
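This update rule, known as regret matching, is a one-liner in practice. A minimal sketch (the action names and regret values are illustrative):

```python
def regret_matching(cum_regret: dict) -> dict:
    """sigma^{T+1}(a) is proportional to max(R^T(a), 0);
    falls back to uniform when no action has positive regret."""
    positive = {a: max(r, 0.0) for a, r in cum_regret.items()}
    total = sum(positive.values())
    if total == 0.0:  # standard convention: all regrets non-positive
        return {a: 1.0 / len(cum_regret) for a in cum_regret}
    return {a: p / total for a, p in positive.items()}

print(regret_matching({"fold": -1.5, "call": 1.5, "raise": -0.5}))
# {'fold': 0.0, 'call': 1.0, 'raise': 0.0}
print(regret_matching({"bet": 2.0, "check": 3.0}))
# {'bet': 0.4, 'check': 0.6}
```

Note that the current strategy puts zero weight on negative-regret actions; it is the average strategy over all iterations, not this per-iteration one, that converges to equilibrium.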
Why It's Powerful
- Does not require knowing the opponent's strategy in advance
- Trains through self-play — no human data needed
- Converges to Nash equilibrium (GTO) with enough iterations
- Can handle the massive game trees of real poker through abstraction
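The self-play convergence can be demonstrated end to end on a toy game. The sketch below runs regret matching on rock-paper-scissors, a one-shot zero-sum game rather than a full poker tree, so it omits the "counterfactual" tree traversal of real CFR, but the loop structure and the convergence of the average strategy to Nash equilibrium are the same idea in miniature:

```python
# Toy self-play loop: regret matching on rock-paper-scissors.
ACTIONS = ["rock", "paper", "scissors"]

def payoff(a: str, b: str) -> int:
    """Payoff to the player choosing a against b (antisymmetric, zero-sum)."""
    if a == b:
        return 0
    wins = {("rock", "scissors"), ("paper", "rock"), ("scissors", "paper")}
    return 1 if (a, b) in wins else -1

def regret_matching(regrets):
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1.0 / len(regrets)] * len(regrets)

# Asymmetric initial regrets so play doesn't start at the fixed point.
regret = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
strategy_sum = [[0.0] * 3, [0.0] * 3]

for _ in range(50000):
    sigma = [regret_matching(regret[0]), regret_matching(regret[1])]
    for p in (0, 1):
        opp = 1 - p
        # Expected value of each pure action against the opponent's mix.
        # (RPS payoffs are antisymmetric, so the same formula works for both.)
        ev = [sum(sigma[opp][j] * payoff(a, ACTIONS[j]) for j in range(3))
              for a in ACTIONS]
        v_sigma = sum(sigma[p][i] * ev[i] for i in range(3))
        for i in range(3):
            regret[p][i] += ev[i] - v_sigma
            strategy_sum[p][i] += sigma[p][i]

avg = [s / sum(strategy_sum[0]) for s in strategy_sum[0]]
print([round(x, 2) for x in avg])  # each entry approaches 1/3, the uniform Nash
```

The current strategies cycle, yet the averaged strategy homes in on the equilibrium — the same reason poker solvers report the average, not the final, strategy.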
Solvers like PioSOLVER run CFR for millions of iterations on a single hand scenario. The resulting strategy prescribes exact mixing frequencies — e.g., "bet pot 40% of the time, check 60%" — which form the basis of GTO study.
🏆 Inside Libratus: Three-Module Architecture
Libratus used a novel three-part system that made it far more effective than naive CFR:
Module 1: Blueprint strategy. Before the competition, Libratus computes an approximate Nash equilibrium for the full game using a coarse abstraction of the bet-sizing space. Computing this blueprint takes roughly 15 million CPU core-hours.
Module 2: Nested sub-game solving. During play, whenever the turn or river is reached, Libratus re-solves the reached sub-game exactly, using the actual cards and bet sizes rather than an abstraction. This fine-grained solving corrects errors introduced by the coarse blueprint.
Module 3: Self-improver. Every night during the competition, Libratus analyzes the off-blueprint bet sizes its opponents used most and computes improved strategies for those specific branches overnight, patching its own weaknesses between sessions.
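The interaction of the three modules can be caricatured in a few lines. Everything below is a hypothetical skeleton invented for illustration: the state names, frequencies, and function names are made up, and the real sub-game solver is vastly more sophisticated than this stub:

```python
# Hypothetical skeleton of the three-module flow (all names/values invented).
blueprint = {  # Module 1: coarse precomputed strategy, toy frequencies
    ("preflop", "coarse_spot"): {"call": 0.6, "raise": 0.4},
    ("flop", "coarse_spot"): {"check": 0.7, "bet": 0.3},
}

def solve_subgame(street: str, exact_spot: str) -> dict:
    """Module 2 stand-in: pretend to re-solve the reached sub-game exactly.
    A real solver would run CFR on the un-abstracted sub-game here."""
    return {"check": 0.55, "bet": 0.45}

divergence_log = []  # Module 3: spots queued for overnight improvement

def choose_strategy(street: str, coarse_spot: str, exact_spot: str) -> dict:
    if street in ("turn", "river"):
        divergence_log.append((street, exact_spot))      # patch these later
        return solve_subgame(street, exact_spot)         # fine-grained re-solve
    return blueprint[(street, coarse_spot)]              # coarse blueprint lookup

print(choose_strategy("preflop", "coarse_spot", "AhKd, pot 6bb"))
print(choose_strategy("turn", "coarse_spot", "AhKd on Qs7c2d9h, pot 40bb"))
print(len(divergence_log))  # one turn spot queued for overnight analysis
```

The key design point the sketch preserves is the division of labor: cheap table lookups early in the hand, expensive exact solving late in the hand where the remaining tree is small enough to re-solve.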
The result: Libratus won by 14.7 bb/100 over 120,000 hands — at the match's $50/$100 stakes, roughly $1.8 million in chips.
💡 Impact on Modern Poker Strategy
What AI Taught Humans
Studying AI outputs fundamentally changed how top players think about poker:
- Overbetting: Humans rarely used bets larger than the pot. AI demonstrated that overbets (1.5x–3x pot) are often strategically correct and part of a balanced range.
- Donk betting: Previously considered "bad" by many pros. AI shows donk bets are correct in specific board/range situations.
- Checking strong hands: AI checks very strong hands more often than humans do, balancing checking ranges to prevent exploitation.
- Range-based thinking: AI never thinks about individual hands in isolation — only ranges and frequencies. This mindset has permeated elite human play.
- Bet-sizing variety: AI uses many more sizing options than humans were accustomed to, forcing defenders to handle more complex scenarios.
The Solver Revolution
The proliferation of CFR-based solvers transformed poker study. High-stakes players now spend hours reviewing solver outputs and internalizing GTO frequencies. The gap between solver-studied players and intuition-only players has widened significantly at stakes above $1/$2.
You don't need to master solver outputs to benefit from AI research. The key insights — balanced ranges, mixed strategies, bet-sizing awareness — can be applied as strategic principles even without running a solver.