Explained: How the AI Libratus Defeated Top Poker Pros

02/13/2017

After a 20-day marathon challenge at the Rivers Casino in Pittsburgh, Pennsylvania, the results of Brains Vs. Artificial Intelligence: Upping the Ante shocked the whole poker community: the artificial intelligence (AI) “Libratus” beat four of the best poker players in the world at heads-up No-Limit Texas Hold’em.

It is well known that AI has already beaten humans at other mind games such as chess and Go. But in those games, the whole board can be seen and analyzed. Because its information is incomplete, No-Limit Hold’em is considered one of the hardest games for an AI to beat. Unlike in those games, hole cards are hidden from opponents, and a level of randomness and bluffing (normally expressed through bet sizes) is intrinsic to the game, keeping the opponent from truly finding out what those hole cards are.

AlphaGo, the AI developed by Alphabet Inc.’s Google DeepMind, beat a professional Go player last year. But prior to the human challenge, AlphaGo had to study 30 million Go moves made by human players. Then it played against itself to hone its skills before taking on the human challenge.

Throughout Brains Vs. Artificial Intelligence: Upping the Ante, the computer science researchers from Carnegie Mellon University, Tuomas Sandholm and Noam Brown, refused to divulge how Libratus could possibly beat the best human players at such an unpredictable game. But apparently, the secret to Libratus’ success lay in the fact that the humans were up against not just one AI but three.

1: Self-Taught Artificial Intelligence

Unlike Deep Blue, the chess AI, and AlphaGo, both of which studied human game moves, Libratus learned poker from scratch.

“We didn’t tell Libratus how to play poker. We gave it the rules of poker and said ‘learn on your own’,” said Brown.

And with that, the AI started playing poker and learning on its own, creating its own strategy and making its own moves. After trillions of hands, it had developed a winning strategy in a game that would take humans far longer to master.

AlphaGo uses deep artificial neural network learning, in which the AI goes through extensive training and self-learning. This method is loosely modeled on how a biological brain solves problems.

In contrast, Libratus uses reinforcement learning, an algorithmic approach to learning rooted in the behavioral sciences. The algorithm is designed to strike a balance between exploring unknown options and exploiting knowledge already acquired.
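The balance between exploration and exploitation described above can be illustrated with a minimal sketch. This is a generic "epsilon-greedy" learner choosing among a few options with unknown payoffs, not Libratus' actual algorithm; the function name, the win rates and the parameter values are all assumptions made up for illustration.

```python
import random


def epsilon_greedy(true_win_rates, rounds=10000, epsilon=0.1, seed=0):
    """Learn which option pays best by trial and error (illustrative only)."""
    rng = random.Random(seed)              # seeded so the run is repeatable
    counts = [0] * len(true_win_rates)     # how often each option was tried
    estimates = [0.0] * len(true_win_rates)  # running average payoff per option

    for _ in range(rounds):
        if rng.random() < epsilon:
            # Exploration: occasionally try a random option.
            choice = rng.randrange(len(true_win_rates))
        else:
            # Exploitation: otherwise use the option that looks best so far.
            choice = max(range(len(estimates)), key=estimates.__getitem__)

        # The hidden win rate decides whether this try pays off.
        reward = 1.0 if rng.random() < true_win_rates[choice] else 0.0

        # Update the running average for the chosen option.
        counts[choice] += 1
        estimates[choice] += (reward - estimates[choice]) / counts[choice]

    return estimates, counts


# Hypothetical win rates for three options; the learner never sees these directly.
estimates, counts = epsilon_greedy([0.2, 0.5, 0.8])
print("most-used option:", counts.index(max(counts)))
```

With a small epsilon the learner spends most of its time on the option it currently believes is best, while still sampling the others often enough to correct a wrong early impression.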

2: Endgame Solver

The second AI was an endgame solver. Endgame solving is a standard technique in games that have perfect information, like chess and checkers. In perfect-information games, the AI can choose a specific strategy on a per-move basis, knowing how each move will end the game. But in a game like poker, where information is incomplete and the number of possibilities is huge, endgame solving becomes far more challenging: to arrive at the best strategy, the AI has to analyze its moves in the context of the whole game while the game is going on. The second AI ensured that Libratus stayed focused on the hand being played and continuously “refines its strategy” as it plays against the humans.
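To show what endgame solving looks like in the easy, perfect-information case, here is a minimal sketch for a toy game that is not from the article: players alternately take one to three stones from a pile, and whoever takes the last stone wins. Because nothing is hidden, every position can be solved by working backward from the end. The game and the function names are assumptions chosen for illustration; poker does not allow this simple approach, which is exactly the difficulty described above.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def player_to_move_wins(stones):
    """Return True if the player about to move can force a win."""
    if stones == 0:
        # The previous player took the last stone, so the player to move has lost.
        return False
    # A position is winning if some move leaves the opponent in a losing position.
    return any(
        not player_to_move_wins(stones - take)
        for take in (1, 2, 3)
        if take <= stones
    )


def best_move(stones):
    """Return a winning number of stones to take, or None if every move loses."""
    for take in (1, 2, 3):
        if take <= stones and not player_to_move_wins(stones - take):
            return take
    return None


print(best_move(7))  # take 3, leaving the opponent with 4 stones
print(best_move(8))  # None: every move loses against perfect play
```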

3: Continual Improvement Meta-algorithm

Despite the strength of Libratus, the team of humans, which consisted of Dong Kim, Jimmy Chou, Daniel McAulay and Jason Les, still found patterns in Libratus’ game strategy and exploited them. As a result, there was a point at which the humans were catching up and closing the gap. But this small victory didn’t last long. Brown and Sandholm had built another algorithm with which the AI removes any patterns, patches any holes and shores up any weaknesses its opponents have spotted.

“After play ended each day, a meta-algorithm analyzed what holes the pros had identified and exploited in Libratus’ strategy. It then prioritized the holes and algorithmically patched the top three using the supercomputer each night,” said Sandholm.
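The "prioritize the holes and patch the top three" step Sandholm describes can be pictured with a small sketch. The details of the real meta-algorithm were not given, so everything here is an assumption for illustration: the idea of logging each hand as a situation plus chips lost, the situation labels, the numbers and the function name.

```python
from collections import defaultdict


def top_holes(hand_log, k=3):
    """Rank situations by total chips lost and return the k most costly ones."""
    losses = defaultdict(float)
    for situation, chips_lost in hand_log:
        # Negative values mean the AI won chips in that situation.
        losses[situation] += chips_lost

    ranked = sorted(losses.items(), key=lambda item: item[1], reverse=True)
    # Only situations that actually cost chips overall count as holes.
    return [situation for situation, total in ranked[:k] if total > 0]


# Hypothetical log of one day's play: (situation, chips lost by the AI).
day_log = [
    ("3bet 2.6x", 500),
    ("river overbet", 300),
    ("3bet 2.6x", 400),
    ("min-raise", -200),
    ("donk bet", 100),
    ("limp", 50),
    ("river overbet", -100),
]

print(top_holes(day_log))  # ['3bet 2.6x', 'river overbet', 'donk bet']
```

In this picture, the three situations returned would be the ones handed to the overnight computation for re-solving, while cheaper leaks wait for another night.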

Libratus victorious

With all this in place, Libratus was able to learn how to manipulate bet sizes, which allowed it to put enough pressure on its human opponents, bluff them and counter any pressure they applied in return.

Jason Les said, “We tried 3betting 80% of hands to ~2.6x Libratus’ open going into Day 6 or so. We did that because we saw that size was working very well in previous days. However, by the time we showed up to hammer that with all our might, it had learned the size and knew how to handle it.”

Frank Pfenning, head of the computer science department at CMU, said, “The computer can’t win at poker if it can’t bluff. Developing an AI that can do that successfully is a tremendous step forward scientifically and has numerous applications.”

And it is exactly this that has the community concerned. After Libratus’ victory, players now fear that online poker will be dominated by bots that can outplay any human. In response, Eric Hollreiser from PokerStars said, “While on a functional hand-by-hand basis it mimics poker play, it is far, far removed from the reality of what happens at tables.”

And with that, the fact that a multi-player game is far more difficult to calculate than a heads-up match may give sufficient hope that humans are still better at Texas Hold’em than artificial intelligence.

Article by Gabrielle Barredo
