Last month, a new AI bot called Pluribus, developed jointly by Facebook and Carnegie Mellon University, beat top human poker players at their own game for the first time ever. Carmel Kent and Rose Luckin, Research Mentor and Director, respectively, of UCL EDUCATE, ask: ‘have humans finally been defeated by AI?’
We have seen AI bots defeat humans in the past, in games of chess, Go and even in poker. But Pluribus' victory in a six-player, no-limit Texas Hold'em game signals a whole new development: not only was it the first victory in such a complex multi-player game, one requiring players to reason about hidden information and to bluff, but it was also achieved with very little computing power.
Most AI gaming advancements until now have been in two-player games – i.e. zero-sum games, where one player wins and the other loses – but real life is much more complicated than that.
Unlike the strategies used by the developers of Deep Blue and AlphaGo, an AI poker bot cannot rely on heavy calculation over a complete picture of the game at each point: in poker, crucial information – the other players' cards – is hidden. It must instead use many strategies, such as bluffing and uncovering others' bluffs, to navigate and win the game.
What the makers of Pluribus have done more efficiently and cheaply than some of its predecessors is to have it evaluate its options only a few moves ahead at a time, rather than search its moves exhaustively to the end of the game. This has made it more adaptable, more efficient and more readily applicable to real-life situations than, for example, AlphaGo or Deep Blue.
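The idea of searching only a few moves ahead can be sketched in a few lines. The toy game and the flat heuristic below are our own illustrative inventions, not Pluribus' actual algorithm: the point is simply that the search stops at a fixed depth and falls back on a cheap estimate instead of expanding the tree to the end of the game.

```python
# Toy sketch of depth-limited search (an illustration, not Pluribus' method).
# Hypothetical game: players take turns adding 1 or 2 to a counter; whoever
# moves the counter to exactly 10 wins.

def legal_moves(state):
    return [m for m in (1, 2) if state + m <= 10]

def heuristic(state):
    # Crude stand-in for a learned value estimate at the search horizon.
    return 0.5

def value(state, depth, maximizing=True):
    if state == 10:                  # terminal: the player who just moved won
        return 0.0 if maximizing else 1.0
    if depth == 0:
        return heuristic(state)      # cut the search off and estimate instead
    vals = [value(state + m, depth - 1, not maximizing)
            for m in legal_moves(state)]
    return max(vals) if maximizing else min(vals)

# Looking only 3 plies ahead is far cheaper than solving the whole game.
print(value(0, 3))
```

From a state one step from the goal, even a one-ply search finds the winning move; everywhere else the bot settles for the heuristic estimate rather than paying for an exhaustive search.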
Darren Elias, four-time poker world champion, speculated: “The first time the AI wins is the last time the human will ever win”, adding, “I’ve done nothing but play poker since I was 16 years old and dedicated my life to it, so it’s very humbling to be beaten by a machine.”
Are human gamers to be the first to be professionally replaced by AI – and if so, what does this mean for the rest of us?
The machine’s unfair advantage
It is crucial to remember that, until now, AI has only been able to surpass human intelligence in the untransferable, narrow context of very structured games. And despite the pace of technological development, the fundamental questions around AI's abilities have not changed much in the 50 years since Marvin Minsky, the American AI scientist, predicted in 1970: "In from three to eight years we will have a machine with the general intelligence of an average human being".
His forecast, as we know, failed to materialise, and since then we have gained a greater understanding of AI’s potential. For us to be better equipped to develop and flourish in a world of AI’s quick wins, we must gain a deeper understanding of the difference between human and artificial intelligence.
We must base that understanding on what we know to be the unfair advantage of being human – the skills that are unique to us – and on how this compares with AI's own unfair advantage.
AI successes such as Go, image classification, speech recognition, handwriting transcription and digital assistants are all challenges tailored to AI's unfair advantage over us: effective search, pattern recognition, the automation of repetitive tasks and the manipulation of probabilities.
In contrast, our human unfair advantages include meta-learning, multi- and interdisciplinary academic intelligence, social and meta-cognitive intelligences, and perceived self-efficacy.
So, when you see champion gamers like Darren Elias beaten by Pluribus, or Lee Sedol beaten by AlphaGo, what you are really watching is us losing in an unfair game – a human playing against a machine that has an unfair advantage.
One example of such an unfair advantage that Pluribus uses is the randomisation of bluffing. We humans clearly know how to bluff, but the timing of our bluffs can become predictable, and our strategy easy to uncover. A machine, on the other hand, can bluff at random, and can therefore maintain the perfect 'poker face'.
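The principle is that of a mixed strategy: bluff a weak hand with some fixed probability, so that no pattern in the timing gives the bluff away. The sketch below is purely illustrative – the 0.3 bluffing frequency and the hand-strength threshold are arbitrary choices of ours, not Pluribus' actual numbers.

```python
import random

# Illustrative mixed strategy (not Pluribus' real bluffing frequencies):
# with a strong hand, always bet for value; with a weak hand, bluff with
# probability `bluff_prob`, so the decision is unpredictable from timing.

def choose_action(hand_strength, bluff_prob=0.3, rng=random):
    if hand_strength >= 0.5:
        return "bet"          # value bet: the hand is genuinely strong
    # Weak hand: sometimes bet anyway (a bluff), otherwise fold.
    return "bet" if rng.random() < bluff_prob else "fold"

rng = random.Random(0)
weak_hands = [choose_action(0.2, rng=rng) for _ in range(1000)]
print(weak_hands.count("bet") / 1000)   # close to the 0.3 bluffing rate
```

An opponent watching this player sees bluffs arrive at random, so no amount of pattern-spotting in the timing reveals whether any single bet is a bluff.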
So, are the machines now catching up with our ability to learn?
It seems so. Pluribus uses a mechanism called self-play, which means the bot plays against copies of itself, without any human intervention or training data. Put simply, its algorithm reflects on its own past moves, unlike supervised machine-learning algorithms, which rely on human experts to label observations as having good or bad outcomes.
This enables Pluribus to avoid repeating actions which it ‘regretted’ doing in the past – a perfectly simple self-regulated learning mechanism.
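This regret-driven style of self-play can be shown in miniature. The sketch below uses regret matching – a building block of the counterfactual-regret family of methods behind poker bots like Pluribus – on rock-paper-scissors rather than poker, purely for brevity. Two copies of the same learner play each other; each tracks how much it 'regrets' not having played each action, and picks future actions in proportion to positive regret.

```python
import random

ACTIONS = 3                                   # rock, paper, scissors
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # row's payoff vs column

def strategy(regrets):
    # Play each action in proportion to its accumulated positive regret.
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    return [p / total for p in positive] if total > 0 else [1 / ACTIONS] * ACTIONS

rng = random.Random(0)
regrets = [[0.0] * ACTIONS, [0.0] * ACTIONS]
avg = [[0.0] * ACTIONS, [0.0] * ACTIONS]      # time-averaged strategies
ROUNDS = 20000
for _ in range(ROUNDS):
    strats = [strategy(r) for r in regrets]
    moves = [rng.choices(range(ACTIONS), weights=s)[0] for s in strats]
    for p in range(2):
        payoff = PAYOFF[moves[p]][moves[1 - p]]
        for a in range(ACTIONS):
            # Regret: what action `a` would have earned minus what we got.
            regrets[p][a] += PAYOFF[a][moves[1 - p]] - payoff
            avg[p][a] += strats[p][a]

# The averaged strategy approaches the equilibrium: each action about 1/3.
print([round(x / ROUNDS, 2) for x in avg[0]])
```

Actions that performed badly accumulate no positive regret and are played less and less – the same 'avoid what you regretted' loop, on a vastly larger scale, that lets Pluribus teach itself poker with no human examples at all.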
The balance of power is shifting.