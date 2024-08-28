
GREG BECKER: AI chatbots are already better at hard maths than you are
With the pace of development, by 2025 humans may already offer little competition in maths competitions
Measurement of intelligence in humans has always been contentious and vigorously debated. Artificial intelligence (AI) has led to a resurgence of interest in this area, especially since AI engines are now regarded as having passed the Turing Test.
The Turing Test holds that if a machine can hold a conversation with a human without being identified as a machine, it has demonstrated human-like intelligence.
AI engines have also passed other difficult tests used to evaluate and accredit humans, such as the US Medical Licensing Examination and the Uniform Bar Exam.
Large language models (LLMs) are evaluated against question banks containing thousands of questions across a wide range of subjects. Having been trained on vast amounts of written text and other digital information, the models are particularly good at language processing, translation and recall, and already outperform humans on these types of tasks.
However, maths, image interpretation and logical reasoning have been shown to be areas where humans still have an advantage. Quantitative ability can be evaluated using maths tests and competitions, such as the University of Cape Town (UCT) Mathematics Competition, and the exceptionally difficult International Mathematics Olympiad (IMO).
SA has competed in the IMO since 1992, and its teams have received one gold medal, 11 silver medals, 55 bronze medals and 78 honourable mentions. The country placed mid-pack in 2024, 53rd out of 107 nations, a vast improvement on 1992, when no South African scored more than seven points and we came 54th out of 56 participating nations.
We earned our first honourable mentions in 1993 and our first bronze medals in 1994. Bruce Merry won SA’s one and only gold medal in 1997, our first silver medals followed in 1998, and the team managed four medals at the 2004 event. We have improved over time, but so has the competition.
As in the Summer Olympics, China and the US have consistently topped the table. China has won more than 60% of the competitions it has entered, but in July the US team bumped China off the top spot for the first time since 2019. Photographs of the winning US team (Wang, Wan, Tang, Pothapragada, Zhang and Lefkowitz) generated many memes.
In future, the IMO will be a competition not just between countries but also between humans and machines, though the top human competitors are likely to remain Chinese and American for a while yet.
The AI Mathematical Olympiad (AIMO) Prize was created in 2023 to spur innovation. Its sponsor, XTX Markets, put up a $10m prize fund, of which a $5m grand prize will go to the first publicly shared AI model to achieve a gold-medal score at the IMO or an equivalent competition. In the AIMO’s first year, Team Numina topped the leaderboard, correctly answering 29 of 50 IMO-style questions.
With a little help and under “non-exam like conditions”, Google DeepMind’s AlphaProof and AlphaGeometry correctly solved four of the six IMO 2024 problems, scoring 28 out of 42. The problems were first translated by hand into a formal mathematical language, and the system took up to three days on some of them, whereas human contestants sit two papers of 4.5 hours each. Even so, the score beat the 22 points earned by SA’s top IMO competitor of 2024, Ben Maree, who impressively solved three of the six problems and won a prestigious silver medal.
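IMO marking is simple enough to sketch in a few lines of Python. Each of the six problems is worth seven points, for a maximum of 42, and medal cut-offs are set afresh each year. The sketch below assumes the 2024 cut-offs of 29 for gold, 22 for silver and 16 for bronze (these boundaries are not stated in this article), uses a hypothetical helper called `imo_medal` and ignores honourable mentions. It shows that four perfect solutions and nothing else come to 28 points, one short of gold, while 22 points is enough for silver.

```python
# Sketch of IMO medal banding. The cut-offs below are an assumption
# (the reported 2024 boundaries); the IMO sets new ones every year.
POINTS_PER_PROBLEM = 7
NUM_PROBLEMS = 6
MAX_SCORE = POINTS_PER_PROBLEM * NUM_PROBLEMS  # 42

# (medal, minimum score), checked from best to worst.
CUTOFFS_2024 = (("gold", 29), ("silver", 22), ("bronze", 16))


def imo_medal(score: int, cutoffs=CUTOFFS_2024) -> str:
    """Hypothetical helper: return the best medal whose cut-off the score reaches.

    Honourable mentions (for non-medallists with a full solution) are ignored.
    """
    if not 0 <= score <= MAX_SCORE:
        raise ValueError(f"score must be between 0 and {MAX_SCORE}")
    for medal, minimum in cutoffs:
        if score >= minimum:
            return medal
    return "no medal"


four_full_solutions = 4 * POINTS_PER_PROBLEM
print(four_full_solutions, imo_medal(four_full_solutions))  # 28 silver
print(22, imo_medal(22))  # 22 silver
```

The cut-offs are passed in as data rather than hard-coded because the boundaries move from year to year with the difficulty of the paper.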
Maths Olympiad problems are exceptionally hard, take hours to solve and require years of preparation and training. Very few people would earn a non-zero score. Maths competition questions, on the other hand, could be described as reasoning problems or logic puzzles, though many still rely on mathematical techniques and concepts to reach the right answer.
These multiple-choice problems are quick to answer, but they don’t indicate which area of mathematics is needed. As with problems in everyday life, knowing which knowledge and reasoning skills to use is at least as important as being proficient in applying them.
The UCT Mathematics Competition is the oldest and most prestigious mathematics competition in SA. It is used to identify the top young mathematicians in the Western Cape, who, as in 2024, regularly make up the bulk of the SA IMO team. Maree, the SA IMO team’s best performer, was the top competitor in grades 8, 9 and 10 in the past three UCT competitions. (Observant readers may have noticed that, yes, Ben still has two years of school remaining.)
When the grade 8 paper from the 2024 UCT Mathematics Competition was given to OpenAI’s ChatGPT (GPT-4o), it scored 143/180, which would have placed it 12th out of the 1,164 grade 8 mathematicians selected to represent their schools. It only narrowly beat Anthropic’s Claude. Google’s free version of Gemini did not perform as well, but it is only a matter of time before the DeepMind models underlying AlphaProof and AlphaGeometry find their way onto our devices through a software update to Gemini.
Humans are already no match for chess computers, and at the current pace of development we may not offer much competition in maths competitions or Olympiads as early as 2025. On the 2024 UCT Mathematics Competition paper, generative AI models are already performing at a level only the top 1% of entrants achieve, and since those entrants are themselves the cream of the crop, this probably equates to about one in 1,000 high school students.
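The jump from “top 1% of entrants” to “one in 1,000 high school students” is a two-step calculation. The Python sketch below uses the article’s figures (143 out of 180, 12th of 1,164) and adds one hypothetical assumption, flagged in the code: that competition entrants are roughly the top 10% of high school students. Under that assumption, a top-1% finish among entrants works out to about one student in 970, in line with the estimate above.

```python
# Rank arithmetic behind the UCT grade 8 comparison.
# Score, rank and field size are the article's figures; ENTRANT_SHARE is a
# hypothetical assumption, not data from the competition.
SCORE, MAX_SCORE = 143, 180
RANK, FIELD_SIZE = 12, 1_164

# Assumption: entrants make up roughly the top 10% of high school students.
ENTRANT_SHARE = 0.10

score_share = SCORE / MAX_SCORE             # fraction of available marks
rank_share = RANK / FIELD_SIZE              # fraction of entrants at or above this rank
overall_share = rank_share * ENTRANT_SHARE  # same finish as a fraction of all students

print(f"Score: {score_share:.1%}")                        # Score: 79.4%
print(f"Top {rank_share:.2%} of entrants")                # Top 1.03% of entrants
print(f"Roughly 1 in {1 / overall_share:,.0f} students")  # Roughly 1 in 970 students
```

Halving the assumed entrant share doubles the one-in-N figure, so the one-in-1,000 estimate is only as good as that assumption.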
Each year these studious models will have done more past problems and learnt from their mistakes, so they’ll perform even better.
• Becker, a retired actuary and recently qualified maths teacher, is founder of MyTutor.chat.
Published by Arena Holdings and distributed with the Financial Mail on the last Thursday of every month except December and January.