Picture: 123RF/SEMISATCH

It has been both enthralling and troubling to see and experience the step change propelled by the recent release of large language models, specifically OpenAI’s ChatGPT3.

Humanity’s endeavours to build more complex neural networks to help solve our increasingly complex problems are paying off. While we are clearly only at the very start, an enormous cache of future use cases is poised for discovery, from which we will unlock exponential utility in both business and general life.

We’re also at an uncomfortable point in how human and artificial intelligence press ahead. A simple internet search can now provide us with huge amounts of processed data in an instant. But without a healthy dose of circumspect scrutiny we’re at risk of being led by the nose into a barren no-man’s land between truth and false oracles.

As the name suggests, a large language model requires a huge corpus of data, mostly gleaned from the internet and from digitised literature, to produce results. Its outputs are best described as the most likely, or average, version of the data it was trained on.

The most fundamental unit of these models is a non-linear function that predicts an output based on a normalised input. These functions are arranged in layers; each layer passes its outputs to the next layer of abstraction, where they become that layer's inputs, and the process repeats. This roughly corresponds with how we understand the workings of cortical columns in the neocortex of the human brain.
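To make the idea concrete, here is a minimal sketch in Python (using NumPy; the layer sizes and the tanh non-linearity are illustrative assumptions, not those of any real language model) of stacked non-linear functions, each layer feeding its outputs forward as the next layer's inputs:

```python
# A minimal sketch of stacked non-linear functions: each layer's
# output becomes the next layer's input. Sizes and the tanh
# non-linearity are illustrative, not those of any real model.
import numpy as np

def layer(x, weights, bias):
    """One layer: a weighted sum of the inputs passed through a
    non-linear function."""
    return np.tanh(x @ weights + bias)

rng = np.random.default_rng(0)

# Illustrative sizes: 8 input features, two hidden layers, 1 output.
sizes = [8, 16, 16, 1]
params = [(0.1 * rng.normal(size=(m, n)), np.zeros(n))
          for m, n in zip(sizes[:-1], sizes[1:])]

x = rng.normal(size=(1, sizes[0]))   # a normalised input vector
for weights, bias in params:         # one layer's output becomes
    x = layer(x, weights, bias)      # the next layer's input

print(x)                             # the final layer's prediction
```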

But right now there is no factual accuracy check on a model's final output. The only check is self-referential: whether the output conforms to the model as trained on its corpus. Although several different training mechanisms are applied, in general there is a combination of automated and human-supervised methods. It is obvious that the automated processes dwarf the human-supervised ones and that this trend will continue.

However, to determine factual accuracy either a human is needed or fact-checking algorithms must be deployed, using a set of factually verified data to cross-reference outputs and to correct or retrain the models. In our excitement to embrace AI advances, the reliance on possible “false oracles” means we will struggle to know what biases these models have and, most importantly, which ones we can trust.
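As a purely hypothetical sketch of that cross-referencing idea (the fact set, function name and verdicts below are invented for illustration, not a description of any deployed system), a checker might compare claims extracted from a model's output against a curated set of verified statements:

```python
# Hypothetical sketch: cross-referencing model claims against a small
# curated set of verified facts. Real fact-checking systems rely on
# retrieval and entailment models, not exact matching.
VERIFIED_FACTS = {
    ("water boils at sea level", "100 C"),
    ("the earth orbits", "the sun"),
}

def check_claim(subject: str, value: str) -> str:
    """Return a verdict for a (subject, value) claim extracted from
    model output, by looking it up in the verified set."""
    if (subject, value) in VERIFIED_FACTS:
        return "supported"
    if any(s == subject for s, _ in VERIFIED_FACTS):
        return "contradicted"      # subject is known, but value differs
    return "unverified"            # no reference data to check against

print(check_claim("water boils at sea level", "100 C"))  # supported
print(check_claim("water boils at sea level", "90 C"))   # contradicted
print(check_claim("the moon is made of", "cheese"))      # unverified
```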

There are three reasons why that is worrying:

  • There is very little visibility of the training data that is used in the models. Visibility goes beyond just offering access to the source.
  • We are already overwhelmed with more information than we can handle. As the rate of data production and availability grows exponentially, we will become ever more reliant on data aggregation tools that can summarise huge volumes of data to help us form rational conclusions. Daniel Kahneman writes extensively about the risks of binary, pithy generalisations of data that lead us to shortcut to conclusions rather than think deeply. With data at scale we need simplifications, but they need to be trustworthy, free of bias and agenda.
  • Soon the ability to build these models will be common, and at face value they will be indistinguishable. What one model can produce as an output versus another depends on two factors: the data on which they are trained, and the intervention or censorship layers built on top of the prediction. ChatGPT3 clearly already has such layers, but verifying them will become increasingly complex, and impossible for even the most technologically sophisticated person to interrogate. We’re already at risk of being lulled into a false sense of comfort because of how responses are carefully curated and disclaimed; this is done not to enhance truth and accuracy but to alter and corrupt a result to manage the model’s reputational risk, as the sketch after this list illustrates.
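To illustrate what such an intervention layer might look like (a hypothetical sketch; the rules, wording and function below are invented and do not describe any vendor's actual filters), consider a simple post-prediction filter that curates a raw model response before the user sees it:

```python
# Hypothetical intervention layer sitting on top of a model's raw
# prediction. The blocked topics and disclaimer text are invented
# for illustration only.
BLOCKED_TOPICS = {"internal system prompt", "defamatory claim"}
DISCLAIMER = "As an AI model, I may be inaccurate. "

def intervene(raw_response: str) -> str:
    """Curate a raw model response: refuse some topics outright and
    prepend a disclaimer to everything else."""
    lowered = raw_response.lower()
    if any(topic in lowered for topic in BLOCKED_TOPICS):
        return "I can't help with that request."
    return DISCLAIMER + raw_response

print(intervene("Here is a defamatory claim about a public figure."))
print(intervene("The capital of France is Paris."))
```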

It’s not been long since the Cambridge Analytica scandal, and it’s clear that the risk that large language models will be used to sway opinions is real and larger than ever. The profit incentive is also plain to see, and OpenAI hasn’t done much to disguise just how fast this motive can overwrite the founding ethos of an organisation.

What began as a mission to keep AI transparent and open source, because its power was too great to keep secret, has transformed into a closed system sold to the highest bidder. The unprecedented adoption of ChatGPT3 highlights how readily we will accept large language models into our lives as a way to shortcut the work of searching through multiple results to find enough data to build a conclusion. The prebuilt conclusion that this AI offers is so much easier.

But insatiable corporate greed and politically acceptable policies are converging at a fertile point where replies contorted by agendas misaligned with truth can be smuggled in. This does not bode well for growth in the large language model space, though it is entirely predictable. Even Elon Musk, one of the founders and early proponents of OpenAI, has publicly distanced himself from what it has become.

What remains certain, though, is that these models are going to change the very nature of how we build our views of what is true in the world. The threats to true knowledge posed by false oracles are increasing. Without some self-regulation at best, and government-imposed regulation at worst, AI is poised to herald a new era of subtle, sophisticated misinformation. We need those who care about truth and freedom of speech to safeguard the future of truth, before we no longer know what is real.

• Matthis is lead partner and founder of the AI Lab at IQbusiness.
