Generative AI and Large Language Models (LLMs) have hit the mainstream and caused everyone to take notice. As is often the case when technology moves out of the hands of a niche and into the public zeitgeist, there have been all manner of opinions, facts and utter BS thrown around like sugar in a chocolate factory.

Humans are spectacular at pointing out the problems with things, as long as those things aren’t themselves, and there have been some amazing displays of this front and centre during the explosion of LLM conversations. But are the algorithms showing us any traits we hadn’t already perfected ourselves? Let’s have a look at some of the less positive arguments:

They have horrible bias / they are racist / sexist etc

Whilst there have been effective advances to curb the propensity for LLMs to spout some pretty egregious and distasteful stuff, there is still the possibility for unfiltered and unmoderated content to slip through the gaps, and when it does the general public have been all too quick to ’tut tut’ these naughty robots and their worldviews.

Of course, humans have a fairly healthy history of genocide in our back pocket to reference, so we should be a pretty good judge. From as far back as history is recorded right through to the present day, humans have shown themselves capable of committing atrocities on a scale that boggles the mind.

The oft-cited example of Amazon’s ML-driven resume scanner, built back in 2014, made global headlines for all the wrong reasons, as it heavily preferenced male candidates over female candidates. Thank goodness we turned off such a horrible algorithm and fixed the world before the machines could do more damage. Meanwhile, 2022-23 data from Australia shows:

  • Women make up 37% of enrolments in university STEM courses, and just 17% of VET STEM enrolments.
  • Only 15% of STEM-qualified jobs are held by women.
  • In 2022, the gap between women’s and men’s pay in STEM industries was $27,012, or 17%. This was slightly larger than in 2021.
  • Only 23% of senior management and 8% of CEOs in STEM-qualified industries are women.

Turns out our human-centred methods aren’t fantastic on the gender equality front either. Given that these LLMs are trained on historical datasets of human activity, they really are a fantastic mirror for the best, and worst, of human data points.

They don’t create anything new, they just recycle

There have been some downright hilarious examples of LLMs ‘creating’ something that has, as it turns out, taken substantial liberty with how much of that creation was borrowed (inspired? stolen?) from existing works. A coalition of authors is also tackling OpenAI in the courts, suing for copyright infringement and claiming “longstanding exploitation of content providers”.

Are humans better? Here’s some comic relief that many may remember from 2011 - the comedy group Axis of Awesome reminding us how many pop songs are written with the same simple four-chord progression. Ed Sheeran went a step further and tested this logic in court recently, successfully defending a copyright case over one of his songs.

As Oscar Wilde put it, “Imitation is the sincerest form of flattery…”. It seems humans, machines and now the courts agree.

When they don’t know the answer, they make it up

Mira Murati, the current CTO of OpenAI, has very publicly declared that ChatGPT “may make up facts” when it is generating a response to a prompt. For those who have even a cursory understanding of the technology this should come as no surprise - in fact it should have been expected given the way the model functions. Yet many continued to be surprised as ChatGPT simply made up citations and stated them as fact, confusing those who took the conviction of the text as knowledge and assumed ChatGPT knew what it was doing.

Confidence goes a long way, and LLMs are trained to offer answers - humans like definitive answers (whether they’re factual or not) and reinforce this as preferred behaviour.

In what has been coined the “post-truth” world, we are in an existence so rich with information that we don’t have the cerebral processing capability to discern what is fact and what is fiction, so our brains rely on the most notoriously poor judgement capability we have - our intuition. Two decades ago HBR released a piece called “Don’t trust your gut” which highlighted the studied pitfalls of relying on intuition; nevertheless, it remains wired into our brains and biology. There are too many instances for me to even begin to cite - from the lies told to convince millions of a “stolen” 2020 Presidential election, to Charles Ponzi and his eponymous Ponzi scheme and its many incarnations since. When faced with information, human brains are very happy to just “make it up” and fill in the blanks to try and make sense of the world.

We are prolific twisters of the truth and show statistically appalling judgement when it comes to filtering facts from lies.

They are easy to fool if you ask the right question

Prompt Engineering is a (somewhat controversial) term that has been doing the rounds lately. In a nutshell, it is the art and science of framing a prompt to an LLM to get the result that you desire. This may mean asking a particular question in a less than obvious way. Case in point: a Reddit user who worked around a ChatGPT limitation by asking it to read him Windows license keys as his dead grandmother would. Not one of the more well-known bedtime stories, but who am I to judge? If it works, it works.
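
To make “framing a prompt” a little more concrete, here is a minimal sketch of the idea using the openai Python client. The model name and both prompts are purely illustrative assumptions, not the wording from the Reddit example - the point is simply that the same underlying request, re-framed with a role, an audience and a format, tends to steer the model somewhere quite different.

```python
# A minimal sketch of prompt framing, assuming the openai Python client (>= 1.0).
# The model name and both prompts below are illustrative only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# A blunt prompt: often yields a generic (or hedged) answer.
blunt = "Explain our quarterly results."

# A framed prompt: the same request, but with a role, audience and format
# spelled out, which typically steers the output far more reliably.
framed = (
    "You are a CFO briefing a non-financial board. "
    "In three short bullet points, explain this quarter's results in plain "
    "language and finish with one recommended action."
)

for prompt in (blunt, framed):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content)
    print("---")
```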

As lay people we read through this example and scoff at how easy it was to fool such a sophisticated model. And yet…

Take a look at the statistics available on ScamWatch, an initiative of the Australian Government, and you’ll see the amount Australians lose to scams is increasing year on year, with even conservative aggregate estimates putting losses at over AU$2 billion in 2021 and growing at an alarming rate.

Despite protections from telecommunications companies, financial institutions, payment providers and internet companies, coupled with increasing education campaigns, the data overwhelmingly suggests that humans are relatively easy to deceive as long as you put the right incentives in place.

LLMs are almost too human - and it’s confronting!

The fallibilities of LLMs are being surfaced every day, and work continues on addressing these weaknesses without impacting the usefulness or the “creativity” of the underlying models and toolchains. LLMs will continue to reshape the digital and data landscape over the coming years as we find genuine efficiencies in integrating these models to supplement the effort we put into many tasks - whether it be coding with CoPilot or a move towards a more conversational search experience, LLMs are here to stay and will only become more and more ubiquitous.

But their initial humanness (the good, the bad and the downright abhorrent) should not be a wasted opportunity for us to take a long hard look at ourselves from the outside in. LLMs are trained extensively on human-written corpora and reflect us in every word they generate, with a statistical propensity to act like us. Not how we would want to be perceived, but how we really act when we think no one is watching.

We’ll solve the challenges with GenAI and LLMs; we’ll moderate their behaviours and bound their creativity so they stay within the realms of civility; we’ll impose ethical frameworks that ensure bias is measured, monitored and kept within reasonable bounds. As we do this, I hope we take the opportunity to not just improve the digital representations of humanity, but also put some focus on the fleshier parts of humanity and, in the process, begin to hold ourselves to the standards that we are so very ready to hold our code to.