AI vs Everybody
It feels a little like AI is fighting a proxy battle with regulators. The number one weapon that regulators wield in this uncharted territory is security: in other words, does an AI know how to handle human data? A language model can't understand exactly how the words it's printing out are actually impacting our lives, can it? And so via the proxies of OpenAI, Facebook, and others who are representing the technology, we're learning quite how much this type of AI can - or can't - be contained.
I wrote this after researching the major security concerns that keep popping up - whether it's online in discussion, as part of a House of Lords inquiry, or because there are serious legal ramifications of a LLM's behaviour. And I look at some of the ways that governments are trying to regulate this innovative, but incredibly new, technology.
LLMs need to be secure inside and out
While the obvious cliche may be Hollywood-esque evil self-aware robots, in my opinion, current LLM security concerns are actually much more sinister. Let's first look at one of the most discussed security holes - that a malicious user is able to "jailbreak" the LLM by prompting it in such a way that it will respond with harmful or illegal answers. This has been a concern with chatbots since Microsoft's "Tay" back in 2016, but is even more concerning now that generative AI is being integrated seamlessly with user experiences today, like education, therapy, and healthcare.
We can then take a look at security risks in how the data is trained. Similar to "Tay", which learned from the then-Twitter masses about how to be racist, LLMs are often trained on the open internet and other highly available sources (when Elon Musk restricted API access to Twitter/X it was ostensibly to prevent these models from easily being able to ingest its conversations). But these data sources lack the filtering and cleansing that a more purpose-built model would require. Data sources may contain malicious, incorrect, or private/personal data that could be repurposed into a prompt response.
Data - the fuel for these LLMs - is also at risk of "poisoning", or intentional feeding of poor data in order to encourage the model to make poor decisions. There are repositories like HuggingFace which allow community models and datasets can be shared, but there are many examples of how poisoned datasets can be loaded with malicious intent.
Finally, like any code, LLMs are at risk of basic supply chain security risks, like insecure libraries and packages. In addition, deprecated or poisoned pre-trained models can open the door to security vulnerabilities. Many applications of LLMs (such as a chatbot) depend on other packages to function, all of which should be vetted and tracked in a software bill of materials or SBOM. This concept is not new - it was even documented in a US White House Executive Order in 2020 - but it is not yet a mandate for every organisation.
Worldwide regulation? Good luck with that
As I write this, government bodies are deciding how much or little to regulate AI and the companies that serve it to their customers (such as OpenAI, Google, and Microsoft). Some examples of the varying approaches as of January 2024:
- The European Commission proposed an EU regulatory framework for AI in December 2023. This still needs to be approved by the European Parliament, but some key areas include safeguards such as banning facial recognition databases, consumer rights, and fines for breaching the rules in the Act.
- China already has AI regulations in place (active since March 2022) that focus on how the technology is implemented and who maintains control of the technology. For example, algorithm recommendation technology and deep synthesis technologies (i.e. deepfakes) impose obligations on application developers who choose to use generative AI in their services.
- The UK does not have a separate AI regulation, but announced a “pro-innovation” approach to AI regulation, which uses existing laws enforced by existing regulations to consider principles such as safety, security, transparency, and accountability among others. A recent House of Lords Committee considering LLMs shared evidence provided by OpenAI around the future trajectories, future risks, regulation and copyright. Notably, OpenAI specifies that in order to “incorporate and represent the full diversity and breadth of human intelligence and experience”, training against current data - including pieces of content under copyright - is a necessity.
- In India, the government changed its mind after first indicating it would not regulate AI at all. This shows the desire for innovation as well as the relative immaturity of India’s regulatory frameworks when it comes to technologies such as AI - the current IT Act 2000 is 24 years old, and "doesn’t even have the word internet in it", per the Minister of State for Electronics and IT Rajeev Chandrasekhar. This is due to be replaced by the Digital India Act, which is in draft.
Phoebe's take
There are, of course, many other countries wrestling with the potential catastrophic risks of LLMs. Do they take a risk-averse approach across all applications such as the EU, an innovation-first approach like India and the UK, or a state-first approach like China? These regulations will impact the way that AI is developed and adopted across the world.
I believe that LLMs will be more highly regulated than they are today, and the net result will be more fine-tuned models that have stringent supply chain audits. It may limit innovation behaving in this way, but when LLMs will change the way that we work, live and play it's better to be safe than sorry.
And of course superpowers will want to control the foundation models off which many other LLMs are tuned. I expect this is going to be a major global cause for concern - less for actual use-cases but in a new kind of "cold war" - that the countries with the most powerful, ingenious, advanced LLMs are going to be able to innovate much faster than those who are restricting innovation to specific models or data sets.
What can I do?
What all of this taught me is that it isn't enough to just know how a LLM is developed or even how to use them - but that the policy is just as important as the technology. We all need to have awareness of how AI regulation is going to impact our lives both personally and professionally. I'll be writing more on this as more of the regulation gets passed and we see the impact.
Also, we're already seeing cases where ChatGPT is impacting legal cases, from copyright to completely made-up evidence. We're going to see a lot more of this because we haven't yet tuned our critical thinking to be able to understand if something was generated by a well-tuned prompt or an actual human. We need to learn discernment (or, more likely, enforce it) so that we don't fall prey to malicious attacks such as phishing or vishing.
And of course, the technology is never far away from my thoughts. We have to build security into every part of the LLM: from training data sets to prompt engineering to even how we teach people to use it. AI is an incredibly powerful tool that can automate a lot of the mundane heavy lifting, but it's up to us - the technologists, engineers, and industry leaders - to help shape the technology so that the rest of the world can use its "magic" without worrying about the potential risks. This is probably the area where I feel the most work is being done, and hopefully you are bringing a secure perspective to AI as you use it in your life!