Security and Safety of AI Systems
By Huzaifa Sidhpurwala, Senior Principal Product Security Engineer, Red Hat
It is hard to imagine any modern computer system that hasn’t been improved by the power of artificial intelligence (AI). For example, when you take a picture with your smartphone camera, on average more than twenty deep learning (DL) models spring into action, ranging from object detection to depth perception, all working in unison to help you take that perfect picture!
Business processes, productivity applications and user experiences can all be enhanced by some form of AI, and few other technologies have grown with the same scale, speed and reach. Like any other piece of technology, however, AI comes with its own risks, which in this case include security and safety concerns and possibly even legal obligations. In this article, we’ll take a brief look at some of these safety and security concerns, particularly those involved with generative AI (gen AI), and how we can develop safer, more secure and more trustworthy AI systems.
Differentiating between security and safety
Like any computer system (hardware or software), AI systems can be attacked or abused through techniques such as jailbreaking, prompt injection and adversarial attacks. AI systems also bring a new paradigm to the industry, however: the safety of the data they output. This is mainly because of the following:
- AI output is generated based on the model’s prior training, and the quality of the output depends on the quality of the data used in training. The creators of well-known models take pride in training on as much data as is available, often measured by the number of tokens used to train the model. The theory is that the more tokens used, the more effective the model’s training
- Output from the model may be used to help make business, user and technical decisions. This poses the risk of financial losses as well as potentially having safety and legal implications. For example, there is no shortage of insecure code on the internet, so any model trained on it runs the risk of generating insecure code as a result. If this generated code is used directly in a software project, it could become an entirely new kind of supply chain attack
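To make that supply chain risk concrete, here is a minimal sketch of screening model-generated code for risky patterns before it is merged. The pattern names and regexes are invented for illustration; a real review pipeline would rely on dedicated static analysis (SAST) tooling rather than a handful of regexes:

```python
import re

# Hypothetical patterns for illustration only; real projects would use
# proper static analysis tools, not regexes.
RISKY_PATTERNS = {
    "eval-call": r"\beval\s*\(",        # arbitrary code execution
    "shell-true": r"shell\s*=\s*True",  # shell injection via subprocess
}

def review_generated_code(snippet: str) -> list[str]:
    """Return the names of risky patterns found in a generated snippet."""
    return [name for name, pat in RISKY_PATTERNS.items()
            if re.search(pat, snippet)]

findings = review_generated_code("subprocess.run(cmd, shell=True)")
```

Even a crude check like this illustrates the principle: treat generated code as untrusted input and gate it before it enters your project.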
While some aspects of AI security and safety are entangled, most safety frameworks tend to deal with them separately. Safety standards for computers are a relatively new paradigm for most companies, and we are still trying to wrap our heads around them.
Safety considerations when using AI models
In a nutshell, gen AI models work by predicting the next word (more precisely, the next token) in a sequence. Though these models have evolved to become much more advanced, they still fundamentally operate on this principle. This means there are some interesting things to consider when talking about AI safety.
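As a rough sketch of that principle, the code below turns hypothetical per-token scores (logits) into a probability distribution and picks the most likely next token. The vocabulary and scores are invented for illustration and are not from any real model:

```python
import math

def softmax(logits: dict) -> dict:
    """Turn raw scores into a probability distribution over tokens."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Invented scores a model might assign to candidates completing
# "The cat sat on the ..."
logits = {"mat": 6.0, "roof": 4.5, "moon": 1.0}
probs = softmax(logits)
next_token = max(probs, key=probs.get)
```

A real model repeats this step token by token, which is why flaws in what it learned during training surface directly in what it generates.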
Garbage in, garbage out
Garbage in, garbage out is a very basic principle of computing that is still applicable to AI models, but in a slightly different way. A gen AI model “learns” from a particular set of data in its training phase. Typically, this training phase is divided into two parts. The first part is the pre-training phase, where a large corpus of data is used, often obtained from the internet. The second part is the fine-tuning phase, where data that is specific to the model’s purpose is used to make the model better at a more focused task or set of tasks. Some models may go through more than two phases, depending on the model’s architecture and purpose.
As you might expect, training your model on data obtained in bulk from the internet—without filtering for sensitive, unsafe and offensive content—can have some unexpected and adverse results.
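A toy sketch of that kind of pre-training data filter might look like the following. The block-list patterns are invented; production pipelines use trained classifiers, PII detectors and far more sophisticated scrubbing:

```python
import re

# Invented block-list patterns for illustration; real data scrubbing uses
# trained classifiers and large curated block-lists, not two regexes.
BLOCKED = [
    r"(?i)password\s*=",       # leaked credentials
    r"\b\d{3}-\d{2}-\d{4}\b",  # SSN-like personally identifiable data
]

def is_clean(document: str) -> bool:
    """Keep a document only if no blocked pattern matches."""
    return not any(re.search(p, document) for p in BLOCKED)

corpus = [
    "The quick brown fox jumps over the lazy dog.",
    "password = hunter2",
]
filtered = [doc for doc in corpus if is_clean(doc)]
```

The design point is that filtering happens before training: whatever survives this stage is what the model will learn to reproduce.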
Models hallucinate
I often compare AI models to small children. When children don’t know the answer to a question, they will often make up an entirely false but convincing story. Models are similar in many ways: when they lack grounding, they can generate plausible-sounding but fabricated output, a phenomenon known as hallucination. The result can be more dangerous or damaging than a child’s tall tale, particularly when models generate answers that have financial, social or security implications.
Safety testing and benchmarking
While the AI industry is still in its nascent stages, there have been some proposals for benchmarking standards that we think are interesting and worth paying attention to:
- The MLCommons AI Safety working group has released the MLCommons AI Safety v0.5 benchmark proof-of-concept (POC). The POC focuses on measuring the safety of large language models (LLMs) by assessing the models’ responses to prompts across multiple hazard categories
- The National Institute of Standards and Technology (NIST), under the United States Department of Commerce, has published an Artificial Intelligence Risk Management Framework (AI RMF 1.0). AI RMF discusses how to quantify and detect risks, as well as understand their manifestations, impacts and management
- TrustyAI is an open source project, started by Red Hat, that works to mitigate issues related to AI bias
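In the spirit of such benchmarks, safety scoring often reduces to rating a model’s responses per hazard category. The categories and per-prompt ratings below are invented for illustration and are not the actual MLCommons taxonomy or methodology:

```python
# Invented hazard categories and per-prompt ratings (1 = safe response,
# 0 = unsafe); the real MLCommons taxonomy and scoring differ.
results = {
    "violent_crimes": [1, 1, 0, 1],
    "hate_speech":    [1, 0, 1, 1],
}

def safety_scores(results: dict) -> dict:
    """Fraction of safe responses per hazard category."""
    return {cat: sum(r) / len(r) for cat, r in results.items()}

scores = safety_scores(results)
```

Aggregating per-category scores like this lets you compare models, track regressions between versions, and decide whether a model clears your release bar.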
Building guardrails
Guardrail applications and models use various methods to help ensure that a model’s output meets defined safety and security requirements. Various open source tools and projects exist that can help set up these guardrails. A guardrail is just another piece of software, however, and comes with its own risks and limitations. It is up to model creators to establish mechanisms to measure and benchmark the harmfulness of their models before putting them into production.
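A guardrail can be as simple as a post-generation check that blocks unsafe output before it reaches the user. The deny patterns below are invented for illustration; real guardrail frameworks layer classifiers and policy engines on top of rule checks like this:

```python
import re

# Invented deny patterns for illustration; production guardrails combine
# classifiers, policy engines and human review, not just regexes.
DENY_PATTERNS = [
    r"(?i)rm\s+-rf\s+/",       # destructive shell command
    r"\b\d{3}-\d{2}-\d{4}\b",  # SSN-like data leaking into output
]

def guard(model_output: str) -> str:
    """Return the output unchanged, or a refusal if a rule matches."""
    if any(re.search(p, model_output) for p in DENY_PATTERNS):
        return "[response withheld by guardrail]"
    return model_output

safe = guard("Here is how to list files: ls -l")
blocked = guard("Just run rm -rf / to clean up")
```

Note that this check is itself software with its own failure modes, which is exactly why the output of a guarded model still needs to be measured and benchmarked.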
Why open source makes a difference
While the industry is still debating what constitutes an open source AI model, IBM and Red Hat are leading the way by implementing open standards and open data for the AI models we ship. This includes:
- IBM’s Granite foundation models, which ship with Red Hat Enterprise Linux (RHEL) AI, are pre-trained on open data. This means that all data sources are published and available for inspection. Several data scrubbing techniques are also used on the pre-training data to help filter out potentially sensitive, unsafe and offensive content before it is fed to the model
- Red Hat’s InstructLab project helps simplify the fine-tuning phase of model training. Among other things, this helps reduce the potential security and ethical issues in the model’s output. A considerable amount of recent research supports the idea that carefully curated fine-tuning data reduces these risks. You can learn more in this article on the Google blog: Protecting users with differentially private synthetic training data
Red Hat is also a founding member of the AI Alliance. This is a collaborative network of companies, startups, universities, research institutions, government organisations and non-profit foundations that are at the forefront of AI technology, applications and governance. As part of this alliance, we are working to drive the creation of a truly open, safer and more secure AI environment—not only for our customers, but for the open source community as a whole.
Wrap up
Artificial intelligence is in its early stages of development, and it is essential for us to think about its security and safety now, rather than trying to bolt them on at a later stage. Red Hat believes this is one area of AI development where open source and open systems can make a profoundly important difference.