U.S., U.K., and Global Partners Release Secure AI System Development Guidelines
27.11.23 AI The Hacker News
The U.K. and U.S., along with international partners from 16 other countries, have released new guidelines for the development of secure artificial intelligence (AI) systems.
"The approach prioritizes ownership of security outcomes for customers, embraces radical transparency and accountability, and establishes organizational structures where secure design is a top priority," the U.S. Cybersecurity and Infrastructure Security Agency (CISA) said.
The goal is to raise the level of cyber security in AI and help ensure that the technology is designed, developed, and deployed in a secure manner, the U.K. National Cyber Security Centre (NCSC) added.
The guidelines also build upon the U.S. government's ongoing efforts to manage the risks posed by AI by ensuring that new tools are tested adequately before public release, that guardrails are in place to address societal harms such as bias and discrimination as well as privacy concerns, and that robust methods exist for consumers to identify AI-generated material.
The commitments also require companies to facilitate third-party discovery and reporting of vulnerabilities in their AI systems through a bug bounty system, so that flaws can be found and fixed swiftly.
The latest guidelines "help developers ensure that cyber security is both an essential precondition of AI system safety and integral to the development process from the outset and throughout, known as a 'secure by design' approach," NCSC said.
This encompasses secure design, secure development, secure deployment, and secure operation and maintenance, covering all significant areas within the AI system development life cycle. It requires that organizations model the threats to their systems and safeguard their supply chains and infrastructure.
The agencies noted that the guidelines are also meant to combat adversarial attacks on AI and machine learning (ML) systems, which seek to cause unintended behavior in various ways, including affecting a model's classification, allowing users to perform unauthorized actions, and extracting sensitive information.
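To make the first of those effects concrete, the toy Python sketch below shows how tampering with training data can change a model's classification. It is an illustration only and is not taken from the guidelines: the one-dimensional "nearest centroid" classifier, the sample values, and the labels "benign" and "malicious" are all assumptions chosen for brevity.

```python
from statistics import mean


def train_centroids(samples):
    """Return {label: mean feature value} for a list of (value, label) pairs."""
    by_label = {}
    for value, label in samples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}


def classify(centroids, value):
    """Assign the label whose centroid lies closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))


# Hypothetical clean training set: low scores are benign, high scores malicious.
clean = [(0, "benign"), (1, "benign"), (2, "benign"),
         (8, "malicious"), (9, "malicious"), (10, "malicious")]

# Hypothetical attacker-supplied samples: high scores deliberately labeled benign.
poison = [(9, "benign"), (10, "benign"), (10, "benign"), (9, "benign")]

clean_model = train_centroids(clean)
poisoned_model = train_centroids(clean + poison)

# The same input is classified differently once the training data is corrupted.
print(classify(clean_model, 6))      # malicious
print(classify(poisoned_model, 6))   # benign
```

In this sketch the mislabeled samples drag the "benign" centroid from 1 up to roughly 5.9, so an input of 6 that the clean model flags as malicious is accepted as benign by the tampered one. Real attacks and real models are far more complex, but the mechanism is the same: whoever can influence the training data can influence the decisions.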
"There are many ways to achieve these effects, such as prompt injection attacks in the large language model (LLM) domain, or deliberately corrupting the training data or user feedback (known as 'data poisoning')," NCSC noted.