OpenAI has recently released the GPT-4o System Card, a research document that provides insights into the safety measures and risk evaluations conducted before the release of its latest model.
Before GPT-4o's public launch in May, OpenAI brought in external red teamers (security experts tasked with identifying vulnerabilities in a system) to assess the model's potential risks. The evaluation focused on risks such as unauthorized voice cloning, the generation of explicit or violent content, and the reproduction of copyrighted material. Those findings have now been made public.
According to OpenAI’s risk framework, GPT-4o was categorized as having a “medium” level of risk. The evaluation considered four main categories: cybersecurity, biological threats, persuasion, and model autonomy. The risks were deemed low in every category except persuasion, where some writing samples produced by GPT-4o proved potentially more persuasive than human-written text.
The system card includes preparedness evaluations conducted by OpenAI’s internal team, as well as evaluations by external testers listed on OpenAI’s website, such as Model Evaluation and Threat Research (METR) and Apollo Research, both of which specialize in evaluating AI systems.
OpenAI has previously released system cards for other models like GPT-4, GPT-4 with vision, and DALL-E 3, following a similar testing and research approach. The release of the GPT-4o system card comes at a critical time for OpenAI, as the company has faced ongoing criticism regarding its safety practices, both from its own employees and public figures. The recent open letter from Sen. Elizabeth Warren and Rep. Lori Trahan raises concerns about whistleblower protection and safety reviews at OpenAI, highlighting previous safety-related issues within the company.
By publishing these system cards, OpenAI aims to demonstrate transparency and accountability, addressing such concerns and offering insight into the safety evaluations conducted for its models.
In addition, OpenAI is releasing a highly capable multimodal model just ahead of a US presidential election, raising concerns about the model accidentally spreading misinformation or being hijacked by malicious actors. While OpenAI emphasizes that it tests real-world scenarios to prevent misuse, calls have grown for greater transparency, not only about the model’s training data (such as whether it includes data from YouTube) but also about its safety testing.
In California, where OpenAI and other prominent AI labs are located, State Senator Scott Wiener is working on a bill to regulate large language models, proposing restrictions that would hold companies legally accountable if their AI is used in harmful ways. If this bill is passed, OpenAI’s advanced models would need to undergo state-mandated risk assessments before being made available to the public.
However, the key takeaway from the GPT-4o System Card is that, despite the involvement of external red teamers and testers, much of the evaluation still relies on OpenAI assessing its own model. That underscores the need for ongoing scrutiny and external oversight to ensure AI models are developed and deployed responsibly.