| Category | Attack Scenario | Guidance |
| --- | --- | --- |
| Prompt Attacks: Crafting adversarial prompts that allow an adversary to influence the behavior of the model, and hence its output, in ways that were not intended by the application. | Prompt injections that are invisible to victims and change the state of the victim’s account or any of their assets. | In Scope |
| | Prompt injections into any tools in which the response is used to make decisions that directly affect victim users. | In Scope |
| | Prompt or preamble extraction in which a user is able to extract the initial prompt used to prime the model, but only when sensitive information is present in the extracted preamble. | In Scope |
| | Using a product to generate violative, misleading, or factually incorrect content in your own session (e.g., “jailbreaks”). This includes “hallucinations” and factually inaccurate responses. Google’s generative AI products already have a dedicated reporting channel for these types of content issues. | Out of Scope |
| Training Data Extraction: Attacks that are able to successfully reconstruct verbatim training examples that contain sensitive information. Also called membership inference. | Training data extraction that reconstructs items used in the training data set that leak sensitive, non-public information. | In Scope |
| | Extraction that reconstructs nonsensitive/public information. | Out of Scope |
| Manipulating Models: An attacker able to covertly change the behavior of a model such that they can trigger pre-defined adversarial behaviors. | Adversarial output or behavior that an attacker can reliably trigger via specific input in a model owned and operated by Google (“backdoors”). Only in scope when a model’s output is used to change the state of a victim’s account or data. | In Scope |
| | Attacks in which an attacker manipulates the training data of the model to influence the model’s output in a victim’s session according to the attacker’s preference. Only in scope when a model’s output is used to change the state of a victim’s account or data. | In Scope |
| Adversarial Perturbation: Inputs provided to a model that result in a deterministic, but highly unexpected, output from the model. | Contexts in which an adversary can reliably trigger a misclassification in a security control that can be abused for malicious use or adversarial gain. | In Scope |
| | Contexts in which a model’s incorrect output or classification does not pose a compelling attack scenario or feasible path to Google or user harm. | Out of Scope |
| Model Theft / Exfiltration: AI models often include sensitive intellectual property, so we place a high priority on protecting these assets. Exfiltration attacks allow attackers to steal details about a model, such as its architecture or weights. | Attacks in which the exact architecture or weights of a confidential/proprietary model are extracted. | In Scope |
| | Attacks in which the architecture and weights are not extracted precisely, or when they’re extracted from a non-confidential model. | Out of Scope |
| If you find a flaw in an AI-powered tool other than what is listed above, you can still submit, provided that it meets the qualifications listed on our program page. | A bug or behavior that clearly meets our qualifications for a valid security or abuse issue. | In Scope |
| | Using an AI product to do something potentially harmful that is already possible with other tools. For example, finding a vulnerability in open source software (already possible using publicly available static analysis tools) or producing the answer to a harmful question when the answer is already available online. | Out of Scope |
| | Consistent with our program, issues that we already know about are not eligible for reward. | Out of Scope |
| | Potential copyright issues: findings in which products return content appearing to be copyright-protected. Google’s generative AI products already have a dedicated reporting channel for these types of content issues. | Out of Scope |