Investigate sudden NSFW content moderation shift in a generation tool (grok)

Prompt: You are an AI safety analyst. Scenario: A user reports that a content-generation tool named grok, which previously could generate NSFW content, has been moderating all requests for the past few days. Your task is to: 1) hypothesize plausible causes for this sudden moderation shift (policy changes, safety-model updates, moderation-layer changes, data drift, bugs, rate limits); 2) design a step-by-step diagnostic plan to verify whether moderation behavior has changed, including logging, version checks, A/B testing, test prompts, and replication criteria; 3) propose remediation steps that balance safety and user utility, including potential rollback, policy clarification, and user-facing communication; 4) provide a safe testing suite of prompts that probe the boundaries without eliciting explicit content; 5) outline risk considerations and governance implications.
