Currently, "Safe Venice" mode regulates and hides NSFW images, ensuring a safer user experience. We propose expanding this functionality to include text-based interactions by filtering LLM inputs and outputs, creating a more comprehensive safety system for Venice AI interactions.
The enhanced "Safe Venice" mode should provide users with configurable options for managing NSFW content in LLM-generated text responses, including:
Users should be able to set content restrictions based on personal or platform preferences:
- Strict Mode: Blocks all NSFW content, including suggestive, explicit, and borderline material.
- Moderate Mode: Allows suggestive content but removes explicit or highly mature themes.
- Off Mode: No restrictions; full interaction capabilities are retained.
Use Case: Ensures flexibility for different audiences while maintaining user control.
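As a rough sketch of how these levels might map onto filtering decisions — the `SafeVeniceLevel` enum and the `rating` labels are assumptions for illustration, not Venice's actual API:

```python
from enum import Enum

class SafeVeniceLevel(Enum):
    """Hypothetical content-filtering levels for Safe Venice mode."""
    STRICT = "strict"      # block suggestive, explicit, and borderline content
    MODERATE = "moderate"  # allow suggestive content, remove explicit material
    OFF = "off"            # no restrictions

def is_allowed(level: SafeVeniceLevel, rating: str) -> bool:
    """Decide whether content with a given rating passes the filter.

    `rating` is an assumed classifier label: "safe", "suggestive",
    or "explicit".
    """
    if level is SafeVeniceLevel.OFF:
        return True
    if level is SafeVeniceLevel.MODERATE:
        return rating != "explicit"
    return rating == "safe"  # STRICT: only clearly safe content passes
```

The three-way split keeps the user-facing choice simple while leaving room for finer-grained classifier labels underneath.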
- Detect and automatically redact or replace NSFW content in model responses.
- Display a warning label when a response has been partially altered.
- Let users request a "sanitized" version of a flagged response instead of blocking it outright.
Use Case: Allows for content moderation without full censorship, ensuring smooth interactions.
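A minimal sketch of the redact-and-label flow described above, assuming an upstream classifier supplies the flagged character spans:

```python
def sanitize_response(text: str, flagged_spans: list[tuple[int, int]]) -> dict:
    """Redact flagged spans and attach a warning label.

    `flagged_spans` is an assumed classifier output: (start, end)
    character offsets of NSFW passages, non-overlapping.
    """
    if not flagged_spans:
        return {"text": text, "altered": False, "warning": None}
    parts = []
    cursor = 0
    for start, end in sorted(flagged_spans):
        parts.append(text[cursor:start])
        parts.append("[redacted]")
        cursor = end
    parts.append(text[cursor:])
    return {
        "text": "".join(parts),
        "altered": True,
        "warning": "Parts of this response were altered by Safe Venice mode.",
    }
```

Returning the `altered` flag alongside the text is what lets the UI show a warning label only when something was actually changed.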
Expand filtering to regulate specific categories beyond NSFW, including:
- Violence and Gore
- Hate Speech & Harassment
- Drug & Substance Mentions
- Self-Harm & Suicide Discussions
Use Case: Allows a nuanced approach to filtering content based on user needs.
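The per-category controls could be represented as a simple mapping from category to filter level; in this sketch the level names, default values, and score thresholds are illustrative assumptions:

```python
# Category keys follow the proposal above; the chosen defaults and the
# numeric thresholds are illustrative, not a real Venice configuration.
DEFAULT_CATEGORY_SETTINGS = {
    "nsfw": "strict",
    "violence_gore": "moderate",
    "hate_harassment": "strict",
    "drugs_substances": "moderate",
    "self_harm": "strict",
}

def blocks(settings: dict, category: str, severity: float) -> bool:
    """Return True if content in `category` should be filtered.

    `severity` is an assumed classifier confidence score in [0, 1];
    unknown categories fall back to the strictest setting.
    """
    thresholds = {"off": 1.01, "moderate": 0.8, "strict": 0.3}
    level = settings.get(category, "strict")
    return severity >= thresholds[level]
```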
- Let individual users set their own filtering preferences.
- Allow API providers to enforce safety settings on a per-key basis.
- Enable organization-wide default settings for enterprises and communities.
Use Case: Ensures different stakeholders can customize safety features appropriately.
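One way the three levels of control might compose — the precedence rule here is an assumption for illustration: organization defaults fill gaps, user preferences override them, and a per-key policy can only tighten the result, never relax it:

```python
# Ordering of filter levels from most to least permissive (illustrative).
LEVEL_RANK = {"off": 0, "moderate": 1, "strict": 2}

def resolve_settings(user_prefs: dict, org_defaults: dict,
                     key_policy: dict) -> dict:
    """Merge per-category filter levels from the three stakeholder tiers."""
    # User preferences override org defaults where both are set.
    merged = {**org_defaults, **user_prefs}
    # A per-key policy is enforced last and can only raise the level.
    for category, enforced in key_policy.items():
        current = merged.get(category, "off")
        if LEVEL_RANK[enforced] > LEVEL_RANK[current]:
            merged[category] = enforced
    return merged
```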
- Provide an explanation for filtered responses.
- Allow users to report false positives and false negatives for continuous improvement.
- Offer a "review history" feature so users can compare sanitized and original responses (where settings permit).
Use Case: Improves trust in the system while allowing for refinement based on real user experiences.
- Ensures a Consistent Safety Experience: Expands NSFW filtering beyond images to cover all AI interactions.
- Enhances Monetization & Business Viability: Attracts a wider audience, including enterprise users who require robust content controls.
- Prevents Unintentional Exposure: Protects users who do not wish to engage with explicit content.
- Provide clear API documentation and dashboard controls for Safe Venice mode settings.
- Allow real-time customization so users can switch modes dynamically.
- Ensure low-latency filtering to maintain response speed and quality.
- Consider machine-learning refinement to reduce false positives and improve accuracy over time.
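Dynamic mode switching could, for example, be exposed as a per-request field. The `safe_venice` payload below is purely hypothetical and not Venice's actual API shape:

```python
# Hypothetical chat request body illustrating per-request Safe Venice
# settings; every field under "safe_venice" is an assumption.
request = {
    "model": "example-model",
    "messages": [{"role": "user", "content": "Hello"}],
    "safe_venice": {
        "mode": "moderate",                         # strict | moderate | off
        "categories": {"violence_gore": "strict"},  # per-category overrides
        "sanitize": True,  # redact flagged spans instead of blocking
    },
}
```

Carrying the settings in the request (rather than only in account state) is what makes real-time switching possible without a dashboard round trip.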
Expanding "Safe Venice" mode to include LLM interactions would strengthen the platform’s safety infrastructure, ensuring a controlled and adaptable AI experience for all users.
Backlog
Feature Requests
Chat
About 1 year ago

Rex Rutherford