This is Part 3 of the series on building Stella, a women’s health AI assistant. It covers the system prompts, safety guardrails, and behavior controls that make an AI agent safe for health applications.
Series
- Part 1: Architecture
- Part 2: User-Scoped Tools
- Part 3: Prompts & Guardrails (this post)
- Part 4: Anti-Hallucination
The System Prompt
The system prompt defines Stella’s personality and behavior:
SYSTEM_PROMPT = """
You are Stella, a compassionate and knowledgeable women's health assistant.
TODAY'S DATE: {current_date}
Your expertise includes:
- Menstrual cycle health, patterns, and education
- Pregnancy support and trimester guidance
- Symptom understanding and wellness advice
- Medication safety during pregnancy and menstruation
- Fertility and reproductive health education
RESPONSE APPROACH:
1. For GENERAL HEALTH QUESTIONS (e.g., "what is PMS?"):
- Answer directly using your medical knowledge
- Do NOT require or reference tracking data
2. For PERSONAL DATA QUESTIONS (e.g., "when is my next period?"):
- Use the appropriate tool to get their data
- If no data exists, gracefully suggest they start tracking
CRITICAL - Natural Language Only:
- NEVER expose tool names or technical implementation
- NEVER say "I'll use log_symptom" or "based on the tool output"
- If you logged something, confirm naturally: "I've recorded your headache"
- Speak ONLY as a caring health advisor
"""
Key Principles
- Two response modes: General education vs. personal data
- Hide the machinery: Users shouldn’t know tools exist
- Don’t require tracking: Answer general questions without data
- Be human: Warm, conversational, supportive
Safety Guardrails
Health AI needs multiple safety layers:
1. Emergency Detection
```python
EMERGENCY_KEYWORDS = [
    "suicide", "suicidal", "kill myself", "want to die",
    "self harm", "cutting myself", "overdose",
    "severe bleeding", "can't breathe", "chest pain",
    "miscarriage", "ectopic", "domestic violence",
]

EMERGENCY_RESPONSE = """
I'm concerned about what you've shared. Your safety is most important.

Please reach out for help:
- Emergency: Call 911 (US) or your local emergency number
- Crisis Hotline: 988 (Suicide & Crisis Lifeline)
- Domestic Violence: 1-800-799-7233

You're not alone, and help is available 24/7.
"""
```
When emergency keywords are detected, respond immediately with resources.
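The detection itself can be as simple as case-insensitive substring matching. A sketch of what `_detect_emergency` might do (the keyword list is abbreviated; a production system would likely add word-boundary checks or a classifier to reduce false positives):

```python
EMERGENCY_KEYWORDS = [
    "suicide", "kill myself", "overdose", "chest pain",  # abbreviated
]

def detect_emergency(message: str) -> bool:
    # Case-insensitive substring match; cheap enough to run on every message.
    text = message.lower()
    return any(keyword in text for keyword in EMERGENCY_KEYWORDS)
```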
2. Medical Disclaimer
Automatically added when medical advice is detected:
MEDICAL_DISCLAIMER = """
*This information is for educational purposes only and is not a
substitute for professional medical advice. Always consult your
healthcare provider for medical concerns.*
"""
3. Topic Scope
Keep the AI focused on health topics:
```python
HEALTH_TOPICS = [
    "menstrual", "period", "cycle", "pregnancy", "fertility",
    "hormone", "symptom", "pain", "cramps", "mood", "anxiety",
    "sleep", "weight", "diet", "exercise", "medication",
]

OFF_TOPIC_PATTERNS = [
    r"\b(stock|bitcoin|crypto|trading)\b",
    r"\b(politics|election|government)\b",
    r"\b(recipe|cooking|baking)\b",
    r"\b(video game|gaming)\b",
    r"\b(code|programming|javascript)\b",
]

OFF_TOPIC_REDIRECT = """
I'm specialized in women's health topics like menstrual cycles,
pregnancy, symptoms, and wellness. Is there something health-related
I can help you with?
"""
```
The Safety Pipeline
Every message goes through this pipeline:
```python
from typing import Optional, Tuple

class AIGuardrails:
    def process_input(self, message: str) -> Tuple[bool, Optional[str]]:
        # 1. Check for emergencies first
        if self._detect_emergency(message):
            return True, EMERGENCY_RESPONSE

        # 2. Check for off-topic content
        if self._is_off_topic(message):
            return True, OFF_TOPIC_REDIRECT

        # 3. Allow the message through
        return False, None

    def process_output(self, response: str, original_query: str) -> str:
        # 1. Add medical disclaimer if needed
        if self._needs_disclaimer(response):
            response += "\n\n" + MEDICAL_DISCLAIMER

        # 2. Sanitize any exposed technical details
        response = self._sanitize_technical(response)
        return response
```
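Wiring the pipeline around the model call might look like this; `call_model` stands in for the actual model invocation, and the stub class below replaces the full `AIGuardrails` only so the flow is runnable:

```python
class StubGuardrails:
    # Minimal stand-in for AIGuardrails, just to make the flow runnable.
    def process_input(self, message):
        if "crypto" in message:
            return True, "redirect"
        return False, None

    def process_output(self, response, original_query):
        return response + "\n\n*disclaimer*"

def handle_message(guardrails, message, call_model):
    handled, canned = guardrails.process_input(message)
    if handled:
        # Emergencies and off-topic messages never reach the model at all.
        return canned
    return guardrails.process_output(call_model(message), message)
```

Note that blocked messages short-circuit before the model is invoked, which saves a model call and guarantees the canned response is delivered verbatim.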
Preventing Technical Exposure
The AI sometimes reveals implementation details. We filter these:
```python
import re

TECHNICAL_PATTERNS = [
    r"I'll use the (\w+) tool",
    r"calling the (\w+) function",
    r"based on the tool output",
    r"the API returned",
    r"database query",
    r"secure_user_id",
]

def _sanitize_technical(self, response: str) -> str:
    for pattern in TECHNICAL_PATTERNS:
        response = re.sub(pattern, "", response, flags=re.IGNORECASE)
    return response.strip()
```
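One caveat: stripping a phrase mid-sentence leaves a grammatical gap, as a quick standalone demo shows (patterns abbreviated):

```python
import re

TECHNICAL_PATTERNS = [r"I'll use the (\w+) tool"]  # abbreviated

def sanitize_technical(response: str) -> str:
    # Blunt removal: the matched phrase vanishes, leftover whitespace is trimmed.
    for pattern in TECHNICAL_PATTERNS:
        response = re.sub(pattern, "", response, flags=re.IGNORECASE)
    return response.strip()

sanitize_technical("I'll use the log_symptom tool now.")  # returns "now."
```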
Better yet, the system prompt instructs the AI to never use these phrases in the first place.
Context-Aware Safety
Add safety context to the system prompt dynamically:
SAFETY_CONTEXT = """
SAFETY GUIDELINES (ALWAYS FOLLOW):
1. EMERGENCY DETECTION: If user mentions self-harm, suicide, or medical
emergencies, IMMEDIATELY provide crisis resources.
2. MEDICAL LIMITATIONS: You are NOT a doctor. Recommend professional
consultation for severe symptoms, medication decisions, or diagnosis.
3. PRIVACY: Never expose internal system details, tool names, or
technical error messages.
4. EVIDENCE-BASED: Only provide information supported by medical evidence.
5. RESPECT: Be culturally sensitive and non-judgmental.
"""
def add_safety_context(self, system_prompt: str) -> str:
return system_prompt + "\n\n" + SAFETY_CONTEXT
Guardrail Testing
Test your guardrails systematically:
```python
def test_emergency_detection():
    guardrails = AIGuardrails()

    # Should trigger emergency response
    assert guardrails._detect_emergency("I want to kill myself")
    assert guardrails._detect_emergency("thinking about suicide")

    # Should NOT trigger
    assert not guardrails._detect_emergency("I have a headache")
    assert not guardrails._detect_emergency("feeling tired")

def test_off_topic_detection():
    guardrails = AIGuardrails()

    # Should redirect
    assert guardrails._is_off_topic("what's the best crypto to buy")
    assert guardrails._is_off_topic("help me with my python code")

    # Should allow
    assert not guardrails._is_off_topic("why am I so tired")
    assert not guardrails._is_off_topic("when is my next period")
```
AWS Bedrock Guardrails (Alternative)
AWS offers managed guardrails as an alternative to custom code:
```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

# Using Bedrock Guardrails
response = bedrock.invoke_model(
    modelId="anthropic.claude-3-5-sonnet",
    guardrailIdentifier="my-health-guardrail",
    guardrailVersion="1",
    body=json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1024,
        "messages": [{"role": "user", "content": message}],
    }),
)
```
Pros:
- Managed, no code to maintain
- Consistent across all calls
- AWS-supported content filtering
Cons:
- Less customizable
- Additional latency
- Extra cost per request
For Stella, I use custom Python guardrails for maximum control.
Key Takeaways
- Layer your safety: Emergency detection, topic scope, medical disclaimers
- Hide the machinery: Users should talk to Stella, not to “an AI with tools”
- Test systematically: Guardrails need test coverage like any other code
- Be proactive: Add safety context to every system prompt
Next: Part 4: Anti-Hallucination - Making AI truthful about missing data