Streaming LLM Architecture Patterns: Sources, Done Events, And Observability
Work period: 2025-2026 streaming LLM systems.
I build production AI products from 0 to 1: Stella at Ourself Health and KiNDD / NDD Resource Navigator. My work spans scalable Django/Python backends, AWS Bedrock, Strands agents, Langfuse observability/evals, RAG pipelines, cost-aware model routing, GraphQL APIs, Flutter-connected mobile platforms, and the operational details that make AI products dependable after launch.