
When we started integrating Apple HealthKit into our women’s health platform, the initial approach was simple: Flutter reads HealthKit, POSTs JSON to a Django GraphQL mutation, Django writes to PostgreSQL. Done.

It lasted about a week.

The Problem with Inline Processing

HealthKit data is on-device only. There’s no server-side API — every piece of health data has to travel from the user’s iPhone to your backend via HTTP. That creates three problems that compound:

Volume. A user with an Apple Watch generates heart rate samples every few minutes, step counts continuously, sleep analysis nightly. Connect HealthKit for the first time and request a 30-day backfill? That’s potentially thousands of samples per data type.

Latency. Our GraphQL mutations were processing HealthKit batches synchronously. A 30-day backfill of heart rate data would hold the HTTP connection open for 15-30 seconds while Django ran update_or_create in a loop. The Flutter client would time out or the user would navigate away.

Reliability. If Django crashed mid-batch — OOM, database connection timeout, ECS task getting replaced — the entire sync was lost. Partial writes meant some records existed and some didn’t, with no clean way to resume.

The answer was obvious: stop processing inline, start processing in the background.

Why Celery with SQS

For a Django backend running on ECS Fargate with an AWS-native stack, SQS as the Celery broker was the natural choice. No infrastructure to manage, no Redis cluster to keep alive, and the cost model is perfect for bursty health data — you pay per message, and when nobody’s syncing, broker costs are zero.

The configuration is minimal:

CELERY_BROKER_URL = config('CELERY_BROKER_URL', default='sqs://')

That sqs:// with no credentials means Celery authenticates via the ECS task’s IAM role. No access keys in environment variables, no secrets to rotate.

What SQS Gives You

Durability. If your Celery worker crashes mid-task, the SQS message doesn’t disappear. With acks_late=True, the message stays in the queue until the worker confirms completion. It reappears after the visibility timeout expires and gets retried.

Scaling for free. SQS handles burst traffic — everyone syncing their morning workout at 8 AM — without you managing broker capacity.

Dead letter queues. After N failed retries, poison-pill messages move to a DLQ automatically. You can inspect and replay them.
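On the Celery side, the setting that makes this durability real is late acknowledgment (a sketch in Django's CELERY_ settings namespace; the DLQ redrive policy itself lives on the SQS queue, not in Django):

```python
# Ack only after the task finishes; if the worker crashes mid-task,
# the message stays in SQS and reappears after the visibility timeout.
CELERY_TASK_ACKS_LATE = True

# The DLQ hand-off is configured SQS-side: a redrive policy with
# maxReceiveCount moves repeatedly-failing messages to the DLQ automatically.
```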

What SQS Doesn’t Give You

No Flower. SQS doesn’t support the event protocol that Celery’s monitoring tools depend on. You can’t use Flower to watch task progress.

No celery inspect. You can’t query worker state or kill specific tasks remotely. Monitoring comes from CloudWatch.

15-minute delay ceiling. SQS caps message delay at 15 minutes. This doesn’t matter for HealthKit imports, but it limits what you can do with scheduled tasks.

Visibility timeout burns on prefetch. With SQS, the visibility timeout starts when the worker receives the message, not when it starts processing it. If your worker prefetches 10 messages but processes them sequentially, the last 9 burn through their visibility timeout while sitting in memory. This is why CELERY_WORKER_PREFETCH_MULTIPLIER = 1 is critical.
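The corresponding settings look roughly like this (the timeout value is illustrative; it just has to exceed your longest task runtime):

```python
# Pull one message at a time so prefetched messages don't sit in
# memory burning their visibility timeout.
CELERY_WORKER_PREFETCH_MULTIPLIER = 1

CELERY_BROKER_TRANSPORT_OPTIONS = {
    # Must exceed the longest expected task runtime,
    # or SQS redelivers the message mid-task.
    'visibility_timeout': 1800,  # seconds
}
```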

Where Redis Fits In (Separately)

Redis serves a completely different purpose in our stack: caching and rate limiting. It has nothing to do with Celery’s task queue.

REDIS_URL = config('REDIS_URL', default='')

if REDIS_URL:
    CACHES = {
        'default': {
            'BACKEND': 'django_redis.cache.RedisCache',
            'LOCATION': REDIS_URL,
        }
    }
else:
    CACHES = {
        'default': {
            'BACKEND': 'django.core.cache.backends.locmem.LocMemCache',
        }
    }

When REDIS_URL is set (UAT/PROD), Django’s cache backend uses ElastiCache Redis. When it’s not (local dev), it falls back to in-process memory cache.

The AI rate limiter uses django.core.cache — which means it gets Redis in production and LocMemCache in development. This is intentional: rate limiting is disabled in dev anyway (RATE_LIMITING_ENABLED = False), so the non-shared cache doesn’t matter.

The Architecture That Emerged

Two ECS services, one Docker image, two backing services:

┌─────────────────┐     ┌──────────────┐
│  ECS: api       │────▶│  SQS Queue   │
│  gunicorn       │     │  (broker)    │
│  desiredCount:2 │     └──────┬───────┘
└─────────────────┘            │
        │                      ▼
        │              ┌─────────────────┐
        │              │  ECS: worker    │
        │              │  celery worker  │
        │              │  desiredCount:1 │
        │              └─────────────────┘
        ▼
┌──────────────┐
│  ElastiCache │
│  Redis       │
│  (cache only)│
└──────────────┘

The API container calls .delay() to enqueue tasks to SQS. The worker container pulls from SQS and executes them. Redis handles caching and rate limiting, completely independent of the task queue.

We deliberately chose no result backend (CELERY_RESULT_BACKEND = None). Import status is tracked in our own AppleHealthImportBatch model in PostgreSQL, which gives us richer state management than Celery’s result store would.
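For illustration, such a batch model might carry fields like these. Only the model name comes from this post; every field below is an assumption:

```python
from django.db import models

class AppleHealthImportBatch(models.Model):
    # Illustrative status machine, richer than Celery's PENDING/SUCCESS/FAILURE
    STATUS_CHOICES = [
        ('queued', 'Queued'),
        ('processing', 'Processing'),
        ('completed', 'Completed'),
        ('failed', 'Failed'),
    ]
    status = models.CharField(max_length=20, choices=STATUS_CHOICES, default='queued')
    data_type = models.CharField(max_length=50)        # e.g. heart_rate, sleep
    samples_processed = models.IntegerField(default=0)
    error_message = models.TextField(blank=True)
    created_at = models.DateTimeField(auto_now_add=True)
    updated_at = models.DateTimeField(auto_now=True)
```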

The Dedicated Queue

All HealthKit tasks route to a dedicated SQS queue: apple-health-imports.

CELERY_TASK_DEFAULT_QUEUE = 'apple-health-imports'
CELERY_TASK_ROUTES = {
    'api.tasks.*': {'queue': 'apple-health-imports'},
}

The queue uses predefined SQS URLs in production rather than letting Celery auto-create queues. This is important because SQS queue creation requires IAM permissions you might not want your task role to have.
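With kombu’s SQS transport this is the predefined_queues option, merged into the broker transport options (the account ID and region below are placeholders):

```python
CELERY_BROKER_TRANSPORT_OPTIONS = {
    'predefined_queues': {
        'apple-health-imports': {
            # Celery uses this URL directly instead of creating or listing
            # queues, so the task role needs no sqs:CreateQueue permission.
            'url': 'https://sqs.us-east-1.amazonaws.com/123456789012/apple-health-imports',
        },
    },
}
```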

Next: The Sync Pipeline

In Part 2, I’ll cover the actual HealthKit import implementation: the 2,300-line import service, chunk-based batch processing, fingerprint deduplication, and the dispatcher pattern that chains chunks through SQS.