Published on

Real-Time Data Pipelines with Amazon Kinesis: Architecture and Tradeoffs

Authors
  • avatar
    Name
    Benjamin Lee
    Twitter

Why Kinesis?

When you need to ingest and process streaming data in real time — telemetry from IoT devices, clickstream events, application logs, financial transactions — Amazon Kinesis Data Streams is the AWS-native answer. It stores streamed data synchronously across three Availability Zones, supports data retention up to 365 days, and integrates natively with Lambda, Flink, and Firehose.

AWS describes the architecture as a stack of five logical layers, each composed of purpose-built components:

1. Streaming Sources      (IoT devices, applications, services)
2. Stream Ingestion       (Kinesis Data Streams, IoT Core)
3. Stream Processing      (Lambda, Kinesis Data Analytics / Flink)
4. Serving Layer          (DynamoDB, OpenSearch, Timestream)
5. Consumption Layer      (dashboards, alerts, downstream systems)

Each layer can scale independently, which is the core advantage over batch-oriented pipelines.

Provisioned vs. On-Demand Mode

Kinesis offers two capacity modes — the choice affects cost, scaling behavior, and operational overhead:

ProvisionedOn-Demand
CapacityFixed shards (1 MB/s write, 2 MB/s read per shard)Auto-scales up to 200 MB/s write
Cost modelPay per shard-hourPay per GB ingested and retrieved
Best forPredictable, steady-state workloadsVariable or unpredictable traffic
ScalingManual shard splits/merges or auto-scaling via CloudWatch + LambdaAutomatic

For most production workloads, start with On-Demand to avoid under/over-provisioning. Move to Provisioned once you have a clear throughput baseline — it's cheaper at steady state. Per AWS best practices, monitor GetRecords.IteratorAgeMilliseconds — if it climbs consistently, you need more shards or faster consumers.

Ingestion Patterns

Direct put from producers:

import boto3, json

kinesis = boto3.client('kinesis', region_name='us-east-1')

def publish_events(events: list[dict], stream: str):
    records = [
        {
            'Data': json.dumps(e).encode(),
            'PartitionKey': e.get('device_id', 'default'),
        }
        for e in events
    ]
    # PutRecords batches up to 500 records per call — much cheaper than individual PutRecord
    return kinesis.put_records(Records=records, StreamName=stream)

Partition key selection determines shard distribution. Using a high-cardinality key like device_id or user_id distributes load evenly and keeps per-entity records ordered within a shard. A low-cardinality key (e.g., a single constant) sends everything to one shard and creates a hot partition.

IoT at scale: AWS recommends routing IoT device data through AWS IoT Core first. IoT Core handles device authentication, certificate management, and MQTT protocol termination, then fans data into Kinesis. This avoids putting AWS credentials on embedded devices.

Lambda as a Stream Consumer

Lambda integrates natively with Kinesis as an event source. Key configuration decisions:

# Lambda handler — receives a batch of Kinesis records
def handler(event, context):
    for record in event['Records']:
        payload = json.loads(record['kinesis']['data'])
        process(payload)
    return {'statusCode': 200}

Critical settings for production:

  • Bisect on error: splits a failing batch in half and retries each half separately, isolating bad records without blocking the whole shard
  • Maximum retry attempts: set a finite limit; otherwise a poison-pill record blocks a shard indefinitely
  • Destination on failure: route failed records to SQS or S3 for later inspection
  • Parallelization factor: 2–10 concurrent Lambda invocations per shard for higher throughput

Idempotency matters: Kinesis guarantees at-least-once delivery — producers may retry, and Lambda retries on error. Your consumer must handle duplicate records. Use a deduplication key (e.g., event UUID in DynamoDB with a TTL) if downstream systems can't absorb duplicates.

A Note on Data Quality

One subtle issue in telemetry pipelines: device clocks drift. AWS calls out that IoT devices may emit timestamps more than 15 minutes off — in either direction — due to internal clock errors or daylight savings transitions. If you're windowing on event timestamps in Flink or Kinesis Data Analytics, build in a late-data tolerance and use the Kinesis approximateArrivalTimestamp as a fallback when device timestamps look implausible.

Kinesis Firehose for the Archive Layer

Firehose sits alongside Kinesis Data Streams and handles the durable archive path with zero consumer code. It buffers records and delivers them to S3 (or OpenSearch, Redshift, Splunk) in configurable batches:

Kinesis Data StreamsFirehoseS3 (partitioned by year/month/day/hour)

S3 output partitioned by time is immediately Athena-compatible — you get ad-hoc SQL over months of raw event history with no ETL required.

Operational Monitoring

The metrics that matter:

MetricWhat it tells you
GetRecords.IteratorAgeMillisecondsConsumer lag — how far behind real-time your readers are
WriteProvisionedThroughputExceededProducers hitting shard write limits
ReadProvisionedThroughputExceededConsumers hitting shard read limits
Lambda IteratorAge (from event source mapping)End-to-end processing lag

Set CloudWatch alarms on IteratorAgeMilliseconds > 60,000ms (1 minute behind) — that's the signal to add shards or optimize your consumer before lag compounds.

Security Baseline

Kinesis Data Streams supports server-side encryption with AWS KMS out of the box — enable it for any stream that carries sensitive data. For network isolation, use VPC interface endpoints so traffic between your VPC and Kinesis never traverses the public internet. Scope IAM policies to specific stream ARNs — producers should only have kinesis:PutRecords, consumers only kinesis:GetRecords and kinesis:GetShardIterator.


Sources: