Real-Time Data Pipelines with Amazon Kinesis: Architecture and Tradeoffs

Why Kinesis?

When you need to ingest and process streaming data in real time — telemetry from IoT devices, clickstream events, application logs, financial transactions — Amazon Kinesis Data Streams is the AWS-native answer. It stores streamed data synchronously across three Availability Zones, supports data retention up to 365 days, and integrates natively with Lambda, Flink, and Firehose.

AWS describes the architecture as a stack of five logical layers, each composed of purpose-built components:

1. Streaming Sources      (IoT devices, applications, services)
2. Stream Ingestion       (Kinesis Data Streams, IoT Core)
3. Stream Processing      (Lambda, Kinesis Data Analytics / Flink)
4. Serving Layer          (DynamoDB, OpenSearch, Timestream)
5. Consumption Layer      (dashboards, alerts, downstream systems)

Each layer can scale independently, which is the core advantage over batch-oriented pipelines.

Provisioned vs. On-Demand Mode

Kinesis offers two capacity modes — the choice affects cost, scaling behavior, and operational overhead:

	Provisioned	On-Demand
Capacity	Fixed shards (1 MB/s write, 2 MB/s read per shard)	Auto-scales up to 200 MB/s write
Cost model	Pay per shard-hour	Pay per GB ingested and retrieved
Best for	Predictable, steady-state workloads	Variable or unpredictable traffic
Scaling	Manual shard splits/merges or auto-scaling via CloudWatch + Lambda	Automatic

For most production workloads, start with On-Demand to avoid under/over-provisioning. Move to Provisioned once you have a clear throughput baseline — it's cheaper at steady state. Per AWS best practices, monitor GetRecords.IteratorAgeMilliseconds — if it climbs consistently, you need more shards or faster consumers.

Ingestion Patterns

Direct put from producers:

import boto3, json

kinesis = boto3.client('kinesis', region_name='us-east-1')

def publish_events(events: list[dict], stream: str):
    records = [
        {
            'Data': json.dumps(e).encode(),
            'PartitionKey': e.get('device_id', 'default'),
        }
        for e in events
    ]
    # PutRecords batches up to 500 records per call — much cheaper than individual PutRecord
    return kinesis.put_records(Records=records, StreamName=stream)

Partition key selection determines shard distribution. Using a high-cardinality key like device_id or user_id distributes load evenly and keeps per-entity records ordered within a shard. A low-cardinality key (e.g., a single constant) sends everything to one shard and creates a hot partition.

IoT at scale: AWS recommends routing IoT device data through AWS IoT Core first. IoT Core handles device authentication, certificate management, and MQTT protocol termination, then fans data into Kinesis. This avoids putting AWS credentials on embedded devices.

Lambda as a Stream Consumer

Lambda integrates natively with Kinesis as an event source. Key configuration decisions:

# Lambda handler — receives a batch of Kinesis records
def handler(event, context):
    for record in event['Records']:
        payload = json.loads(record['kinesis']['data'])
        process(payload)
    return {'statusCode': 200}

Critical settings for production:

Bisect on error: splits a failing batch in half and retries each half separately, isolating bad records without blocking the whole shard
Maximum retry attempts: set a finite limit; otherwise a poison-pill record blocks a shard indefinitely
Destination on failure: route failed records to SQS or S3 for later inspection
Parallelization factor: 2–10 concurrent Lambda invocations per shard for higher throughput

Idempotency matters: Kinesis guarantees at-least-once delivery — producers may retry, and Lambda retries on error. Your consumer must handle duplicate records. Use a deduplication key (e.g., event UUID in DynamoDB with a TTL) if downstream systems can't absorb duplicates.

A Note on Data Quality

One subtle issue in telemetry pipelines: device clocks drift. AWS calls out that IoT devices may emit timestamps more than 15 minutes off — in either direction — due to internal clock errors or daylight savings transitions. If you're windowing on event timestamps in Flink or Kinesis Data Analytics, build in a late-data tolerance and use the Kinesis approximateArrivalTimestamp as a fallback when device timestamps look implausible.

Kinesis Firehose for the Archive Layer

Firehose sits alongside Kinesis Data Streams and handles the durable archive path with zero consumer code. It buffers records and delivers them to S3 (or OpenSearch, Redshift, Splunk) in configurable batches:

Kinesis Data Streams → Firehose → S3 (partitioned by year/month/day/hour)

S3 output partitioned by time is immediately Athena-compatible — you get ad-hoc SQL over months of raw event history with no ETL required.

Operational Monitoring

The metrics that matter:

Metric	What it tells you
`GetRecords.IteratorAgeMilliseconds`	Consumer lag — how far behind real-time your readers are
`WriteProvisionedThroughputExceeded`	Producers hitting shard write limits
`ReadProvisionedThroughputExceeded`	Consumers hitting shard read limits
Lambda `IteratorAge` (from event source mapping)	End-to-end processing lag

Set CloudWatch alarms on IteratorAgeMilliseconds > 60,000ms (1 minute behind) — that's the signal to add shards or optimize your consumer before lag compounds.

Security Baseline

Kinesis Data Streams supports server-side encryption with AWS KMS out of the box — enable it for any stream that carries sensitive data. For network isolation, use VPC interface endpoints so traffic between your VPC and Kinesis never traverses the public internet. Scope IAM policies to specific stream ARNs — producers should only have kinesis:PutRecords, consumers only kinesis:GetRecords and kinesis:GetShardIterator.

Sources:

Architectural Patterns for Real-Time Analytics with Kinesis Data Streams, Part 1 — AWS Big Data Blog
Build and Optimize a Real-Time Stream Processing Pipeline with Kinesis and Flink — AWS Big Data Blog
Real-Time Data Processing Pipeline with Kinesis and Lambda — Simple AWS Newsletter
Near Real-Time Processing with Kinesis, Timestream, and Grafana — AWS Database Blog