- Published on
Real-Time Data Pipelines with Amazon Kinesis: Architecture and Tradeoffs
- Authors

- Name
- Benjamin Lee
Why Kinesis?
When you need to ingest and process streaming data in real time — telemetry from IoT devices, clickstream events, application logs, financial transactions — Amazon Kinesis Data Streams is the AWS-native answer. It stores streamed data synchronously across three Availability Zones, supports data retention up to 365 days, and integrates natively with Lambda, Flink, and Firehose.
AWS describes the architecture as a stack of five logical layers, each composed of purpose-built components:
1. Streaming Sources (IoT devices, applications, services)
2. Stream Ingestion (Kinesis Data Streams, IoT Core)
3. Stream Processing (Lambda, Kinesis Data Analytics / Flink)
4. Serving Layer (DynamoDB, OpenSearch, Timestream)
5. Consumption Layer (dashboards, alerts, downstream systems)
Each layer can scale independently, which is the core advantage over batch-oriented pipelines.
Provisioned vs. On-Demand Mode
Kinesis offers two capacity modes — the choice affects cost, scaling behavior, and operational overhead:
| Provisioned | On-Demand | |
|---|---|---|
| Capacity | Fixed shards (1 MB/s write, 2 MB/s read per shard) | Auto-scales up to 200 MB/s write |
| Cost model | Pay per shard-hour | Pay per GB ingested and retrieved |
| Best for | Predictable, steady-state workloads | Variable or unpredictable traffic |
| Scaling | Manual shard splits/merges or auto-scaling via CloudWatch + Lambda | Automatic |
For most production workloads, start with On-Demand to avoid under/over-provisioning. Move to Provisioned once you have a clear throughput baseline — it's cheaper at steady state. Per AWS best practices, monitor GetRecords.IteratorAgeMilliseconds — if it climbs consistently, you need more shards or faster consumers.
Ingestion Patterns
Direct put from producers:
import boto3, json
kinesis = boto3.client('kinesis', region_name='us-east-1')
def publish_events(events: list[dict], stream: str):
records = [
{
'Data': json.dumps(e).encode(),
'PartitionKey': e.get('device_id', 'default'),
}
for e in events
]
# PutRecords batches up to 500 records per call — much cheaper than individual PutRecord
return kinesis.put_records(Records=records, StreamName=stream)
Partition key selection determines shard distribution. Using a high-cardinality key like device_id or user_id distributes load evenly and keeps per-entity records ordered within a shard. A low-cardinality key (e.g., a single constant) sends everything to one shard and creates a hot partition.
IoT at scale: AWS recommends routing IoT device data through AWS IoT Core first. IoT Core handles device authentication, certificate management, and MQTT protocol termination, then fans data into Kinesis. This avoids putting AWS credentials on embedded devices.
Lambda as a Stream Consumer
Lambda integrates natively with Kinesis as an event source. Key configuration decisions:
# Lambda handler — receives a batch of Kinesis records
def handler(event, context):
for record in event['Records']:
payload = json.loads(record['kinesis']['data'])
process(payload)
return {'statusCode': 200}
Critical settings for production:
- Bisect on error: splits a failing batch in half and retries each half separately, isolating bad records without blocking the whole shard
- Maximum retry attempts: set a finite limit; otherwise a poison-pill record blocks a shard indefinitely
- Destination on failure: route failed records to SQS or S3 for later inspection
- Parallelization factor: 2–10 concurrent Lambda invocations per shard for higher throughput
Idempotency matters: Kinesis guarantees at-least-once delivery — producers may retry, and Lambda retries on error. Your consumer must handle duplicate records. Use a deduplication key (e.g., event UUID in DynamoDB with a TTL) if downstream systems can't absorb duplicates.
A Note on Data Quality
One subtle issue in telemetry pipelines: device clocks drift. AWS calls out that IoT devices may emit timestamps more than 15 minutes off — in either direction — due to internal clock errors or daylight savings transitions. If you're windowing on event timestamps in Flink or Kinesis Data Analytics, build in a late-data tolerance and use the Kinesis approximateArrivalTimestamp as a fallback when device timestamps look implausible.
Kinesis Firehose for the Archive Layer
Firehose sits alongside Kinesis Data Streams and handles the durable archive path with zero consumer code. It buffers records and delivers them to S3 (or OpenSearch, Redshift, Splunk) in configurable batches:
Kinesis Data Streams → Firehose → S3 (partitioned by year/month/day/hour)
S3 output partitioned by time is immediately Athena-compatible — you get ad-hoc SQL over months of raw event history with no ETL required.
Operational Monitoring
The metrics that matter:
| Metric | What it tells you |
|---|---|
GetRecords.IteratorAgeMilliseconds | Consumer lag — how far behind real-time your readers are |
WriteProvisionedThroughputExceeded | Producers hitting shard write limits |
ReadProvisionedThroughputExceeded | Consumers hitting shard read limits |
Lambda IteratorAge (from event source mapping) | End-to-end processing lag |
Set CloudWatch alarms on IteratorAgeMilliseconds > 60,000ms (1 minute behind) — that's the signal to add shards or optimize your consumer before lag compounds.
Security Baseline
Kinesis Data Streams supports server-side encryption with AWS KMS out of the box — enable it for any stream that carries sensitive data. For network isolation, use VPC interface endpoints so traffic between your VPC and Kinesis never traverses the public internet. Scope IAM policies to specific stream ARNs — producers should only have kinesis:PutRecords, consumers only kinesis:GetRecords and kinesis:GetShardIterator.
Sources:
- Architectural Patterns for Real-Time Analytics with Kinesis Data Streams, Part 1 — AWS Big Data Blog
- Build and Optimize a Real-Time Stream Processing Pipeline with Kinesis and Flink — AWS Big Data Blog
- Real-Time Data Processing Pipeline with Kinesis and Lambda — Simple AWS Newsletter
- Near Real-Time Processing with Kinesis, Timestream, and Grafana — AWS Database Blog