Cloud Governance Platform

Cloud Governance Platform architecture diagram

Summary

(7 min read)

An enterprise-grade AWS governance platform built across a five-account AWS Organization (Management, Governance, Dev, Staging, Prod). It uses AWS Config rules for compliance evaluation, EventBridge for event routing, and a Lambda-based remediation engine that assumes least-privilege cross-account roles to apply safe fixes. Production is explicitly protected by an allowlist.

Project Snapshot

My Role

Cloud Engineer & Solutions Architect

Duration

4 weeks · 2024

Context

Personal Lab Project

Outcome

5-account org · auto-remediation for LOW severity · 100% IaC

Stack

AWS Organizations · IAM Identity Center · AWS Config · EventBridge · Lambda · DynamoDB · SNS · CloudWatch · KMS · Terraform

The Problem

Context

AWS accounts without governance guardrails drift into non-compliance quickly. Public S3 buckets, open security groups, untagged resources, and idle expensive instances are endemic in unmanaged accounts.

The Pain

Manual compliance checks are reactive and inconsistent. By the time a public S3 bucket is found, data may already be exposed. Security group drift goes unnoticed for months.

Why It Mattered

Compliance failures in cloud environments cause data exposure, unexpected costs, and failed audits. Organizations need automated guardrails that run continuously — not quarterly manual reviews.

Goals & Requirements

Technical Goals

  • Multi-account Config deployment via Terraform across all five accounts
  • Compliance rules: required tagging, public S3 detection, open SSH/RDP detection, idle EC2
  • Event-driven remediation pipeline with severity-based routing
  • Cross-account remediation with least-privilege assume-role pattern
  • Production explicitly protected — no auto-remediation without allowlist entry
  • Every remediation action logged with resource ARN, action, and outcome

Constraints

  • Governance account as Config delegated admin via AWS Organizations API
  • All IaC — zero manual Config or IAM console steps
  • Remediation engine must be fully auditable

Architecture Design

Five-account AWS Organization with a dedicated Governance account as Config delegated admin. Config Aggregator centralizes compliance data from all accounts. EventBridge routes compliance change events to a Policy Engine Lambda that evaluates severity and triggers the appropriate path: auto-remediate (LOW), notify (MEDIUM), or log-only (HIGH).

Architecture Diagram

Component Breakdown

AWS Organizations + Config Aggregator

Centralizes Config compliance data from all five accounts in the organization into the Governance account

AWS Config Rules

Evaluates resources continuously against compliance rules — tagging, public S3, open ports, idle EC2
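
To make the "open ports" check concrete, here is a minimal sketch of the kind of test such a rule performs. It assumes the security group is shaped like an EC2 DescribeSecurityGroups result; the function names and the TCP-only scope are illustrative assumptions, not the project's actual rule (which may simply be an AWS managed rule).

```python
from typing import Any

# Ports treated as administrative access in this sketch (SSH and RDP).
ADMIN_PORTS = {22, 3389}
# CIDR blocks that mean "anyone on the internet".
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def exposes_admin_port(permission: dict[str, Any]) -> bool:
    """Return True if one ingress permission opens SSH or RDP to the world."""
    protocol = permission.get("IpProtocol")
    if protocol == "-1":
        # "-1" means all protocols and all ports.
        covers_admin_port = True
    elif protocol in ("tcp", "6"):
        from_port = permission.get("FromPort", 0)
        to_port = permission.get("ToPort", 65535)
        covers_admin_port = any(from_port <= p <= to_port for p in ADMIN_PORTS)
    else:
        # Assumption: only TCP exposure is checked in this sketch.
        covers_admin_port = False

    cidrs = {r.get("CidrIp") for r in permission.get("IpRanges", [])}
    cidrs |= {r.get("CidrIpv6") for r in permission.get("Ipv6Ranges", [])}
    return covers_admin_port and bool(cidrs & OPEN_CIDRS)


def is_non_compliant(security_group: dict[str, Any]) -> bool:
    """A group is non-compliant if any ingress permission is world-open on SSH/RDP."""
    return any(exposes_admin_port(p) for p in security_group.get("IpPermissions", []))
```

A group that allows TCP 22 from 0.0.0.0/0 is flagged, while the same port restricted to 10.0.0.0/8 is not.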

EventBridge

Routes Config compliance change events to the Policy Engine Lambda for severity evaluation
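
As a rough illustration of this hop, the sketch below shows an event pattern for Config compliance changes and a parser the Policy Engine might run on the incoming event. The `Finding` type and `parse_finding` function are hypothetical names, and filtering on NON_COMPLIANT is an assumption; the field names follow the Config compliance change event format as commonly documented and should be checked against a captured event.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Finding:
    """Hypothetical container for the fields the Policy Engine needs."""
    account_id: str
    region: str
    rule_name: str
    resource_type: str
    resource_id: str


# Event pattern for Config rule compliance changes.
# Assumption: only NON_COMPLIANT results are forwarded.
EVENT_PATTERN = {
    "source": ["aws.config"],
    "detail-type": ["Config Rules Compliance Change"],
    "detail": {"newEvaluationResult": {"complianceType": ["NON_COMPLIANT"]}},
}


def parse_finding(event: dict[str, Any]) -> Optional[Finding]:
    """Return a Finding for NON_COMPLIANT events, or None for anything else."""
    if event.get("source") != "aws.config":
        return None
    if event.get("detail-type") != "Config Rules Compliance Change":
        return None
    detail = event.get("detail", {})
    result = detail.get("newEvaluationResult", {})
    if result.get("complianceType") != "NON_COMPLIANT":
        return None
    return Finding(
        account_id=detail["awsAccountId"],
        region=detail["awsRegion"],
        rule_name=detail["configRuleName"],
        resource_type=detail["resourceType"],
        resource_id=detail["resourceId"],
    )
```

Re-checking the source and compliance type inside the handler duplicates the pattern on purpose, so a misconfigured rule cannot push unrelated events into the remediation path.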

Policy Engine Lambda

Evaluates rule severity, routes to remediation, SNS notification, or DynamoDB log-only path

Remediation Lambda

Assumes CloudGovernanceRemediationRole in target account and applies the safe fix — every action logged

DynamoDB

Compliance state store — tracks remediation history, current state, and outcome per resource
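
A remediation record along these lines could be written as follows. This is a sketch only: the attribute names, the key layout, and the outcome values are assumptions, since the table schema is not given here. The `table` argument is expected to behave like a boto3 DynamoDB Table resource, whose `put_item(Item=...)` call is the only one used.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def build_audit_item(resource_arn: str, action: str, outcome: str,
                     account_id: str, now: Optional[datetime] = None) -> dict[str, str]:
    """Build one audit record: which resource, what was done, and how it ended."""
    timestamp = (now or datetime.now(timezone.utc)).isoformat()
    return {
        "resource_arn": resource_arn,  # hypothetical partition key
        "timestamp": timestamp,        # hypothetical sort key
        "account_id": account_id,
        "action": action,              # e.g. "AddRequiredTags" (illustrative)
        "outcome": outcome,            # e.g. "SUCCEEDED" or "FAILED" (illustrative)
    }


def record_remediation(table: Any, **fields: Any) -> dict[str, str]:
    """Persist the audit record and return it for logging."""
    item = build_audit_item(**fields)
    table.put_item(Item=item)
    return item
```

Keying on resource ARN plus timestamp keeps the full history per resource, with the latest item serving as current state.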

Key Design Decisions

Severity-based routing instead of auto-remediate everything

Auto-remediating HIGH severity issues (deleting a production security group, closing a port used by a service) is too dangerous. Severity tiers allow safe automation where appropriate and human review where judgment is required.

Cross-account assume-role for remediation

The remediation Lambda runs in the Governance account but assumes a scoped CloudGovernanceRemediationRole in the target account. Least-privilege, auditable in CloudTrail, and revocable per account without touching the remediation engine.
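
The assume-role step can be sketched as below. The role name comes from this write-up; the function names, the session name, and the 900-second duration are illustrative assumptions. The STS client is passed in so the function can be exercised without AWS access; in a Lambda it would be `boto3.client("sts")`.

```python
from typing import Any

ROLE_NAME = "CloudGovernanceRemediationRole"  # role name from the write-up


def remediation_role_arn(account_id: str) -> str:
    """Build the ARN of the remediation role in the target account."""
    return f"arn:aws:iam::{account_id}:role/{ROLE_NAME}"


def assume_remediation_role(sts_client: Any, account_id: str,
                            session_name: str) -> dict[str, str]:
    """Assume the remediation role and return credential kwargs for a boto3 client."""
    response = sts_client.assume_role(
        RoleArn=remediation_role_arn(account_id),
        RoleSessionName=session_name,  # recorded in CloudTrail, useful for audits
        DurationSeconds=900,           # shortest session STS allows (assumed choice)
    )
    creds = response["Credentials"]
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
    }
```

The returned dictionary can be unpacked straight into a client for the target account, for example `boto3.client("s3", **assume_remediation_role(sts, "111122223333", "remediate-required-tags"))`.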

Implementation Breakdown

01

Multi-Account Config Deployment

AWS Config is deployed into every account via Terraform for_each, with the Governance account registered as delegated admin.

  • Terraform for_each over the account list: one Config recorder + delivery channel per account
  • Governance account registered as Config delegated admin via AWS Organizations
  • Config Aggregator in Governance account with authorization in each member account
  • Dedicated S3 bucket in Governance account for Config snapshots and history

02

Policy Engine & Remediation

EventBridge rule captures Config compliance change events and routes to the Policy Engine Lambda for severity evaluation and action routing.

  • Policy Engine maps Config rule name to severity tier (LOW / MEDIUM / HIGH)
  • LOW: immediate Remediation Lambda invocation with target account and resource details
  • MEDIUM: SNS notification to ops channel with resource details and recommended action
  • HIGH: DynamoDB log entry only — human review required
  • Production is protected by an explicit allowlist: LOW findings in prod are not auto-remediated without an allowlist entry and otherwise require a manual override
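
The routing rules above can be sketched as a small decision function. Everything named here is an assumption for illustration: the rule-to-severity mapping, the placeholder production account ID, the per-rule allowlist granularity, the `HOLD_FOR_OVERRIDE` action, and the choice to treat unknown rules as HIGH.

```python
from enum import Enum


class Severity(Enum):
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"


class Action(Enum):
    REMEDIATE = "invoke remediation Lambda"
    NOTIFY = "publish to SNS"
    LOG_ONLY = "write DynamoDB log entry"
    HOLD_FOR_OVERRIDE = "wait for manual override"


# Hypothetical rule-to-severity mapping; the real table is not in the write-up.
RULE_SEVERITY = {
    "required-tags": Severity.LOW,
    "s3-bucket-public-read-prohibited": Severity.MEDIUM,
    "restricted-ssh": Severity.HIGH,
}

PROD_ACCOUNT_ID = "555555555555"  # placeholder account ID
PROD_ALLOWLIST: set[str] = set()  # rule names approved for auto-remediation in prod


def route(rule_name: str, account_id: str) -> Action:
    """Pick the action for a non-compliant finding."""
    # Assumption: unknown rules fall back to HIGH so nothing is auto-fixed by accident.
    severity = RULE_SEVERITY.get(rule_name, Severity.HIGH)
    if severity is Severity.HIGH:
        return Action.LOG_ONLY
    if severity is Severity.MEDIUM:
        return Action.NOTIFY
    # LOW severity: auto-remediate, except in prod without an allowlist entry.
    if account_id == PROD_ACCOUNT_ID and rule_name not in PROD_ALLOWLIST:
        return Action.HOLD_FOR_OVERRIDE
    return Action.REMEDIATE
```

Keeping the decision in one pure function makes the routing table easy to unit test independently of Lambda, SNS, or DynamoDB.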

Challenges & Solutions

#1 Terraform bootstrap ordering for Organizations + delegated admin

The Problem

AWS Organizations APIs have eventual consistency delays. Registering the Governance account as Config delegated admin immediately after creating the org structure failed intermittently with 'account not found' errors.

The Fix

Added explicit depends_on relationships in Terraform and a time_sleep resource to allow Organizations propagation before the delegated admin registration. Made all resources idempotent — safe to re-apply after any failure.

Results & Impact

A production-ready governance framework demonstrating how organizations automate AWS compliance at scale across multiple accounts.

Before → After

Compliance Visibility

Manual quarterly → Continuous real-time
Always-on

LOW severity remediation

Manual ticket, days → Automatic < 5 min
Zero-touch

Infrastructure

Manual console → 100% Terraform
Fully reproducible

Business Outcome

A reference platform for any organization needing automated AWS compliance across multiple accounts — demonstrates the full governance architecture from Config rules through remediation to auditability.

Reflections

Would Do Differently

  • 01 Use AWS Config Conformance Packs instead of individual rules: better grouping, built-in reporting, and easier to share across accounts
  • 02 Add a custom compliance dashboard UI from day one rather than relying on the Config Aggregator's built-in views

Key Takeaways

  • 01 Severity-based routing is the key architectural decision: it is what makes auto-remediation safe enough to actually run in production accounts
  • 02 Cross-account assume-role patterns are the foundation of any multi-account AWS architecture; getting comfortable with them early pays dividends across every subsequent project
