Arun's Blog

Weekly AWS Security Posture Pipeline with Prowler, ECS Fargate, and Claude

TL;DR

I built a weekly security posture pipeline for a multi-account AWS Organization. Every Sunday at 02:00 UTC, an EventBridge cron fires an ECS Fargate task that loops over each member account, assumes a read-only ProwlerScanRole, runs Prowler v5, and writes OCSF findings to S3. A .complete marker on S3 triggers a Lambda that parses the findings, sends a sample to Claude for a narrative summary, creates a per-week child page on Confluence (Week 2026-W21, Week 2026-W22, ...), and emails the rollup to a small leadership group via SES (cross-account assume-role into the management account to escape the SES sandbox). Total cost: ~$10-25/month.

Why I Built It

I needed a weekly, narrative-style security posture report across an AWS Organization. AWS-native posture tools (Security Hub, Config) are good for continuous detection, but they leave you staring at a wall of findings with no "what's the headline this week" rollup, no week-over-week diff, and no place where leadership can read a coherent story.

What I wanted:

  • A scheduled scan that runs against every member account in the org
  • Structured findings stored in S3 for ad-hoc lookback queries
  • A Lambda that sends the findings to an LLM for summarization
  • A weekly child page on Confluence (Week 2026-W21, Week 2026-W22, ...) so the history is browsable
  • A clean email to a small leadership group via SES (not SNS, because the SNS confirmation friction and AWS-styled sender feel cheap)
  • Self-monitoring CloudWatch alarms so the pipeline tells me when it is broken, not just when AWS posture is

Scope note

This post covers the Prowler-only flow: ECS Fargate runs Prowler, S3 stores OCSF JSON, a Lambda summarizes to Confluence + SES email. AWS Security Hub is a parallel posture system I run alongside this, but it's its own writeup.

Architecture at a Glance

Three accounts in play, all in the same region:

| Role in the design | Account ID (example) | What lives there |
| --- | --- | --- |
| Org management | 111111111111 | Two CloudFormation StackSets that propagate the read-only scan role + a verified SES sending domain identity |
| Logging / runtime | 222222222222 | Everything that runs: VPC, ECS Fargate cluster, Lambda, S3 findings bucket, CloudWatch alarms, Secrets Manager |
| Member workloads | 333333333333, 444444444444, ... | Just the per-account ProwlerScanRole (read-only, deployed via StackSet) |

Cross-account assume-role is the connective tissue:

  • The Logging account's ECS task assumes ProwlerScanRole in each member to actually scan it.
  • The Logging account's Lambda assumes a sender role in mgmt to send via SES (because mgmt has the verified domain and is out of the SES sandbox).

The weekly run flow:

Sun 02:00 UTC   ->  EventBridge cron rule
                ->  ECS RunTask (Fargate, 2 vCPU / 8 GB)
                ->  for each member account:
                     poetry run prowler aws --role <full ARN>
                       --output-modes csv json-ocsf html
                       --output-bucket-no-assume <bucket>
                       --output-directory <week>/<run-id>/<acct>
                ->  drop <week>/<run-id>/run.complete marker
                ->  S3 ObjectCreated event (filter_suffix=.complete)
                ->  Lambda claude-summary
                     - load OCSF JSONs, filter status_code=FAIL, normalize
                     - load prior week's findings for diff
                     - call Anthropic API with sampled findings + counts
                     - render markdown -> Confluence storage XHTML
                     - POST or PUT child page "Week YYYY-Wnn" under parent
                     - STS AssumeRole into mgmt, SES SendEmail to leadership group

The Cross-Account Scan Role

In every member account I deploy a single IAM role, ProwlerScanRole, with SecurityAudit and ViewOnlyAccess managed policies. The trust policy is the slightly tricky part. It has to allow the Logging task role to assume it, but the Logging task role might not exist yet on the first deploy.

The trick: use the Logging account root as the trust principal (so IAM doesn't validate the specific role exists at create time), then constrain via an aws:PrincipalArn condition pinned to the eventual scanner role ARN. This avoids the chicken-and-egg with the Terraform-managed task role and keeps the policy tight.
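A minimal sketch of what such a trust policy could look like, built as a Python dict so it can be dumped to JSON for a CloudFormation template. The role name `prowler-scanner-task-role` and the choice of the `ArnEquals` operator are assumptions for illustration, not the names used in the real stack.

```python
import json


def scan_role_trust_policy(logging_account_id: str, scanner_role_name: str) -> dict:
    """Trust policy that names the account root but pins the caller by ARN."""
    scanner_arn = f"arn:aws:iam::{logging_account_id}:role/{scanner_role_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Account root as principal: IAM does not need the scanner
                # role to exist when this policy is created.
                "Principal": {"AWS": f"arn:aws:iam::{logging_account_id}:root"},
                "Action": "sts:AssumeRole",
                # Only callers whose principal ARN is the scanner role pass.
                "Condition": {"ArnEquals": {"aws:PrincipalArn": scanner_arn}},
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical role name; substitute the real task role.
    policy = scan_role_trust_policy("222222222222", "prowler-scanner-task-role")
    print(json.dumps(policy, indent=2))
```

For an assumed-role session, `aws:PrincipalArn` resolves to the role ARN rather than the session ARN, which is why the condition can pin the role directly.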

Deployed as a service-managed StackSet from mgmt against the org root OU. Service-managed StackSets exclude the org management account by default, so the same template gets deployed as a regular CFN stack directly in mgmt.

The ECS Task Definition

The Prowler container image is prowlercloud/prowler:5.13.0. A few things I learned the hard way:

  • The image's ENTRYPOINT is poetry run prowler, not prowler. The binary lives inside a Poetry virtualenv at /home/prowler, not on the global PATH.
  • AWS CLI is not installed in the image. Don't write a shell script that calls aws s3 cp. Use Prowler's native --output-bucket-no-assume instead, and use boto3 (which is in Prowler's venv) for the marker drop.
  • Prowler v5 flag changes from v4: --aws-account-id is gone (loop yourself), --role takes a full ARN not a role name, --output-modes is space-separated not comma-separated, json-ocsf is the v5 native JSON (the older json mode is deprecated).

The shape of the container command, in pseudocode: loop the target account list, invoke Prowler against each with a fully-qualified role ARN, write the OCSF JSON to a per-week S3 prefix.

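# ACCOUNTS is a space-separated list of member account IDs.
for ACCT in $ACCOUNTS; do
  # Scan one account through its ProwlerScanRole; Prowler uploads the output itself.
  poetry run prowler aws \
    --role "arn:aws:iam::${ACCT}:role/${ROLE_NAME}" \
    --output-modes csv json-ocsf html \
    --output-bucket-no-assume "$BUCKET" \
    --output-directory "$PREFIX/$ACCT"
done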

The actual Terraform task definition wraps this in a bash heredoc inside an HCL heredoc, computes RUN_ID and WEEK_TAG from date -u, and drops a .complete marker via a one-line boto3 call after the loop finishes.
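The week tag, run ID, and marker drop can be sketched in Python as follows. This is an illustration, not the task definition's actual one-liner: it assumes ISO-8601 week numbering (the post does not say which numbering the real `date -u` call uses), and the S3 client is passed in so the function works with `boto3.client("s3")` or a stub.

```python
from datetime import datetime, timezone


def week_tag(now: datetime) -> str:
    """Week label such as 2026-W21, using ISO-8601 week numbering (an assumption)."""
    iso = now.isocalendar()
    return f"{iso[0]}-W{iso[1]:02d}"


def run_id(now: datetime) -> str:
    """Timestamp that is safe to use as an S3 path segment."""
    return now.strftime("%Y-%m-%dT%H-%M-%SZ")


def marker_key(now: datetime) -> str:
    """Key of the marker object that signals the run has finished."""
    return f"{week_tag(now)}/{run_id(now)}/run.complete"


def drop_marker(s3_client, bucket: str, now: datetime) -> str:
    """Write an empty marker object; s3_client would be boto3.client('s3')."""
    key = marker_key(now)
    s3_client.put_object(Bucket=bucket, Key=key, Body=b"")
    return key


if __name__ == "__main__":
    print(marker_key(datetime.now(timezone.utc)))
```

Writing the marker only after the loop exits means a scan that dies partway through never triggers the summary step.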

Quoting Trap

Putting Python multi-line code inside a bash heredoc inside a Terraform heredoc has subtle quoting issues. Use single-line python -c with string concatenation instead of f-strings to dodge nested-quote problems.

Region Scope Is a Real Knob

A naive 6-region full scan against a heavy account will take hours. Prowler v5 runs about 571 checks. Multiply that by 6 regions x 16 accounts and you get tens of thousands of check executions per run, each of which makes its own API calls. My first attempt was still stuck on the first account after ~2.5 hours.

Two options:

  1. Cut region scope to the SCP-allowed minimum (I did this). Most posture findings are not region-specific (IAM, S3, CloudTrail, root MFA). Two regions catch the bulk.
  2. Parallelize with one Fargate task per account. Needs a fan-out orchestrator. I deferred this. If your org has accounts each with sprawling resources, do it on day one.
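A rough back-of-the-envelope for the region knob, using the numbers quoted above. It treats every check as running once per region per account, which overstates the total (global checks such as IAM are not repeated per region), so read it as an upper bound on check executions, not a count of API calls.

```python
def check_executions(checks: int, regions: int, accounts: int) -> int:
    """Upper bound on check executions for a serial scan."""
    return checks * regions * accounts


full_scan = check_executions(checks=571, regions=6, accounts=16)
trimmed_scan = check_executions(checks=571, regions=2, accounts=16)

print(f"6 regions: {full_scan} check executions")
print(f"2 regions: {trimmed_scan} check executions")
print(f"reduction: {full_scan / trimmed_scan:.1f}x")
```

Under that simplification, going from six regions to two cuts the work by a factor of three before any parallelism is added.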

S3 Layout and Lambda Trigger

s3://<findings-bucket>/
  +-- 2026-W21/
      +-- 2026-05-17T02-01-45Z/
          +-- 333333333333/
          |   +-- csv/prowler-output-333333333333-...csv
          |   +-- html/prowler-output-333333333333-...html
          |   +-- json-ocsf/prowler-output-333333333333-...ocsf.json
          |   +-- compliance/...
          +-- 444444444444/
          |   +-- ...
          +-- run.complete   <- S3 event triggers Lambda

The bucket notification that targets the Lambda uses filter_suffix = ".complete" so the function fires once per weekly run, not on every finding file. Without that filter, you'd get a Lambda invocation per uploaded file (hundreds per run), most of them firing before the scan has finished and each one repeating the same summary work.
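On the Lambda side, the handler only needs to turn the marker's key back into the run prefix. A minimal sketch, assuming the standard S3 event notification shape and the `<week>/<run-id>/run.complete` layout shown above; the function name `completed_runs` is hypothetical.

```python
from urllib.parse import unquote_plus


def completed_runs(event: dict) -> list[tuple[str, str]]:
    """Return (bucket, run_prefix) for each .complete marker in an S3 event."""
    runs = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        # Defensive re-check in case the notification filter is ever loosened.
        if not key.endswith(".complete") or "/" not in key:
            continue
        run_prefix = key.rsplit("/", 1)[0] + "/"
        runs.append((bucket, run_prefix))
    return runs
```

Everything else in the handler can then list objects under `run_prefix` instead of guessing which week it was invoked for.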

Lambda: OCSF Parsing

Prowler v5 emits OCSF JSON (Open Cybersecurity Schema Framework) by default. The schema is verbose. Two things to know:

  • The JSON file is a flat list of finding objects.
  • Each finding has both severity (titlecase string like "High") and severity_id (integer 1-6). Prowler v4-style code that does f.get("severity").upper() works against severity, but you have to filter on status first, because OCSF includes both PASS and FAIL findings in the same file.

The loader paginates the week's prefix, filters to *.ocsf.json, drops anything where status_code != "FAIL", and normalizes each finding down to the fields the prompt actually needs (uid, title, severity, account, region, top resources, status detail). In my org this reduces a ~622 MB raw dump to about 18,000 actionable FAIL findings, a far more manageable set to count, diff, and sample for the LLM.
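The filter-and-normalize step might look like the sketch below. The field paths (`finding_info.uid`, `cloud.account.uid`, `resources[].uid`) are my reading of Prowler's OCSF output and should be checked against a real file; the function names are hypothetical.

```python
import json


def normalize(finding: dict) -> dict:
    """Reduce one OCSF finding to the handful of fields the prompt needs."""
    info = finding.get("finding_info") or {}
    cloud = finding.get("cloud") or {}
    resources = finding.get("resources") or []
    return {
        "uid": info.get("uid"),
        "title": info.get("title"),
        "severity": (finding.get("severity") or "Unknown").upper(),
        "account": (cloud.get("account") or {}).get("uid"),
        "region": cloud.get("region"),
        # Keep only the first few resources to bound the prompt size.
        "resources": [r.get("uid") or r.get("name") for r in resources[:3]],
        "status_detail": finding.get("status_detail"),
    }


def failed_findings(raw_json: str) -> list[dict]:
    """Parse one *.ocsf.json file (a flat list) and keep normalized FAILs only."""
    return [
        normalize(f) for f in json.loads(raw_json) if f.get("status_code") == "FAIL"
    ]
```

Processing one file at a time and keeping only the normalized dicts holds far less in memory than accumulating every raw finding before filtering.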

Lambda Memory Cliff Is Real

OCSF JSON for ~15 accounts of a real org parses to ~1.5 GB in Python. I hit OOM at 512 MB Lambda memory and had to bump to 3008 MB. Watch your Memory used metric in CloudWatch and size accordingly.

Lambda: The Claude Prompt

The prompt sends Claude four things: severity counts for the current run, severity counts from the previous week (for diff), totals, and a sample of up to 50 representative findings (top-down by severity tier). Then it asks for fixed sections: Headline, What's new this week, Resolved this week, Top open issues, Trends, Action items.
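The "new" and "resolved" sections imply a week-over-week comparison. One simple way to get it, sketched here as an assumption and not as the pipeline's actual diff logic, is a set difference over a per-finding key. It assumes each normalized finding is a dict with `account` and `uid` string fields and that `uid` is stable between runs; if it is not, a key built from the check ID and resource would be the fallback.

```python
def finding_key(finding: dict) -> tuple[str, str]:
    """Identity of a finding across runs (assumed stable)."""
    return (finding["account"], finding["uid"])


def week_over_week(current: list[dict], previous: list[dict]) -> dict:
    """Split findings into new, resolved, and still-open relative to last week."""
    cur = {finding_key(f): f for f in current}
    prev = {finding_key(f): f for f in previous}
    return {
        "new": [cur[k] for k in sorted(cur.keys() - prev.keys())],
        "resolved": [prev[k] for k in sorted(prev.keys() - cur.keys())],
        "still_open": [cur[k] for k in sorted(cur.keys() & prev.keys())],
    }
```

Sorting the keys keeps the output deterministic, which makes reruns within the same week produce the same page.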

Two things matter in the prompt:

  1. Tell Claude what markdown constructs the converter supports. Otherwise it emits tables, blockquotes, and HTML that render literally on Confluence. My converter only handles headers, bullets, numbered lists, bold, inline code, and code fences. Anything outside that list has to be banned in the prompt.
  2. Cap the sample size. 50 representative findings produce a small (~12 KB) JSON prompt. Big enough to give meaningful coverage, small enough to fit in any reasonable Claude context.
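The counts-plus-capped-sample idea can be sketched as below. It assumes findings are dicts with an upper-case `severity` field; the tier order in `SEVERITY_ORDER` and the function names are illustrative choices, not taken from the real handler.

```python
from collections import Counter

# Highest tier first; anything unrecognized sorts last.
SEVERITY_ORDER = ["CRITICAL", "HIGH", "MEDIUM", "LOW", "INFORMATIONAL"]


def severity_counts(findings: list[dict]) -> dict:
    """Counts per severity over the full set, not just the sample."""
    return dict(Counter(f["severity"] for f in findings))


def sample_findings(findings: list[dict], limit: int = 50) -> list[dict]:
    """Take up to `limit` findings, filling from the most severe tier down."""
    rank = {name: i for i, name in enumerate(SEVERITY_ORDER)}
    ordered = sorted(findings, key=lambda f: rank.get(f["severity"], len(rank)))
    return ordered[:limit]
```

Sending exact counts alongside a small sample lets the model quote real totals while only reading a few dozen concrete examples.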

Markdown to Confluence Storage XHTML

Confluence Cloud accepts a representation called "storage" which is basically a constrained subset of XHTML. I wrote a small converter that handles # / ## / ### headers, - and * bullets, 1. numbered lists, triple-backtick code fences, **bold**, *italic*, `inline code`, and [text](url) links.

Subtle bug I hit while writing it: the italic regex \*([^*]+)\* happily matches the *:* inside a backticked IAM policy snippet like `*:*`. Result: a stray <em> wrapping the colon. The fix is to tokenize inline-code spans into placeholder markers before running bold/italic regexes, then restore them at the end. Any markdown-to-anything converter that handles both inline code and inline formatting needs this pattern.
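A stripped-down illustration of the placeholder pattern, covering only inline code, bold, and italic. The NUL-delimited placeholder and the name `render_inline` are choices made for this sketch; the real converter is not shown in the post.

```python
import html
import re

_CODE = re.compile(r"`([^`]+)`")
_BOLD = re.compile(r"\*\*([^*]+)\*\*")
_ITALIC = re.compile(r"\*([^*]+)\*")
_PLACEHOLDER = re.compile("\x00(\\d+)\x00")


def render_inline(text: str) -> str:
    """Convert inline markdown to XHTML without formatting inside code spans."""
    code_spans: list[str] = []

    def stash(match: re.Match) -> str:
        # Swap the code span for a marker the bold/italic regexes cannot match.
        code_spans.append(match.group(1))
        return f"\x00{len(code_spans) - 1}\x00"

    def restore(match: re.Match) -> str:
        body = html.escape(code_spans[int(match.group(1))], quote=False)
        return f"<code>{body}</code>"

    out = _CODE.sub(stash, text)
    out = html.escape(out, quote=False)
    out = _BOLD.sub(r"<strong>\1</strong>", out)
    out = _ITALIC.sub(r"<em>\1</em>", out)
    return _PLACEHOLDER.sub(restore, out)


if __name__ == "__main__":
    print(render_inline("Policy `*:*` is **too broad**, *fix it*."))
```

Bold runs before italic so that `**x**` is not consumed as two nested italic markers, and the code spans are restored last so nothing inside them is ever seen by either regex.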

Per-Week Child Pages

Rather than overwriting one page each Sunday, I create a child page per week titled Week 2026-W21, Week 2026-W22, and so on, under a parent index page. Confluence's built-in Children Display macro on the parent auto-renders the list of weekly children, sorted newest-first.

Re-runs within the same week update-in-place. New weeks create new siblings. The parent page is just a clean intro plus a Children Display macro configured with sort=creation and reverse=true, so the most recent week is always at the top.
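The create-or-update decision can be expressed as a pure function that returns the HTTP method, path, and JSON body. This sketch assumes the Confluence Cloud v1 content REST API (`/wiki/rest/api/content`) and that `existing` is the page object returned by a prior title lookup, including its `version.number`; the post does not say which API version the pipeline uses.

```python
from typing import Optional


def page_request(
    title: str,
    storage_xhtml: str,
    space_key: str,
    parent_id: str,
    existing: Optional[dict] = None,
) -> tuple[str, str, dict]:
    """Return (method, path, payload) to create or update a weekly child page."""
    payload = {
        "type": "page",
        "title": title,
        "space": {"key": space_key},
        "ancestors": [{"id": parent_id}],
        "body": {"storage": {"value": storage_xhtml, "representation": "storage"}},
    }
    if existing is None:
        # No page with this title yet: first run of a new week.
        return "POST", "/wiki/rest/api/content", payload
    # Re-run in the same week: updates must carry the next version number.
    payload["version"] = {"number": existing["version"]["number"] + 1}
    return "PUT", f"/wiki/rest/api/content/{existing['id']}", payload
```

Keeping the request construction separate from the HTTP call makes the version-bump logic easy to test without a Confluence instance.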

Email Delivery: SES Cross-Account

I initially used SNS email subscriptions but the per-recipient confirmation friction and the AWS-styled no-reply@sns.amazonaws.com sender felt cheap. Switched to SES.

SES Sandbox Is Per-Account-Per-Region

Even if you call SES with credentials that have IAM permission to send, you must be sending from an account that's out of sandbox to reach unverified recipients. My Logging account's SES was in sandbox. The mgmt account's was out.

Fix: have the Lambda assume a role into mgmt and call SES with the assumed credentials. A small SesSenderRole in mgmt trusts the Lambda's execution role in the Logging account and has ses:SendEmail against the verified domain identity. The Lambda calls sts.assume_role, gets temporary creds, and constructs a fresh SES client with those creds before sending. No SES identity policy is needed on the verified domain because, once the role is assumed, the call is made from the account that owns the identity (mgmt).
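A minimal sketch of that assume-then-send sequence. The STS client and the client factory are injected so the function can be exercised with stubs; in the Lambda they would be `boto3.client("sts")` and `boto3.client`. The session name and the function name `send_rollup` are placeholders.

```python
def send_rollup(
    sts_client,
    make_client,
    role_arn: str,
    region: str,
    sender: str,
    recipients: list[str],
    subject: str,
    html_body: str,
) -> str:
    """Assume the sender role in mgmt, then send one HTML email through SES."""
    creds = sts_client.assume_role(
        RoleArn=role_arn, RoleSessionName="weekly-posture-email"
    )["Credentials"]
    # Fresh client bound to the temporary credentials, not the Lambda's own role.
    ses = make_client(
        "ses",
        region_name=region,
        aws_access_key_id=creds["AccessKeyId"],
        aws_secret_access_key=creds["SecretAccessKey"],
        aws_session_token=creds["SessionToken"],
    )
    response = ses.send_email(
        Source=sender,
        Destination={"ToAddresses": recipients},
        Message={
            "Subject": {"Data": subject},
            "Body": {"Html": {"Data": html_body}},
        },
    )
    return response["MessageId"]
```

The region passed to the SES client has to be the one where the mgmt account's domain identity is verified and out of sandbox, since both are per-region settings.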

Self-Monitoring Alarms

Four CloudWatch alarms publish to a single ops SNS topic:

| Alarm | Catches |
| --- | --- |
| claude-summary-errors | Lambda threw an exception (period 5 min) |
| claude-summary-missed-weekly-run | No invocation in 7 days (cron broken) |
| prowler-scan-task-errors | ECS log accumulated ERROR/Critical lines |
| security-state-file-stale | Custom metric StateFileUpdated not seen in 7 days |

The Lambda emits a freshness signal at the end of every successful run via cw.put_metric_data on a custom StateFileUpdated metric. The alarm watches for that metric to be missing, not for errors. If the cron silently stops firing, the missing-metric alarm catches it; the error-based alarms would stay silent, because no code runs to raise an error.
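A sketch of both halves of the freshness pattern: emitting the metric, and the keyword arguments one might pass to `put_metric_alarm` so that missing data counts as a breach. The namespace `SecurityPosture` and the 7 x 1-day evaluation window are assumptions for illustration; the post does not give the real alarm settings.

```python
def emit_freshness(cw_client, namespace: str = "SecurityPosture") -> None:
    """Publish one data point at the end of a successful run.

    cw_client would be boto3.client("cloudwatch") in the Lambda.
    """
    cw_client.put_metric_data(
        Namespace=namespace,
        MetricData=[{"MetricName": "StateFileUpdated", "Value": 1, "Unit": "Count"}],
    )


def stale_alarm_kwargs(namespace: str, topic_arn: str) -> dict:
    """Arguments for cloudwatch.put_metric_alarm (window sizes are assumed)."""
    return {
        "AlarmName": "security-state-file-stale",
        "Namespace": namespace,
        "MetricName": "StateFileUpdated",
        "Statistic": "Sum",
        "Period": 86400,          # one-day buckets
        "EvaluationPeriods": 7,   # all seven must breach to alarm
        "Threshold": 1,
        "ComparisonOperator": "LessThanThreshold",
        # The key setting: a bucket with no data is treated as a breach.
        "TreatMissingData": "breaching",
        "AlarmActions": [topic_arn],
    }
```

The design point is that silence is the failure signal: the alarm needs no error to be logged anywhere, only a week without the success data point.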

Alarm Tuning Lesson

Don't use a 24-hour period on a Lambda Errors alarm. I initially set period = 86400 thinking "we only run once a week", but a single error stays in the 24-hour aggregation window for up to 48 hours after recovery. The alarm flapped to ALARM the day after I'd already shipped a fix. Drop to period = 300 so transitions are tight.
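For comparison, a short-period Errors alarm could be described like this. The values other than the 300-second period are assumptions (one evaluation period, alarm on any error, missing data treated as healthy), and the dict is meant to be passed to `cloudwatch.put_metric_alarm` or mirrored in Terraform.

```python
def lambda_errors_alarm_kwargs(
    function_name: str, topic_arn: str, period_seconds: int = 300
) -> dict:
    """Arguments for an alarm on the built-in AWS/Lambda Errors metric."""
    return {
        "AlarmName": f"{function_name}-errors",
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        # Short window: the error ages out minutes after the fix, not a day later.
        "Period": period_seconds,
        "EvaluationPeriods": 1,
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        # A Lambda that runs weekly has no data most of the time; that is fine.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [topic_arn],
    }
```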

Lessons Learned

  • Run a sandbox single-account scan first. Validates Prowler v5 flags, the cross-account assume-role chain, and the S3 upload path without burning hours on a 16-account run.
  • Don't trust the AWS CLI is in your container. The prowlercloud/prowler image has Python and boto3, no AWS CLI. Use --output-bucket-no-assume for upload and a one-line python -c with boto3 for any other AWS work.
  • Lambda memory cliff is real. OCSF JSON for ~15 accounts of a real org parses to ~1.5 GB in Python. Bumped from 512 MB to 3008 MB.
  • Region scope is your knob. With a serial design, scan time grows as regions x accounts x checks. Cut regions aggressively.
  • CloudWatch alarm period matters. Long aggregation windows cause stale ALARM states that fire after recovery. Default to short periods.
  • SES sandbox is a separate dimension from IAM. You can have full ses:SendEmail permission and still get rejected because the account you're calling from is in sandbox. Cross-account assume-role into a non-sandboxed account is the cleanest workaround.
  • Tighten your prompt to the markdown subset your converter actually supports. Otherwise the rendered Confluence page leaks **, |, and 1. characters everywhere.

Cost

| Item | Approx (monthly) |
| --- | --- |
| Fargate (1 run x ~50 min x 4 weeks) | ~$0.40 |
| S3 (findings + lifecycle to Glacier) | ~$1 |
| Lambda invocations + memory-time | ~$0.10 |
| CloudWatch Logs + 4 alarms + dashboard | ~$1.50 |
| Secrets Manager (2 secrets) | ~$1 |
| SES (Sundays only, single email) | <$0.01 |
| Anthropic API (one summary per week) | $5-20 depending on findings volume |
| Total | ~$10-25/month |
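As a sanity check on the Fargate line, the arithmetic can be reproduced in a few lines. The two hourly rates are assumptions (on-demand Linux/x86 list prices for us-east-1 as I remember them); substitute the current rates for your region.

```python
# Assumed on-demand Fargate rates in USD; verify against current pricing.
VCPU_HOUR_USD = 0.04048
GB_HOUR_USD = 0.004445


def fargate_monthly_cost(
    vcpu: float, memory_gb: float, minutes_per_run: float, runs_per_month: int
) -> float:
    """Compute cost = task-hours x (vCPU rate + memory rate)."""
    hours = minutes_per_run * runs_per_month / 60
    return hours * (vcpu * VCPU_HOUR_USD + memory_gb * GB_HOUR_USD)


if __name__ == "__main__":
    cost = fargate_monthly_cost(vcpu=2, memory_gb=8, minutes_per_run=50, runs_per_month=4)
    print(f"${cost:.2f} per month")
```

With a 2 vCPU / 8 GB task running about 50 minutes four times a month, this lands at roughly $0.39, consistent with the ~$0.40 in the table.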

A commercial CSPM at this org size starts at $20-50K/year. At roughly $120-300/year, this pipeline is around two orders of magnitude cheaper for the narrative posture review, with the obvious tradeoff that I don't get a polished UI, runtime threat detection, or vendor support. I pair this with native Security Hub for the continuous-detection layer.

Want to Build Something Similar?

This post covers the architecture and the gotchas that took me the longest to figure out, but it deliberately doesn't include the full Terraform, the complete Lambda handler, the week-over-week diff logic, the Secrets Manager wiring, the StackSet deployment ordering, the Confluence pagination/version-bump details, or a few other moving parts that would make this a copy-paste tutorial.

If you're standing up something like this in your own org and you'd like a hand on the parts I didn't write up, or you want to talk through the design choices for your specific environment, reach out via the contact form. I'm happy to talk through what I've learned, share more of the implementation, or just compare notes on multi-account security tooling.

What's Next

  • Multi-account fan-out to drop scan time from ~50 min serial to ~10 min parallel.
  • Security Hub integration - pull HIGH/CRITICAL Sec Hub findings into the Lambda's prompt for cross-source correlation.
  • Jira ticket creation for HIGH/CRITICAL findings once baseline posture is clean.
  • Athena queries over the S3-partitioned findings for "did this control fail any time in the last 90 days?" lookbacks.

The current pipeline runs every Sunday at 02:00 UTC and produces a fresh Week YYYY-Wnn Confluence child page plus an email to a small leadership group. The whole stack is ~600 lines of Terraform plus ~500 lines of Python.

Common gotchas to budget for if you build something similar: Prowler v5 flag changes, Lambda OCSF memory pressure, SES sandbox per-account, Confluence storage XHTML quirks, and CloudWatch alarm aggregation windows.
