Monitoring ETL Pipelines
Quick Links
- Step Functions (dev): Sign in to the AWS Console Step Functions dashboard in the dev AWS account.
- Use the search bar to find state machines named data-etl-flow-{source-name}-stepFn.
- Click on your pipeline to view recent executions, statuses, and details.
- CloudWatch Logs: From your Step Functions execution detail page, scroll down to the "Log output" or "History" section and click the linked log group or log stream. This takes you directly to the CloudWatch Logs for that specific execution for troubleshooting and deeper inspection.
- Pulumi Stack:
https://app.pulumi.com/cartesianio/data-lake-infra/dev
Monitoring Locations
Step Functions (Pipeline Execution)
State Machine: data-etl-flow-{source-name}-stepFn
Check:
- Execution status (Running/Succeeded/Failed)
- Execution history with timestamps
- Input/output payloads
- Error details for failed executions
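The checks above can also be run from the CLI. A minimal sketch, assuming placeholder values for the region, account ID, and source name (replace them with real ones for the dev account):

```shell
# Compose the state machine ARN for a pipeline and list its recent executions.
# Region, account ID, and source name below are placeholders (assumptions).

sfn_arn() {
  local region="$1" account="$2" source="$3"
  echo "arn:aws:states:${region}:${account}:stateMachine:data-etl-flow-${source}-stepFn"
}

ARN=$(sfn_arn eu-west-1 123456789012 my-source)
echo "$ARN"

# Uncomment with dev-account credentials configured:
# aws stepfunctions list-executions --state-machine-arn "$ARN" \
#   --max-results 10 \
#   --query 'executions[].{name:name,status:status,started:startDate}'
```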
CloudWatch Logs
Log Groups:
- /aws/ecs/data-collector-{source-name} - ECS task logs
- /aws/emr-serverless/... - EMR job logs
- data-etl-flow-{source-name}-logs - Step Function execution logs
Common Log Streams:
- ecs/{task-id} - ECS container logs
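For a quick look without the console, the log groups above can be tailed with the AWS CLI (v2). A sketch, assuming "my-source" is a placeholder source name:

```shell
# Build the ECS collector log group name for a source and show how to tail it.
# "my-source" is a placeholder (assumption) - substitute your real source name.

log_group() {
  echo "/aws/ecs/data-collector-$1"
}

GROUP=$(log_group my-source)
echo "$GROUP"

# Uncomment with dev-account credentials configured:
# aws logs tail "$GROUP" --since 1h --follow
# Or search recent streams for errors:
# aws logs filter-log-events --log-group-name "$GROUP" --filter-pattern "ERROR"
```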
EMR Serverless Spark Job Logs
- After launching a Spark job via Step Functions or directly in EMR Studio, navigate to the AWS EMR Serverless console.
- Select the relevant application and find your job run in the Job Runs list.
- Click into the Job Run. Under "Logs," you'll find direct links to CloudWatch log streams for Driver and Executors.
- Logs are accessible from the "View logs" link within the EMR Serverless job run details.
- The Driver log contains Spark driver output, including job orchestration details and errors.
- Executor logs are also available for deeper debugging.
Spark UI (for EMR Serverless Jobs)
- Each EMR Serverless Spark job exposes a Spark History Server UI for visual inspection of stages, jobs, SQL, and resource usage.
- In your EMR Serverless Job Run details page (as above), look for the "Monitoring" or "Spark UI" link/button. Click this to open the Spark UI in a new tab.
- The Spark UI link remains active for a limited time (typically several hours after job completion).
- If the link is unavailable, you may need to re-run or troubleshoot job permissions/networking.
- Within the Spark UI, inspect Executors, Stages, and SQL tabs to diagnose performance issues, stage failures, or application bottlenecks.
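The time-limited Spark UI link can also be fetched from the CLI via `get-dashboard-for-job-run`. A sketch, where the application and job run IDs are placeholders you copy from the console or the Step Functions execution output:

```shell
# Fetch the Spark UI / dashboard URL for an EMR Serverless job run.
# APP_ID and RUN_ID are placeholders (assumptions).

APP_ID="00example-app-id"
RUN_ID="00example-run-id"
echo "app=${APP_ID} run=${RUN_ID}"

# Uncomment with dev-account credentials configured:
# aws emr-serverless get-dashboard-for-job-run \
#   --application-id "$APP_ID" --job-run-id "$RUN_ID" \
#   --query 'url' --output text
```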
ECS Tasks
Check:
- lastStatus: RUNNING or STOPPED
- stoppedReason: error message for stopped tasks
Manually Triggering a Flow via Lambda (Simulating EventBridge Trigger)
To manually simulate the pipeline trigger (as EventBridge would), you can use the built-in test functionality of the trigger Lambda.
Steps to Trigger Manually
- Locate the Lambda Function
  - In the AWS Console, navigate to Lambda.
  - Search for the function named: data-etl-flow-{source-name}-trigger
- Use the Predefined Test Event
  - Select the Lambda function to open its details page.
  - Go to the "Test" tab.
  - There should already be a test event configured that mirrors the expected EventBridge payload.
  - If not, create a new test event based on the input schema for EventBridge triggers; you can reference a recent EventBridge sample event from CloudWatch Logs if needed.
  - Click the "Test" button to trigger the pipeline, then observe its execution in the Step Functions view and monitor each step in the relevant AWS service (e.g., ECS, EMR Serverless) as it progresses.
Notes
- Prefer re-using the test event that is (or should be) pre-configured for the Lambda, so that the simulation exactly matches the automated trigger.
- There is no need to craft a payload manually unless you are customizing for edge cases or debugging with special inputs.
This approach is ideal for quickly verifying that the end-to-end pipeline reacts correctly to event triggers in a controlled and reproducible way.
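The same simulation can be scripted with `aws lambda invoke`. A sketch, where the function name and the EventBridge-shaped payload are assumptions - match the payload to the pre-configured test event for your pipeline:

```shell
# Invoke the trigger Lambda from the CLI instead of the console "Test" tab.
# FN and the payload shape are placeholders (assumptions).

FN="data-etl-flow-my-source-trigger"

PAYLOAD=$(cat <<'EOF'
{"version":"0","source":"aws.events","detail-type":"Scheduled Event","detail":{}}
EOF
)
echo "$PAYLOAD"

# Uncomment with dev-account credentials configured:
# aws lambda invoke --function-name "$FN" \
#   --cli-binary-format raw-in-base64-out \
#   --payload "$PAYLOAD" /dev/stdout
```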
EMR Serverless Job Status
Check via Step Functions:
- RunEMRBronze / RunEMRSilver steps - job status in execution output
- CloudWatch Logs for detailed errors
Common Statuses:
SUBMITTED → RUNNING → SUCCESS / FAILED
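The status sequence above can drive a simple polling loop. A sketch, treating SUCCESS, FAILED, and CANCELLED as terminal states (the application and run IDs in the commented command are placeholders):

```shell
# Helper: decide whether an EMR Serverless job run state is terminal.
is_terminal() {
  case "$1" in
    SUCCESS|FAILED|CANCELLED) return 0 ;;
    *) return 1 ;;
  esac
}

is_terminal RUNNING || echo "RUNNING: keep polling"
is_terminal FAILED && echo "FAILED: check CloudWatch Logs"

# Uncomment with dev-account credentials to poll a real run:
# while state=$(aws emr-serverless get-job-run \
#     --application-id "$APP_ID" --job-run-id "$RUN_ID" \
#     --query 'jobRun.state' --output text) && ! is_terminal "$state"; do
#   echo "state=$state"; sleep 30
# done
```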
Processing Stages
Bronze → Silver → Gold
- Bronze: Raw data ingestion
  - Check S3 bucket: bronze-dl-{id}
  - Table: bronze.{table_name}
- Silver: Processed data
  - Check S3 bucket: silver-dl-{id}
  - Table: silver.{table_name}
- Gold: Aggregated data
  - Check S3 bucket: gold-dl-{id}
  - Table: gold.{table_name}
Common Issues & Solutions
Failed ECS Tasks
Symptoms: Step Function execution stuck at RunECS
Check:
- CloudWatch Logs for container errors
- Task definition: aws ecs describe-task-definition --task-definition data-collector-{name}
- Network/security group issues
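When an execution sticks at RunECS, the stopped task's lastStatus and stoppedReason usually explain why. A sketch of the lookup, where the cluster and task family names are assumptions - match them to your deployment:

```shell
# Look up lastStatus / stoppedReason for the collector's most recent stopped task.
# CLUSTER and FAMILY are placeholders (assumptions).

CLUSTER="data-etl-cluster"
FAMILY="data-collector-my-source"
echo "cluster=${CLUSTER} family=${FAMILY}"

# Uncomment with dev-account credentials configured:
# TASK_ARN=$(aws ecs list-tasks --cluster "$CLUSTER" --family "$FAMILY" \
#   --desired-status STOPPED --query 'taskArns[0]' --output text)
# aws ecs describe-tasks --cluster "$CLUSTER" --tasks "$TASK_ARN" \
#   --query 'tasks[0].{lastStatus:lastStatus,stoppedReason:stoppedReason}'
```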
EMR Job Failures
Symptoms: Step Function execution fails at RunEMRBronze or RunEMRSilver
Check:
- EMR Serverless application status
- S3 source data availability
Data Flow Issues
Symptoms: Bronze succeeds but Silver/Gold fails
Check:
- S3 bucket contents: aws s3 ls s3://{bucket}/{path}/
- Athena table queries: SELECT COUNT(*) FROM bronze.{table}
- Date path state files: s3://{bucket}/state/last-run-bronze.json
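The checks above can be chained into one quick triage pass. A sketch for the Bronze layer, where the bucket ID, table name, and Athena workgroup are placeholders:

```shell
# Triage Bronze data flow: list objects, print the state file, count rows.
# BUCKET, the table name, and the Athena workgroup are placeholders (assumptions).

BUCKET="bronze-dl-abc123"
STATE_KEY="state/last-run-bronze.json"
echo "s3://${BUCKET}/${STATE_KEY}"

# Uncomment with dev-account credentials configured:
# aws s3 ls "s3://${BUCKET}/" --recursive | tail -20
# aws s3 cp "s3://${BUCKET}/${STATE_KEY}" -    # print the last-run state file
# aws athena start-query-execution \
#   --query-string 'SELECT COUNT(*) FROM bronze.my_table' \
#   --work-group primary
```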
References
- Pipeline code: packages/data-lake/data-lake-infra/src/running-flow/dataEtlFlow.ts
- Bronze infra: packages/data-lake/data-lake-infra/src/bronze/bronzeInfra.ts
- Silver infra: packages/data-lake/data-lake-infra/src/silver/silverInfra.ts
- Pulumi outputs: pulumi stack output --stack dev