AWS Cloud – Searching and Retrieving Logs with Bash Scripting

When troubleshooting issues in a cloud environment, it’s crucial to have effective tools and scripts to search and retrieve logs efficiently. In this blog post, we’ll explore a Bash script that utilizes the AWS Command Line Interface (CLI) to search for specific error codes in CloudWatch logs and retrieve relevant log content. Let’s dive into the code and understand how it works.

#!/bin/bash

# Check if the error code is provided
if [[ -z "${ErrorCode}" ]]; then
  echo "ERROR: Error Code is mandatory to run this job"
  exit 1
fi

# Define the error code and log group name
error_code="${ErrorCode}"
log_group_name="<LOG_GROUP_NAME>"

# Define the task identifier (optional)
task_identifier="<TASK_IDENTIFIER>"

# Search for the message in CloudWatch
if [[ -z "${TaskId}" ]]; then
  echo "Searching error code [$error_code] without task id; it may take longer to find the matching records. Please provide a task id in the future."
  results=$(aws logs filter-log-events --log-group-name "$log_group_name" --filter-pattern "$error_code")
else
  echo "Searching error code [$error_code] using the task id [$task_identifier]"
  results=$(aws logs filter-log-events --log-group-name "$log_group_name" --log-stream-names "$task_identifier" --filter-pattern "$error_code")
fi

length=$(echo "$results" | jq '.events | length')

if [[ $length -le 0 ]]; then
  echo "$results"
  # Print the error message
  echo "ERROR: Search returned no results for the error code [$error_code]"
  exit 1
fi

# Get the timestamp and log stream from the search results
timestamp=$(echo "$results" | jq -r '.events[0].timestamp')
log_stream=$(echo "$results" | jq -r '.events[0].logStreamName')

# Fetch the log content from the stream 30 seconds before and after the timestamp
before_timestamp=$((timestamp - 30000))
after_timestamp=$((timestamp + 30000))

log_content=$(aws logs get-log-events --log-group-name "$log_group_name" --log-stream-name "$log_stream" --start-time "$before_timestamp" --end-time "$after_timestamp" --output text)

# Print the log content
echo "$log_content"

echo "Found the error code [$error_code] in the task [$log_stream]"

The Bash script is designed to accomplish the following tasks:

  1. Error Code Validation: The script checks if the ErrorCode environment variable is empty and exits with an error message if it is.
  2. Configuration: It defines variables for the error code (error_code) and the CloudWatch log group to search in (log_group_name).
  3. Searching for the Error Code: The script utilizes the AWS CLI’s aws logs filter-log-events command to search for the specified error_code in CloudWatch logs. It handles cases where the TaskId environment variable is provided or omitted.
  4. Handling Search Results: If search results are found, the script extracts the timestamp and log stream name from the first event using the jq command-line tool.
  5. Retrieving Log Content: Using the AWS CLI’s aws logs get-log-events command, the script fetches the log content from the specified log stream, capturing logs within a specific time range around the identified timestamp.
  6. Printing Log Content: Finally, the script outputs the retrieved log content and acknowledges the discovery of the error code within the specified task or log stream.

Leave a comment

Blog at WordPress.com.