When troubleshooting issues in a cloud environment, it’s crucial to have effective tools and scripts to search and retrieve logs efficiently. In this blog post, we’ll explore a Bash script that utilizes the AWS Command Line Interface (CLI) to search for specific error codes in CloudWatch logs and retrieve relevant log content. Let’s dive into the code and understand how it works.
#!/bin/bash
# Check if the error code is provided
if [[ -z "${ErrorCode}" ]]; then
echo "ERROR: Error Code is mandatory to run this job"
exit 1
fi
# Define the error code and log group name
error_code="${ErrorCode}"
log_group_name="<LOG_GROUP_NAME>"
# Define the task identifier (optional)
task_identifier="<TASK_IDENTIFIER>"
# Search for the message in CloudWatch
if [[ -z "${TaskId}" ]]; then
echo "Searching error code [$error_code] without task id; it may take longer to find the matching records. Please provide a task id in the future."
results=$(aws logs filter-log-events --log-group-name "$log_group_name" --filter-pattern "$error_code")
else
echo "Searching error code [$error_code] using the task id [$task_identifier]"
results=$(aws logs filter-log-events --log-group-name "$log_group_name" --log-stream-names "$task_identifier" --filter-pattern "$error_code")
fi
length=$(echo "$results" | jq '.events | length')
if [[ $length -le 0 ]]; then
echo "$results"
# Print the error message
echo "ERROR: Search returned no results for the error code [$error_code]"
exit 1
fi
# Get the timestamp and log stream from the search results
timestamp=$(echo "$results" | jq -r '.events[0].timestamp')
log_stream=$(echo "$results" | jq -r '.events[0].logStreamName')
# Fetch the log content from the stream 30 seconds before and after the timestamp
before_timestamp=$((timestamp - 30000))
after_timestamp=$((timestamp + 30000))
log_content=$(aws logs get-log-events --log-group-name "$log_group_name" --log-stream-name "$log_stream" --start-time "$before_timestamp" --end-time "$after_timestamp" --output text)
# Print the log content
echo "$log_content"
echo "Found the error code [$error_code] in the task [$log_stream]"
The Bash script is designed to accomplish the following tasks:
- Error Code Validation: The script checks if the
ErrorCodeenvironment variable is empty and exits with an error message if it is. - Configuration: It defines variables for the error code (
error_code) and the CloudWatch log group to search in (log_group_name). - Searching for the Error Code: The script utilizes the AWS CLI’s
aws logs filter-log-eventscommand to search for the specifiederror_codein CloudWatch logs. It handles cases where theTaskIdenvironment variable is provided or omitted. - Handling Search Results: If search results are found, the script extracts the timestamp and log stream name from the first event using the
jqcommand-line tool. - Retrieving Log Content: Using the AWS CLI’s
aws logs get-log-eventscommand, the script fetches the log content from the specified log stream, capturing logs within a specific time range around the identified timestamp. - Printing Log Content: Finally, the script outputs the retrieved log content and acknowledges the discovery of the error code within the specified task or log stream.

Leave a comment