When troubleshooting issues in a cloud environment, it’s crucial to have effective tools and scripts to search and retrieve logs efficiently. In this blog post, we’ll explore a Bash script that utilizes the AWS Command Line Interface (CLI) to search for specific error codes in CloudWatch logs and retrieve relevant log content. Let’s dive into the code and understand how it works.
```bash
#!/bin/bash

# Check if the error code is provided
if [[ -z "${ErrorCode}" ]]; then
    echo "ERROR: Error Code is mandatory to run this job"
    exit 1
fi

# Define the error code and log group name
error_code="${ErrorCode}"
log_group_name="<LOG_GROUP_NAME>"

# Define the task identifier (optional); taken from the TaskId environment
# variable and used as the log stream name to narrow the search
task_identifier="${TaskId}"

# Search for the message in CloudWatch
if [[ -z "${TaskId}" ]]; then
    echo "Searching error code [$error_code] without task id; it may take longer to find the matching records. Please provide a task id in the future."
    results=$(aws logs filter-log-events --log-group-name "$log_group_name" --filter-pattern "$error_code")
else
    echo "Searching error code [$error_code] using the task id [$task_identifier]"
    results=$(aws logs filter-log-events --log-group-name "$log_group_name" --log-stream-names "$task_identifier" --filter-pattern "$error_code")
fi

# Count the matching events
length=$(echo "$results" | jq '.events | length')
if [[ $length -le 0 ]]; then
    echo "$results"
    # Print the error message
    echo "ERROR: Search returned no results for the error code [$error_code]"
    exit 1
fi

# Get the timestamp and log stream from the search results
timestamp=$(echo "$results" | jq -r '.events[0].timestamp')
log_stream=$(echo "$results" | jq -r '.events[0].logStreamName')

# Fetch the log content from the stream 30 seconds before and after the timestamp
# (CloudWatch timestamps are in milliseconds)
before_timestamp=$((timestamp - 30000))
after_timestamp=$((timestamp + 30000))
log_content=$(aws logs get-log-events --log-group-name "$log_group_name" --log-stream-name "$log_stream" --start-time "$before_timestamp" --end-time "$after_timestamp" --output text)

# Print the log content
echo "$log_content"
echo "Found the error code [$error_code] in the task [$log_stream]"
```
The Bash script is designed to accomplish the following tasks:
- Error Code Validation: The script checks if the `ErrorCode` environment variable is empty and exits with an error message if it is.
- Configuration: It defines variables for the error code (`error_code`) and the CloudWatch log group to search in (`log_group_name`).
- Searching for the Error Code: The script utilizes the AWS CLI’s `aws logs filter-log-events` command to search for the specified `error_code` in CloudWatch logs. It handles cases where the `TaskId` environment variable is provided or omitted.
- Handling Search Results: If search results are found, the script extracts the timestamp and log stream name from the first event using the `jq` command-line tool.
- Retrieving Log Content: Using the AWS CLI’s `aws logs get-log-events` command, the script fetches the log content from the specified log stream, capturing logs within a specific time range around the identified timestamp.
- Printing Log Content: Finally, the script outputs the retrieved log content and acknowledges the discovery of the error code within the specified task or log stream.
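The script hands `ErrorCode` to `--filter-pattern` unchanged and does arithmetic on whatever `jq` prints. CloudWatch Logs filter patterns treat spaces and some punctuation as syntax, and terms with non-alphanumeric characters must be double-quoted to match literally. The helpers below are an added example, not part of the original script; the function names, the allow-list and the sample values are assumptions. They validate and quote the error code and bound the time window:

```bash
#!/bin/bash
# Added example, not from the original script. Function names and the
# allow-list of characters are assumptions chosen for illustration.

# Accept 1-64 characters from [A-Za-z0-9._:-], starting with a letter or digit.
valid_error_code() {
    [ "${#1}" -ge 1 ] && [ "${#1}" -le 64 ] || return 1
    case "$1" in
        [!A-Za-z0-9]*|*[!A-Za-z0-9._:-]*) return 1 ;;
    esac
}

# Quote the validated code so CloudWatch Logs matches it as one literal term.
build_filter_pattern() {
    valid_error_code "$1" || return 1
    printf '"%s"' "$1"
}

# Print "start end" (epoch milliseconds) around an event timestamp.
# Rejects non-numeric input such as the string "null" that jq -r prints
# when .events[0] does not exist, and leading zeros (read as octal).
window_bounds() {
    local ts="$1" half_ms="${2:-30000}" v
    for v in "$ts" "$half_ms"; do
        case "$v" in ''|*[!0-9]*|0[0-9]*) return 1 ;; esac
        [ "${#v}" -le 15 ] || return 1
    done
    local start=$(( ts - half_ms ))
    [ "$start" -ge 0 ] || start=0   # never ask for a negative start time
    printf '%s %s' "$start" "$(( ts + half_ms ))"
}

# Constructed sample values. The aws call stays a comment because it needs credentials:
#   aws logs filter-log-events --log-group-name "$log_group_name" --filter-pattern "$pattern"
if pattern=$(build_filter_pattern "ERR-1001"); then
    echo "filter pattern: $pattern"
    echo "window: $(window_bounds 1700000000000)"
fi
```

In the original script, `build_filter_pattern "${ErrorCode}"` would replace the direct use of `$error_code`, and `window_bounds "$timestamp"` would replace the two arithmetic lines.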