What is Shell Scripting

Shell scripting is basically writing a series of commands in a text file that your computer's terminal can run automatically. Think of it like creating a recipe where, instead of cooking steps, you're telling your computer what tasks to do in order. It's super handy for automating repetitive stuff like backing up files, managing system tasks, or running multiple programs with just one command. Most people use the Bash shell on Linux/macOS, but there's also PowerShell on Windows - either way, it saves tons of time once you get the hang of it.

How to prepare for a Shell Scripting Interview

Get your basics rock solid - Make sure you’re comfortable with fundamental commands like ls, grep, sed, awk, and pipes. Most interview questions for shell scripting start with these building blocks, so practice combining them in different ways.

Practice writing actual scripts - Don’t just memorize syntax. Write scripts for real tasks like file backups, log parsing, or system monitoring. Interviewers love seeing practical problem-solving skills.

Master the common shell scripting interview questions - You’ll definitely get asked about variables, loops, conditionals, and functions. Practice explaining the difference between $@ and $*, when to use double vs single quotes, and how exit codes work.
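Since the $@ versus $* distinction comes up so often, it's worth having a tiny demo ready. A minimal sketch you can paste into any terminal:

```shell
#!/bin/bash
# "$@" expands to each argument as a separate word;
# "$*" joins all arguments into a single word.
count_args() {
    echo "$#"
}

set -- "one two" three   # two positional parameters

count_args "$@"   # prints 2 - arguments stay separate
count_args "$*"   # prints 1 - everything collapses into one string
```

The difference only matters when the expansion is double-quoted, which is exactly why interviewers ask about it.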

Debug like crazy - Learn to use set -x for debugging and understand error handling with trap commands. Being able to troubleshoot broken scripts on the spot is a huge plus.

Know your shell differences - Understand what makes bash different from sh, ksh, or zsh. Some companies are picky about POSIX compliance.
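For interviews that probe POSIX compliance, it helps to show the same task written portably instead of with bashisms. An illustrative sketch (bash-only features like [[ ]], arrays, and ${var^^} are avoided here):

```shell
#!/bin/sh
# Portable POSIX sh: no [[ ]], no arrays, no ${var^^}.
name="world"

# Uppercase with tr instead of bash 4's ${name^^}
printf '%s\n' "$name" | tr '[:lower:]' '[:upper:]'

# Pattern matching with case instead of bash's [[ $name == w* ]]
case "$name" in
    w*) echo "starts with w" ;;
esac
```

Running this under dash or busybox sh behaves the same as under bash, which is the point of POSIX-compatible style.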

Study real-world scenarios - Look up common automation tasks like monitoring disk space, rotating logs, or deploying applications. These make great discussion points during interviews.

Practice explaining your code - The best script in the world won’t help if you can’t walk through your logic clearly. Practice talking through your problem-solving approach out loud.

Shell Scripting Interview Questions and Answers For Beginners

Starting your journey with shell scripting interviews? Don’t worry - most bash scripting interview questions for beginners focus on practical basics rather than complex theory. We’ll walk through the most common bash shell interview questions that entry-level positions typically cover, helping you understand not just what to answer, but why things work the way they do. These questions come up constantly because they test whether you can actually write useful scripts, not just memorize commands.

Q1: What exactly is a shell script?

Think of it as a text file filled with commands you’d normally type one by one in the terminal. Instead of typing them manually every time, you save them in a file and run them all at once. It’s like creating a macro for your terminal - super helpful when you need to do the same tasks repeatedly.

Q2: How do I create my first shell script?

Open any text editor (nano, vim, or even notepad), type your commands, and save it with a .sh extension. Here’s the simplest example:

bash

#!/bin/bash
echo "My first script!"

Save it as myscript.sh, make it executable with chmod +x myscript.sh, then run with ./myscript.sh.

Q3: Why do some variables have $ and others don’t?

You use $ when you want to get the value from a variable, but not when you're setting it. Setting: name="John" (no $). Using: echo "Hello $name" (with $). It's like the difference between writing on a nametag versus reading what's already written there.

Q4: What are those -f, -d, -e things I see in if statements?

These are test operators that check file conditions. -f checks if something is a regular file, -d checks for directories, -e checks if anything exists at that path. There are tons more: -r for readable, -w for writable, -x for executable. They’re your script’s way of looking before it leaps.
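A quick way to see these operators in action is a throwaway script against a temporary directory:

```shell
#!/bin/bash
# Exercise the common file test operators on a temporary directory.
tmp=$(mktemp -d)
touch "$tmp/data.txt"

[ -e "$tmp/data.txt" ] && echo "exists"
[ -f "$tmp/data.txt" ] && echo "regular file"
[ -d "$tmp" ] && echo "directory"
[ -r "$tmp/data.txt" ] && echo "readable"
[ -x "$tmp/data.txt" ] || echo "not executable"

rm -rf "$tmp"
```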

Q5: How do I do basic math in bash?

Bash is quirky with math - you need special syntax. Use $((expression)) for arithmetic: result=$((5 + 3)). For decimals, bash can't handle them natively - you'd need bc: echo "5.5 + 2.3" | bc. Remember, spaces don't matter inside $(( )), which is unusual for bash.
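A few lines you can run to confirm the integer-only behavior:

```shell
#!/bin/bash
# Integer arithmetic with $(( )); decimals need bc.
result=$((5 + 3))
echo "$result"        # 8

echo "$((10 / 3))"    # 3 - integer division truncates

# bc handles decimals; "scale" controls digits after the point
echo "5.5 + 2.3" | bc
echo "scale=2; 10 / 3" | bc
```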

Q6: What’s this 2>&1 thing I keep seeing?

This redirects error messages to wherever regular output is going. The 2 represents stderr (error messages), 1 represents stdout (normal output), and & tells bash “I mean the file descriptor, not a file named 1”. So command > output.txt 2>&1 sends both regular output and errors to output.txt.
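Here's a self-contained way to verify that both streams land in the same file:

```shell
#!/bin/bash
# Send stdout and stderr to one file. Order of redirections matters:
# "> file 2>&1" works; "2>&1 > file" leaves stderr on the terminal.
log=$(mktemp)

{ echo "normal output"; echo "error output" >&2; } > "$log" 2>&1

wc -l < "$log"   # 2 - both lines were captured
rm -f "$log"
```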

Q7: How do I check if a command succeeded?

Every command returns an exit code - 0 for success, anything else for failure. Check it right after the command:

bash

cp file1 file2
if [ $? -eq 0 ]; then
    echo "Copy worked!"
else
    echo "Copy failed!"
fi

Q8: What’s the deal with spaces in bash?

Bash is super picky about spaces, especially in conditions. if [ $a = $b ] needs spaces around the brackets and operators. But a=5 can’t have spaces around the =. It’s annoying at first, but you’ll develop muscle memory for it pretty quickly.

Q9: How do I loop through a list of items?

The for loop is your friend here. For a simple list:

bash

for fruit in apple banana orange; do
    echo "I like $fruit"
done

For files: for file in *.txt; do something; done. The loop automatically splits on spaces unless you mess with IFS.

Q10: Can I use variables from outside my script?

Yes! These are environment variables. Your script can read system variables like $HOME, $USER, $PATH. To pass your own, either export them first (export MYVAR="value") or set them inline: MYVAR="value" ./myscript.sh. Just remember the flow is one-way - changes the script makes to those variables won't affect the parent shell.
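A two-line experiment makes the one-way flow obvious:

```shell
#!/bin/bash
# Exported variables flow down to child processes,
# but a child's changes never flow back up.
export GREETING="hello"

bash -c 'echo "child sees: $GREETING"; GREETING="changed"'

echo "parent still has: $GREETING"   # still "hello"
```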

Q11: What happens if my script crashes midway?

By default, bash keeps going even if commands fail. To make it stop on errors, add set -e near the top. Want to see what’s happening? Add set -x for debug mode. Combine them as set -ex. Just remember these make your script stricter - sometimes you want to handle errors yourself instead.

Q12: How do I get input from users?

The read command is what you need. Basic usage: read username waits for input and stores it. Fancy it up with read -p "What's your name? " username for a prompt. Want a password? Use read -s to hide what they type. Set a timeout with read -t 10 to wait only 10 seconds.

Shell Scripting Coding Interview Questions and Answers For Beginners

Ready to tackle actual shell scripting coding interview questions? These hands-on problems test whether you can write working scripts, not just talk about concepts. We’ll cover the classic coding challenges that beginners face in interviews - the ones where you need to write real bash code to solve practical problems. Each solution includes the full script with explanations, so you can understand the logic and adapt it for similar questions.

Q1: Write a script to check if a number is even or odd

This tests your understanding of arithmetic operations and conditionals:

bash

#!/bin/bash
echo "Enter a number: "
read num

if ! [[ "$num" =~ ^[0-9]+$ ]]; then
    echo "Please enter a valid number"
    exit 1
fi

if [ $((num % 2)) -eq 0 ]; then
    echo "$num is even"
else
    echo "$num is odd"
fi

The key here is using the modulo operator (%) inside $(( )). If a number divided by 2 has no remainder, it’s even. The regex check prevents the script from breaking on non-numeric input.

Q2: Create a script that counts files in a directory by extension

This problem tests file handling and associative arrays:

bash

#!/bin/bash
if [ $# -eq 0 ]; then
    echo "Usage: $0 <directory>"
    exit 1
fi

directory="$1"

if [ ! -d "$directory" ]; then
    echo "Directory $directory doesn't exist"
    exit 1
fi

echo "File count by extension in $directory:"

for file in "$directory"/*; do
    [ -e "$file" ] || continue
    [ -f "$file" ] || continue
    extension="${file##*.}"
    if [ "$extension" = "$file" ]; then
        extension="no_extension"
    fi
    # Counting with an associative array would be ideal, but
    # for beginners a simple pipeline does the job:
    echo "$extension"
done | sort | uniq -c | sort -rn

This solution uses parameter expansion (${file##*.}) to extract extensions and pipes the output through sort and uniq to count occurrences. It’s a common pattern in shell scripting - using Unix tools together.
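The ${file##*.} idiom is one of a family of prefix/suffix strippers worth knowing cold:

```shell
#!/bin/bash
# Prefix/suffix stripping with parameter expansion.
file="report.backup.tar.gz"

echo "${file##*.}"   # gz                - longest *. match stripped from front
echo "${file#*.}"    # backup.tar.gz     - shortest match stripped from front
echo "${file%%.*}"   # report            - longest .* match stripped from back
echo "${file%.*}"    # report.backup.tar - shortest match stripped from back
```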

Q3: Write a script to backup files modified in the last 7 days

This tests your knowledge of the find command and date handling:

bash

#!/bin/bash
source_dir="${1:-.}"  # Use current directory if not specified
backup_dir="backup_$(date +%Y%m%d)"

mkdir -p "$backup_dir"

echo "Starting backup of files modified in last 7 days..."

# Find and copy files modified in the last 7 days.
# Process substitution (instead of a pipe) keeps the loop in the
# current shell, so $count is still set after the loop ends.
count=0
while read -r file; do
    [[ "$file" == *"$backup_dir"* ]] && continue   # skip our own backup dir
    rel_path="${file#$source_dir/}"
    backup_path="$backup_dir/$rel_path"
    backup_subdir="$(dirname "$backup_path")"
    mkdir -p "$backup_subdir"
    if cp -p "$file" "$backup_path" 2>/dev/null; then
        ((count++))
        echo "Backed up: $rel_path"
    fi
done < <(find "$source_dir" -type f -mtime -7)

echo "Backup complete! $count files copied to $backup_dir"

The script uses find with -mtime -7 to locate recent files. The while read loop processes filenames with spaces correctly. We preserve the directory structure in the backup and use cp -p to keep timestamps.

Q4: Create a script that monitors disk usage and alerts when it exceeds 80%

This demonstrates system monitoring and parsing command output:

bash

#!/bin/bash
threshold=80

echo "Checking disk usage..."
echo "========================"

df -h | grep -E '^/dev/' | while read -r line; do
    device=$(echo "$line" | awk '{print $1}')
    usage=$(echo "$line" | awk '{print $5}' | sed 's/%//')
    mount=$(echo "$line" | awk '{print $6}')
    if [ "$usage" -gt "$threshold" ]; then
        echo "WARNING: $mount ($device) is at $usage% capacity!"
    else
        echo "OK: $mount ($device) is at $usage% capacity"
    fi
done

# Also check if any partition is critically full (>90%)
critical=$(df -h | grep -E '^/dev/' | awk '{print $5}' | sed 's/%//' | sort -n | tail -1)
if [ "$critical" -gt 90 ]; then
    echo ""
    echo "CRITICAL: At least one partition is over 90% full!"
    exit 1
fi

This script parses df output using awk to extract specific fields. The percentage sign is stripped with sed so we can do numeric comparison. Real-world scripts might send emails or write to log files.

Q5: Write a script that validates and processes a CSV file

This tests file parsing and error handling:

bash

#!/bin/bash
if [ $# -eq 0 ]; then
    echo "Usage: $0 <csv_file>"
    exit 1
fi

csv_file="$1"

if [ ! -r "$csv_file" ]; then
    echo "Error: Cannot read file $csv_file"
    exit 1
fi

echo "Processing CSV file: $csv_file"
echo "================================"

header=$(head -1 "$csv_file")
num_fields=$(echo "$header" | awk -F',' '{print NF}')
echo "Header fields ($num_fields): $header"
echo ""

line_num=1
error_count=0
valid_count=0

# Process substitution keeps the counters in the current shell
while IFS= read -r line; do
    ((line_num++))

    # Check if the line has the expected number of fields
    current_fields=$(echo "$line" | awk -F',' '{print NF}')
    if [ "$current_fields" -ne "$num_fields" ]; then
        echo "Error on line $line_num: Expected $num_fields fields, got $current_fields"
        ((error_count++))
        continue
    fi

    # Split the line into named fields
    IFS=',' read -r field1 field2 field3 remainder <<< "$line"

    # Basic validation - check that required fields are not empty
    if [ -z "$field1" ] || [ -z "$field2" ] || [ -z "$field3" ]; then
        echo "Error on line $line_num: Empty required fields"
        ((error_count++))
        continue
    fi

    # Process the valid line (example: just echo it)
    echo "Processing: $field1 | $field2 | $field3"
    ((valid_count++))
done < <(tail -n +2 "$csv_file")

echo ""
echo "Summary: Processed $valid_count valid lines, found $error_count errors"

This script shows proper CSV handling using IFS (Internal Field Separator) and read. It validates the number of fields per line and checks for empty values. The tail -n +2 skips the header for processing.

Shell Scripting Interview Questions and Answers For Intermediate (2-4 years Exp)

Moving beyond the basics, intermediate shell scripting coding interview questions focus on real-world problem solving, performance optimization, and handling complex scenarios you’ve likely encountered in production. We’ll explore questions that test your experience with error handling, process management, and writing maintainable scripts that other developers can understand and modify. These questions reflect the challenges you face when scripts move from personal tools to team resources that need to be reliable and efficient.

Q1: How do you handle errors gracefully in production scripts?

At this level, you need multiple strategies working together. I use trap to catch errors and cleanup, set -euo pipefail to stop on failures, and custom error functions:

#!/bin/bash
set -euo pipefail

error_exit() {
    echo "Error on line $1: $2" >&2
    cleanup
    exit 1
}

cleanup() {
    rm -f /tmp/mylockfile
    echo "Cleanup completed"
}

trap 'error_exit $LINENO "Command failed"' ERR
trap cleanup EXIT

The key is combining these techniques - set -e stops on errors, -u catches undefined variables, -o pipefail catches errors in pipes, and trap ensures cleanup happens no matter what.

Q2: Explain process substitution and when you’d use it

Process substitution <(command) creates a temporary file descriptor that acts like a file. It’s brilliant for comparing outputs or feeding multiple processes:

diff <(ls dir1) <(ls dir2)

# Feed sorted data to a command expecting a file
join <(sort file1) <(sort file2)

paste <(cut -d: -f1 /etc/passwd) <(cut -d: -f3 /etc/passwd)

I use it when I need to avoid temporary files or when working with commands that only accept file arguments but I want to give them dynamic data.

Q3: How do you implement proper logging in shell scripts?

Production scripts need structured logging with timestamps, levels, and rotation. Here’s my go-to pattern:

LOG_FILE="/var/log/myscript.log"
LOG_LEVEL=${LOG_LEVEL:-"INFO"}  # Can override via environment

log() {
    local level=$1
    shift
    local message="$*"
    local timestamp=$(date '+%Y-%m-%d %H:%M:%S')

    case $LOG_LEVEL in
        ERROR) [[ $level =~ ^(ERROR)$ ]] || return ;;
        WARN)  [[ $level =~ ^(ERROR|WARN)$ ]] || return ;;
        INFO)  [[ $level =~ ^(ERROR|WARN|INFO)$ ]] || return ;;
        DEBUG) ;; # Log everything
    esac

    echo "[$timestamp] [$level] $message" | tee -a "$LOG_FILE"
}

log INFO "Script started"
log ERROR "Connection failed"
log DEBUG "Variable X = $x"

Don’t forget log rotation - either use logrotate or implement size checking in your script.
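When logrotate isn't available, a size check in the script itself can stand in. A minimal sketch (MAX_BYTES and the log path are placeholders, and only one old generation is kept):

```shell
#!/bin/bash
# Rotate the log once it grows past MAX_BYTES.
LOG_FILE="./app.log"
MAX_BYTES=1024

rotate_if_needed() {
    local size
    # stat -c%s is the GNU spelling; stat -f%z is the BSD/macOS one
    size=$(stat -c%s "$LOG_FILE" 2>/dev/null || stat -f%z "$LOG_FILE" 2>/dev/null || echo 0)
    if [ "$size" -ge "$MAX_BYTES" ]; then
        mv "$LOG_FILE" "$LOG_FILE.1"
        : > "$LOG_FILE"   # start a fresh, empty log
    fi
}

head -c 2048 /dev/zero > "$LOG_FILE"   # simulate an oversized log
rotate_if_needed
[ -f "$LOG_FILE.1" ] && echo "rotated"
rm -f "$LOG_FILE" "$LOG_FILE.1"
```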

Q4: How do you handle concurrent script execution?

Preventing race conditions is crucial. I typically use flock for reliable locking:

LOCK_FILE="/var/run/myscript.lock"
exec 200>"$LOCK_FILE"

# Non-blocking: give up immediately if another instance holds the lock
if ! flock -n 200; then
    echo "Another instance is running. Exiting."
    exit 1
fi

# Alternative: wait up to 10 seconds for the lock instead
# if ! flock -w 10 200; then
#     echo "Could not acquire lock after 10 seconds"
#     exit 1
# fi

For more complex scenarios, I might use mkdir as an atomic operation or implement a PID file system that checks if the process is actually still running.
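The mkdir trick works because directory creation is atomic: exactly one caller can succeed. A sketch (the lock path is a placeholder):

```shell
#!/bin/bash
# mkdir either creates the directory or fails - never half-creates it -
# so it doubles as a portable lock where flock is unavailable.
LOCK_DIR="/tmp/demo_lock_$$"

if mkdir "$LOCK_DIR" 2>/dev/null; then
    echo "lock acquired"
    # A competing instance would hit this branch:
    mkdir "$LOCK_DIR" 2>/dev/null || echo "already locked"
    rmdir "$LOCK_DIR"   # release the lock
else
    echo "already locked"
fi
```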

Q5: What’s your approach to parsing complex command line arguments?

For production scripts, getopts handles short options cleanly, but since it can't parse long options I usually fall back to a manual while/case loop that covers both:

usage() {
    cat << EOF
Usage: $0 [-h] [-v] [-f FILE] [--debug] [--config CONFIG]

Options:
  -h, --help      Show this help
  -v, --verbose   Enable verbose mode
  -f FILE         Input file
  --debug         Enable debug mode
  --config FILE   Configuration file
EOF
}

VERBOSE=0
DEBUG=0

while [[ $# -gt 0 ]]; do
    case $1 in
        -h|--help)
            usage
            exit 0
            ;;
        -v|--verbose)
            VERBOSE=1
            shift
            ;;
        -f)
            INPUT_FILE="$2"
            shift 2
            ;;
        --debug)
            DEBUG=1
            set -x  # Enable bash debugging
            shift
            ;;
        --config)
            CONFIG_FILE="$2"
            shift 2
            ;;
        --)
            shift
            break
            ;;
        -*)
            echo "Unknown option: $1" >&2
            usage
            exit 1
            ;;
        *)
            break
            ;;
    esac
done

Q6: How do you optimize scripts that process large files?

Never load entire files into memory. Instead, stream process them:

# Bad: command substitution reads the whole file and word-splits it
for line in $(cat hugefile.txt); do
    process "$line"
done

# Good: stream the file one line at a time
while IFS= read -r line; do
    process "$line"
done < hugefile.txt

# Better: parallel processing for CPU-bound tasks
cat hugefile.txt | parallel -j 4 process {}

# For simple transformations, use tools designed for it
awk '{sum += $3} END {print sum}' hugefile.txt  # Sum third column
sed -i 's/old/new/g' hugefile.txt  # In-place replacement

Also consider splitting large files and processing chunks in parallel, or using tools like split, sort -S for memory limits, and join instead of loading everything into associative arrays.

Q7: Explain your strategy for making scripts portable across different systems

Portability is tricky. I start with POSIX compliance when possible, but pragmatically handle differences:

#!/usr/bin/env bash  # More portable than hard-coding /bin/bash

if [[ "$OSTYPE" == "linux-gnu"* ]]; then
    OS="linux"
elif [[ "$OSTYPE" == "darwin"* ]]; then
    OS="macos"
elif [[ "$OSTYPE" == "freebsd"* ]]; then
    OS="freebsd"
fi

if command -v gsed &> /dev/null; then
    SED=gsed  # GNU sed installed on macOS
else
    SED=sed
fi

for cmd in awk grep "$SED"; do
    if ! command -v "$cmd" &> /dev/null; then
        echo "Required command '$cmd' not found" >&2
        exit 1
    fi
done

find . -type f -name "*.txt" -print0 | xargs -0 grep "pattern"  # Works everywhere

Q8: How do you implement timeout functionality for long-running commands?

The timeout command is great when available, but here’s a portable solution:

if command -v timeout &> /dev/null; then
    timeout 30s long_running_command
else
    long_running_command &
    pid=$!
    count=0
    while kill -0 $pid 2>/dev/null && [ $count -lt 30 ]; do
        sleep 1
        ((count++))
    done
    if kill -0 $pid 2>/dev/null; then
        echo "Command timed out after 30 seconds"
        kill -TERM $pid
        sleep 2
        kill -0 $pid 2>/dev/null && kill -KILL $pid
    fi
fi

# Alternative: using read with a timeout for user input
if read -t 10 -p "Enter value (10 sec timeout): " value; then
    echo "You entered: $value"
else
    echo "Timeout or cancelled"
fi

Q9: What’s your approach to debugging complex shell scripts?

Beyond set -x, I use structured debugging:

DEBUG_LEVEL=${DEBUG_LEVEL:-0}

debug() {
    local level=$1
    shift
    if [ "$DEBUG_LEVEL" -ge "$level" ]; then
        echo "[DEBUG$level] $*" >&2
    fi
}

trace() {
    local func=$1
    shift
    debug 1 "Entering $func with args: $*"
    local start=$(date +%s.%N)
    "$func" "$@"
    local ret=$?
    local end=$(date +%s.%N)
    local duration=$(echo "$end - $start" | bc)
    debug 1 "Exiting $func (duration: ${duration}s, return: $ret)"
    return $ret
}

export PS4='+(${BASH_SOURCE}:${LINENO}): ${FUNCNAME[0]:+${FUNCNAME[0]}(): }'
[ "$DEBUG" = "1" ] && set -x

I also use ShellCheck religiously, and for really tough bugs, I’ll use bashdb or add extensive logging.

Q10: How do you manage configuration in shell scripts?

I layer configurations from multiple sources with clear precedence:

declare -A CONFIG=(
    [db_host]="localhost"
    [db_port]="5432"
    [log_level]="INFO"
)

# Layer in overrides: system-wide, then per-user, then per-project
[ -f /etc/myapp/config ] && source /etc/myapp/config
[ -f ~/.myapp/config ] && source ~/.myapp/config
[ -f ./config ] && source ./config

# Environment variables take highest precedence
[ -n "$MYAPP_DB_HOST" ] && CONFIG[db_host]=$MYAPP_DB_HOST
[ -n "$MYAPP_DB_PORT" ] && CONFIG[db_port]=$MYAPP_DB_PORT

# Validate required settings
for key in db_host db_port; do
    if [ -z "${CONFIG[$key]}" ]; then
        echo "Error: Missing required config: $key" >&2
        exit 1
    fi
done

export MYAPP_CONFIG_DB_HOST="${CONFIG[db_host]}"

This gives flexibility while maintaining predictable behavior - defaults, system-wide settings, user preferences, project overrides, and finally environment variables.

Shell Scripting Coding Interview Questions and Answers For Intermediate (2-4 years Exp)

At the intermediate level, shell scripting coding interview questions shift from basic syntax to solving real production challenges - the kind where your script needs to handle thousands of files, recover from failures, and play nice with other systems. We’ll tackle the practical problems that someone with a few years under their belt should handle confidently, focusing on efficiency, reliability, and maintainability. These are the scenarios where your experience shows through in how you structure your solution, not just whether it works.

Q1: Write a script that monitors a log file in real-time and sends alerts for specific error patterns

This tests your ability to handle continuous file monitoring and pattern matching:

#!/bin/bash
LOG_FILE="${1:-/var/log/application.log}"
ALERT_EMAIL="admin@company.com"
ERROR_PATTERNS=("ERROR" "CRITICAL" "FATAL" "OutOfMemory")
ALERT_THRESHOLD=5  # Alert after 5 errors in 60 seconds
WINDOW_SIZE=60

if [ ! -f "$LOG_FILE" ]; then
    echo "Error: Log file $LOG_FILE not found"
    exit 1
fi

declare -A error_counts
declare -A error_timestamps

send_alert() {
    local pattern=$1
    local count=$2
    local message="Alert: $count occurrences of '$pattern' in last $WINDOW_SIZE seconds"

    # In a real scenario, you'd send email or post to Slack:
    # echo "$message" | mail -s "Log Alert: $pattern" "$ALERT_EMAIL"
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $message"

    error_counts[$pattern]=0
    error_timestamps[$pattern]=""
}

clean_old_timestamps() {
    local pattern=$1
    local current_time=$(date +%s)
    local new_timestamps=""
    for ts in ${error_timestamps[$pattern]}; do
        if [ $((current_time - ts)) -lt $WINDOW_SIZE ]; then
            new_timestamps="$new_timestamps $ts"
        fi
    done
    error_timestamps[$pattern]="$new_timestamps"
    error_counts[$pattern]=$(echo "$new_timestamps" | wc -w)
}

echo "Monitoring $LOG_FILE for error patterns..."
echo "Patterns: ${ERROR_PATTERNS[*]}"
echo "Alert threshold: $ALERT_THRESHOLD errors in $WINDOW_SIZE seconds"
echo "Press Ctrl+C to stop"

tail -F "$LOG_FILE" | while read -r line; do
    for pattern in "${ERROR_PATTERNS[@]}"; do
        if [[ "$line" =~ $pattern ]]; then
            current_time=$(date +%s)
            error_timestamps[$pattern]="${error_timestamps[$pattern]} $current_time"
            clean_old_timestamps "$pattern"
            if [ "${error_counts[$pattern]}" -ge "$ALERT_THRESHOLD" ]; then
                send_alert "$pattern" "${error_counts[$pattern]}"
            fi
            echo "[$(date '+%H:%M:%S')] Found: $pattern (count: ${error_counts[$pattern]})"
        fi
    done
done

The script uses tail -F (capital F) to follow log rotation, tracks errors within a time window, and only alerts when thresholds are exceeded. This prevents alert spam while catching real issues.

Q2: Create a script that performs parallel backup of multiple directories with progress tracking

This demonstrates process management and parallel execution:

#!/bin/bash
SOURCE_DIRS=(
    "/home/user/documents"
    "/home/user/projects"
    "/var/www/html"
    "/etc"
)
BACKUP_ROOT="/backup/$(date +%Y%m%d_%H%M%S)"
MAX_PARALLEL=3
COMPRESSION="gzip"  # or "bzip2", "xz"

mkdir -p "$BACKUP_ROOT"

PROGRESS_FILE="/tmp/backup_progress_$$"
rm -f "$PROGRESS_FILE"
touch "$PROGRESS_FILE"

backup_directory() {
    local src_dir=$1
    local dir_name=$(basename "$src_dir")
    local backup_file="$BACKUP_ROOT/${dir_name}.tar.gz"
    local pid=$BASHPID  # $$ would give the parent's PID in a background job
    local start_time=$(date +%s)

    echo "$pid:$dir_name:0:STARTING" >> "$PROGRESS_FILE"

    local total_size=$(du -sb "$src_dir" 2>/dev/null | awk '{print $1}')

    # pv reports percentage progress on stderr (requires the pv package)
    tar cf - "$src_dir" 2>/dev/null | \
        pv -s "$total_size" -n 2>"/tmp/pv_progress_$pid" | \
        gzip > "$backup_file" &
    local tar_pid=$!

    while kill -0 $tar_pid 2>/dev/null; do
        if [ -f "/tmp/pv_progress_$pid" ]; then
            progress=$(tail -1 "/tmp/pv_progress_$pid" 2>/dev/null || echo "0")
            echo "$pid:$dir_name:$progress:RUNNING" >> "$PROGRESS_FILE"
        fi
        sleep 1
    done

    wait $tar_pid
    local exit_code=$?
    local end_time=$(date +%s)
    local duration=$((end_time - start_time))

    if [ $exit_code -eq 0 ]; then
        local final_size=$(stat -f%z "$backup_file" 2>/dev/null || stat -c%s "$backup_file")
        echo "$pid:$dir_name:100:COMPLETED:$duration:$final_size" >> "$PROGRESS_FILE"
    else
        echo "$pid:$dir_name:0:FAILED:$duration:0" >> "$PROGRESS_FILE"
    fi

    rm -f "/tmp/pv_progress_$pid"
}

show_progress() {
    while true; do
        clear
        echo "Backup Progress - $(date '+%Y-%m-%d %H:%M:%S')"
        echo "=================================================="
        while IFS=':' read -r pid dir progress status duration size; do
            case $status in
                STARTING)
                    printf "%-30s [%s]\n" "$dir" "Initializing..."
                    ;;
                RUNNING)
                    bar_length=30
                    filled=$((progress * bar_length / 100))
                    bar=$(printf '%*s' "$filled" | tr ' ' '=')
                    empty=$((bar_length - filled))
                    bar="${bar}$(printf '%*s' "$empty" | tr ' ' '-')"
                    printf "%-30s [%s] %3d%%\n" "$dir" "$bar" "$progress"
                    ;;
                COMPLETED)
                    size_mb=$((size / 1024 / 1024))
                    printf "%-30s [DONE] %dMB in %ds\n" "$dir" "$size_mb" "$duration"
                    ;;
                FAILED)
                    printf "%-30s [FAILED]\n" "$dir"
                    ;;
            esac
        done < "$PROGRESS_FILE"

        if ! grep -qE "RUNNING|STARTING" "$PROGRESS_FILE" 2>/dev/null; then
            echo ""
            echo "All backups completed!"
            break
        fi
        sleep 1
    done
}

show_progress &
progress_pid=$!

job_count=0
for dir in "${SOURCE_DIRS[@]}"; do
    while [ $(jobs -r | wc -l) -ge $MAX_PARALLEL ]; do
        sleep 0.5
    done
    backup_directory "$dir" &
    ((job_count++))
done

wait
kill $progress_pid 2>/dev/null
show_progress  # Show final status
rm -f "$PROGRESS_FILE"

echo ""
echo "Backup Summary"
echo "=============="
echo "Location: $BACKUP_ROOT"
echo "Total archives: $(ls -1 "$BACKUP_ROOT"/*.tar.gz 2>/dev/null | wc -l)"
echo "Total size: $(du -sh "$BACKUP_ROOT" | awk '{print $1}')"

ls -lh "$BACKUP_ROOT"/*.tar.gz 2>/dev/null

This script showcases parallel job management, real-time progress tracking with pv, and proper cleanup. It handles multiple directories simultaneously while keeping the user informed.

Q3: Write a script that synchronizes configuration files across multiple servers

This tests your understanding of remote execution and error handling:

#!/bin/bash
CONFIG_DIR="/etc/myapp"
SERVERS=("web01.example.com" "web02.example.com" "web03.example.com")
SSH_USER="deploy"
SSH_KEY="$HOME/.ssh/deploy_key"
BACKUP_BEFORE_SYNC=true
DRY_RUN=false

while getopts "dnh" opt; do
    case $opt in
        d) DRY_RUN=true ;;
        n) BACKUP_BEFORE_SYNC=false ;;
        h) echo "Usage: $0 [-d] [-n] [-h]"
           echo "  -d  Dry run (show what would be done)"
           echo "  -n  No backup before sync"
           exit 0 ;;
    esac
done

RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

log() {
    local level=$1
    shift
    local msg="$*"
    local timestamp=$(date '+%Y-%m-%d %H:%M:%S')
    case $level in
        ERROR) echo -e "${RED}[$timestamp] ERROR: $msg${NC}" >&2 ;;
        SUCCESS) echo -e "${GREEN}[$timestamp] SUCCESS: $msg${NC}" ;;
        INFO) echo -e "[$timestamp] INFO: $msg" ;;
        WARN) echo -e "${YELLOW}[$timestamp] WARN: $msg${NC}" ;;
    esac
    echo "[$timestamp] $level: $msg" >> "/var/log/config_sync.log"
}

check_server() {
    local server=$1
    ssh -q -o ConnectTimeout=5 -o BatchMode=yes \
        -i "$SSH_KEY" "$SSH_USER@$server" exit 2>/dev/null
}

# Backup configs on the remote server before overwriting them
backup_remote_configs() {
    local server=$1
    local backup_name="config_backup_$(date +%Y%m%d_%H%M%S).tar.gz"
    log INFO "Creating backup on $server"
    ssh -i "$SSH_KEY" "$SSH_USER@$server" "
        if [ -d '$CONFIG_DIR' ]; then
            sudo tar czf /tmp/$backup_name $CONFIG_DIR 2>/dev/null
            sudo mv /tmp/$backup_name /var/backups/
            echo 'Backup created: /var/backups/$backup_name'
        else
            echo 'Config directory not found, skipping backup'
        fi
    "
}

sync_file() {
    local server=$1
    local file=$2
    local rel_path=${file#$CONFIG_DIR/}
    local local_sum=$(md5sum "$file" | awk '{print $1}')
    # Escape \$1 so awk runs remotely instead of expanding locally
    local remote_sum=$(ssh -i "$SSH_KEY" "$SSH_USER@$server" \
        "sudo md5sum '$file' 2>/dev/null | awk '{print \$1}'")

    if [ "$local_sum" = "$remote_sum" ]; then
        echo "  ✓ $rel_path (unchanged)"
        return 0
    fi

    if [ "$DRY_RUN" = true ]; then
        echo "  → Would sync: $rel_path"
        return 0
    fi

    if scp -i "$SSH_KEY" "$file" "$SSH_USER@$server:/tmp/$(basename "$file")" >/dev/null 2>&1; then
        if ssh -i "$SSH_KEY" "$SSH_USER@$server" "
            sudo mkdir -p $(dirname "$file")
            sudo mv /tmp/$(basename "$file") $file
            sudo chown root:root $file
            sudo chmod 644 $file
        "; then
            echo "  ✓ $rel_path (updated)"
            return 0
        fi
    fi

    echo "  ✗ $rel_path (failed)"
    return 1
}

log INFO "Starting configuration sync"
[ "$DRY_RUN" = true ] && log WARN "Running in DRY RUN mode"

log INFO "Checking server connectivity..."
available_servers=()
for server in "${SERVERS[@]}"; do
    if check_server "$server"; then
        available_servers+=("$server")
        echo "  ✓ $server"
    else
        log ERROR "Cannot connect to $server"
        echo "  ✗ $server"
    fi
done

if [ ${#available_servers[@]} -eq 0 ]; then
    log ERROR "No servers available for sync"
    exit 1
fi

config_files=$(find "$CONFIG_DIR" -type f \( -name "*.conf" -o -name "*.yml" -o -name "*.json" \))
file_count=$(echo "$config_files" | wc -l)
log INFO "Found $file_count configuration files to sync"

for server in "${available_servers[@]}"; do
    log INFO "Syncing to $server"

    if [ "$BACKUP_BEFORE_SYNC" = true ] && [ "$DRY_RUN" = false ]; then
        backup_remote_configs "$server"
    fi

    success_count=0
    fail_count=0

    while IFS= read -r file; do
        if sync_file "$server" "$file"; then
            ((success_count++))
        else
            ((fail_count++))
        fi
    done <<< "$config_files"

    if [ $fail_count -eq 0 ] && [ "$DRY_RUN" = false ]; then
        log INFO "Reloading services on $server"
        ssh -i "$SSH_KEY" "$SSH_USER@$server" "
            sudo systemctl reload nginx 2>/dev/null || true
            sudo systemctl reload myapp 2>/dev/null || true
        "
    fi

    if [ $fail_count -eq 0 ]; then
        log SUCCESS "$server: $success_count files synced successfully"
    else
        log ERROR "$server: $fail_count files failed (${success_count} succeeded)"
    fi
done

log INFO "Configuration sync completed"

This script handles multiple servers, does checksums to avoid unnecessary transfers, creates backups, and includes proper error handling with dry-run support.

Q4: Create a script that analyzes system performance and generates an HTML report

This shows data collection, processing, and formatted output:

#!/bin/bash

REPORT_FILE="system_report_$(hostname)_$(date +%Y%m%d_%H%M%S).html"
TEMP_DIR="/tmp/sysreport_$$"
mkdir -p "$TEMP_DIR"

CPU_THRESHOLD=80
MEM_THRESHOLD=85
DISK_THRESHOLD=90

get_color() {
    local value=$1
    local threshold=$2
    if [ "$(echo "$value >= $threshold" | bc)" -eq 1 ]; then
        echo "#ff4444"  # Red
    elif [ "$(echo "$value >= $threshold * 0.8" | bc)" -eq 1 ]; then
        echo "#ff9944"  # Orange
    else
        echo "#44ff44"  # Green
    fi
}

echo "Collecting system information..."

HOSTNAME=$(hostname)
UPTIME=$(uptime -p)
KERNEL=$(uname -r)
CPU_MODEL=$(grep "model name" /proc/cpuinfo | head -1 | cut -d: -f2)
CPU_CORES=$(nproc)
TOTAL_MEM=$(free -h | awk '/^Mem:/ {print $2}')

echo "Analyzing CPU usage..."

sar -u 1 5 > "$TEMP_DIR/cpu_stats.txt" 2>/dev/null || {
    top -b -n 2 -d 5 | grep "Cpu(s)" | tail -1 > "$TEMP_DIR/cpu_stats.txt"
}

CPU_USAGE=$(awk '/Average:/ {print 100 - $NF}' "$TEMP_DIR/cpu_stats.txt" || \
    awk '{print $2}' "$TEMP_DIR/cpu_stats.txt" | tr -d '%id,' || echo "0")

MEM_STATS=$(free -m | awk '/^Mem:/ {printf "%.1f", ($3/$2) * 100}')

echo "Analyzing disk usage..."

DISK_DATA=""

while read -r line; do
    device=$(echo "$line" | awk '{print $1}')
    mount=$(echo "$line" | awk '{print $6}')
    usage=$(echo "$line" | awk '{print $5}' | tr -d '%')
    size=$(echo "$line" | awk '{print $2}')
    used=$(echo "$line" | awk '{print $3}')
    avail=$(echo "$line" | awk '{print $4}')
    color=$(get_color "$usage" "$DISK_THRESHOLD")
    DISK_DATA="${DISK_DATA}
    <tr>
        <td>$device</td>
        <td>$mount</td>
        <td>$size</td>
        <td>$used</td>
        <td>$avail</td>
        <td style='color: $color; font-weight: bold;'>$usage%</td>
    </tr>"
done < <(df -h | grep '^/dev/' | grep -v '/dev/loop')

echo "Analyzing processes..."

TOP_CPU=$(ps aux --sort=-%cpu | head -6 | tail -5)
TOP_MEM=$(ps aux --sort=-%mem | head -6 | tail -5)
NETSTAT=$(ss -s 2>/dev/null | grep -E "TCP:|UDP:" | head -4)
RECENT_ERRORS=$(journalctl -p err -n 10 --no-pager 2>/dev/null || \
    dmesg | grep -iE "error|fail|warn" | tail -10)

cat > "$REPORT_FILE" << 'EOF'
<!DOCTYPE html>
<html>
<head>
<title>System Performance Report</title>
<meta charset="utf-8">
<style>
body {
    font-family: Arial, sans-serif;
    margin: 20px;
    background-color: #f5f5f5;
}
.container {
    max-width: 1200px;
    margin: 0 auto;
    background-color: white;
    padding: 20px;
    box-shadow: 0 0 10px rgba(0,0,0,0.1);
}
h1, h2 {
    color: #333;
    border-bottom: 2px solid #4CAF50;
    padding-bottom: 10px;
}
.info-grid {
    display: grid;
    grid-template-columns: repeat(auto-fit, minmax(250px, 1fr));
    gap: 20px;
    margin: 20px 0;
}
.info-box {
    background-color: #f9f9f9;
    padding: 15px;
    border-radius: 5px;
    border-left: 4px solid #4CAF50;
}
table {
    width: 100%;
    border-collapse: collapse;
    margin: 20px 0;
}
th, td {
    text-align: left;
    padding: 12px;
    border-bottom: 1px solid #ddd;
}
th {
    background-color: #4CAF50;
    color: white;
}
tr:hover {
    background-color: #f5f5f5;
}
.metric {
    font-size: 24px;
    font-weight: bold;
    margin: 10px 0;
}
.warning {
    background-color: #fff3cd;
    border-color: #ffeaa7;
    color: #856404;
    padding: 10px;
    border-radius: 5px;
    margin: 10px 0;
}
.timestamp {
    text-align: right;
    color: #666;
    font-style: italic;
}
pre {
    background-color: #f4f4f4;
    padding: 10px;
    border-radius: 5px;
    overflow-x: auto;
}
</style>
</head>
<body>
<div class="container">
EOF

cat >> "$REPORT_FILE" << EOF
<h1>System Performance Report - $HOSTNAME</h1>
<p class="timestamp">Generated on $(date '+%Y-%m-%d %H:%M:%S')</p>

<div class="info-grid">
    <div class="info-box">
        <strong>Hostname:</strong> $HOSTNAME<br>
        <strong>Uptime:</strong> $UPTIME<br>
        <strong>Kernel:</strong> $KERNEL
    </div>
    <div class="info-box">
        <strong>CPU Model:</strong> $CPU_MODEL<br>
        <strong>CPU Cores:</strong> $CPU_CORES<br>
        <strong>Total Memory:</strong> $TOTAL_MEM
    </div>
    <div class="info-box">
        <strong>CPU Usage:</strong>
        <div class="metric" style="color: $(get_color $CPU_USAGE $CPU_THRESHOLD)">
            ${CPU_USAGE}%
        </div>
    </div>
    <div class="info-box">
        <strong>Memory Usage:</strong>
        <div class="metric" style="color: $(get_color $MEM_STATS $MEM_THRESHOLD)">
            ${MEM_STATS}%
        </div>
    </div>
</div>
EOF

if [ $(echo "$CPU_USAGE >= $CPU_THRESHOLD" | bc) -eq 1 ]; then
    echo '<div class="warning">⚠️ High CPU usage detected!</div>' >> "$REPORT_FILE"
fi

if [ $(echo "$MEM_STATS >= $MEM_THRESHOLD" | bc) -eq 1 ]; then
    echo '<div class="warning">⚠️ High memory usage detected!</div>' >> "$REPORT_FILE"
fi

cat >> "$REPORT_FILE" << EOF
<h2>Disk Usage</h2>
<table>
<tr>
    <th>Device</th>
    <th>Mount Point</th>
    <th>Size</th>
    <th>Used</th>
    <th>Available</th>
    <th>Usage %</th>
</tr>
$DISK_DATA
</table>

<h2>Top CPU Consuming Processes</h2>
<pre>$(echo "$TOP_CPU" | awk '{printf "%-8s %-8s %6s  %s\n", $1, $2, $9, $11}')</pre>

<h2>Top Memory Consuming Processes</h2>
<pre>$(echo "$TOP_MEM" | awk '{printf "%-8s %-8s %6s  %s\n", $1, $2, $10, $11}')</pre>

<h2>Network Statistics</h2>
<pre>$NETSTAT</pre>

<h2>Recent System Errors/Warnings</h2>
<pre>$(echo "$RECENT_ERRORS" | head -20)</pre>

</div>
</body>
</html>
EOF

rm -rf "$TEMP_DIR"

echo "Report generated: $REPORT_FILE"

# Open the report in a browser if one is available
if command -v xdg-open >/dev/null 2>&1; then
    xdg-open "$REPORT_FILE"
elif command -v open >/dev/null 2>&1; then
    open "$REPORT_FILE"
fi

This creates a professional-looking HTML report with color-coded metrics, responsive design, and comprehensive system analysis.

Q5: Write a script that implements a job queue system with worker processes

This demonstrates advanced process control and inter-process communication:

#!/bin/bash

QUEUE_DIR="/var/spool/job_queue"
WORKERS=4
PID_FILE="/var/run/job_queue.pid"
LOG_FILE="/var/log/job_queue.log"
WORKER_TIMEOUT=300  # 5 minutes max per job

mkdir -p "$QUEUE_DIR"/{pending,processing,completed,failed}

log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "$LOG_FILE"
}

cleanup() {
    log "Shutting down job queue system..."
    touch "$QUEUE_DIR/.shutdown"
    for pid in $(jobs -p); do
        log "Waiting for worker $pid to finish..."
        wait $pid 2>/dev/null
    done
    rm -f "$PID_FILE" "$QUEUE_DIR/.shutdown"
    log "Shutdown complete"
    exit 0
}

trap cleanup SIGINT SIGTERM

add_job() {
    local job_type=$1
    local job_data=$2
    local priority=${3:-5}  # Default priority 5 (1=highest, 9=lowest)
    local job_id=$(date +%s%N)_$$
    local job_file="$QUEUE_DIR/pending/${priority}_${job_id}.job"
    cat > "$job_file" << EOF
JOB_ID=$job_id
JOB_TYPE=$job_type
JOB_DATA=$job_data
SUBMIT_TIME=$(date +%s)
SUBMIT_USER=$USER
STATUS=pending
EOF
    log "Job added: $job_id (type: $job_type, priority: $priority)"
    echo "$job_id"
}

worker_process() {
    local worker_id=$1
    log "Worker $worker_id started (PID: $$)"
    while [ ! -f "$QUEUE_DIR/.shutdown" ]; do
        # Find next job (sorted by priority and age)
        local job_file=$(ls -1 "$QUEUE_DIR/pending/"*.job 2>/dev/null | sort | head -1)
        if [ -z "$job_file" ]; then
            sleep 2
            continue
        fi
        # Claim the job atomically - mv fails if another worker grabbed it first
        local processing_file="$QUEUE_DIR/processing/$(basename "$job_file")"
        if ! mv "$job_file" "$processing_file" 2>/dev/null; then
            continue
        fi
        source "$processing_file"
        log "Worker $worker_id processing job $JOB_ID (type: $JOB_TYPE)"
        echo "WORKER=$worker_id" >> "$processing_file"
        echo "START_TIME=$(date +%s)" >> "$processing_file"
        echo "STATUS=processing" >> "$processing_file"
        local job_output="$QUEUE_DIR/processing/${JOB_ID}.out"
        local job_result=0
        case "$JOB_TYPE" in
            "compress")
                timeout $WORKER_TIMEOUT tar czf "${JOB_DATA}.tar.gz" "$JOB_DATA" \
                    > "$job_output" 2>&1
                job_result=$?
                ;;
            "backup")
                timeout $WORKER_TIMEOUT rsync -av "$JOB_DATA" "/backup/$JOB_DATA" \
                    > "$job_output" 2>&1
                job_result=$?
                ;;
            "report")
                timeout $WORKER_TIMEOUT /usr/local/bin/generate_report.sh "$JOB_DATA" \
                    > "$job_output" 2>&1
                job_result=$?
                ;;
            "custom")
                # Execute custom command (careful with security!)
                timeout $WORKER_TIMEOUT bash -c "$JOB_DATA" \
                    > "$job_output" 2>&1
                job_result=$?
                ;;
            *)
                echo "Unknown job type: $JOB_TYPE" > "$job_output"
                job_result=1
                ;;
        esac
        local end_time=$(date +%s)
        local duration=$((end_time - START_TIME))
        echo "END_TIME=$end_time" >> "$processing_file"
        echo "DURATION=$duration" >> "$processing_file"
        echo "EXIT_CODE=$job_result" >> "$processing_file"
        if [ $job_result -eq 0 ]; then
            echo "STATUS=completed" >> "$processing_file"
            mv "$processing_file" "$QUEUE_DIR/completed/"
            mv "$job_output" "$QUEUE_DIR/completed/"
            log "Worker $worker_id completed job $JOB_ID in ${duration}s"
        else
            echo "STATUS=failed" >> "$processing_file"
            mv "$processing_file" "$QUEUE_DIR/failed/"
            mv "$job_output" "$QUEUE_DIR/failed/"
            log "Worker $worker_id: job $JOB_ID failed (exit code: $job_result)"
        fi
    done
    log "Worker $worker_id stopped"
}

monitor_status() {
    while [ ! -f "$QUEUE_DIR/.shutdown" ]; do
        clear
        echo "Job Queue Status - $(date)"
        echo "================================"
        local pending=$(ls -1 "$QUEUE_DIR/pending/"*.job 2>/dev/null | wc -l)
        local processing=$(ls -1 "$QUEUE_DIR/processing/"*.job 2>/dev/null | wc -l)
        local completed=$(ls -1 "$QUEUE_DIR/completed/"*.job 2>/dev/null | wc -l)
        local failed=$(ls -1 "$QUEUE_DIR/failed/"*.job 2>/dev/null | wc -l)
        echo "Pending:    $pending"
        echo "Processing: $processing"
        echo "Completed:  $completed"
        echo "Failed:     $failed"
        echo ""
        if [ $processing -gt 0 ]; then
            echo "Currently Processing:"
            echo "--------------------"
            for job in "$QUEUE_DIR/processing/"*.job; do
                [ -f "$job" ] || continue
                source "$job"
                local runtime=$(($(date +%s) - START_TIME))
                printf "  Job %s (Worker %d): %s - %ds\n" \
                    "$JOB_ID" "$WORKER" "$JOB_TYPE" "$runtime"
            done
            echo ""
        fi
        echo "Workers:"
        echo "--------"
        for i in $(seq 1 $WORKERS); do
            if kill -0 ${WORKER_PIDS[$i]} 2>/dev/null; then
                echo "  Worker $i: Running (PID: ${WORKER_PIDS[$i]})"
            else
                echo "  Worker $i: Stopped"
            fi
        done
        sleep 5
    done
}

if [ "$1" = "add" ]; then
    shift
    add_job "$@"
    exit 0
fi

if [ "$1" = "status" ]; then
    monitor_status
    exit 0
fi

log "Starting job queue system with $WORKERS workers"
echo $$ > "$PID_FILE"

declare -a WORKER_PIDS

for i in $(seq 1 $WORKERS); do
    worker_process $i &
    WORKER_PIDS[$i]=$!
done

log "Job queue system running. Press Ctrl+C to stop."
wait

This implements a complete job queue with priority handling, multiple workers, atomic job claiming, timeout protection, and comprehensive logging. It’s the kind of system you’d build to handle background tasks in production.

Shell Scripting Interview Questions and Answers For Experienced (5+ years Exp)

At the senior level, advanced shell scripting interview questions dig deep into architectural decisions, performance at scale, and the wisdom that comes from maintaining production systems through their entire lifecycle. These questions explore how you handle the messy realities of enterprise environments - legacy system integration, cross-platform compatibility nightmares, and scripts that need to run reliably for years. The focus shifts from “can you write it” to “have you lived through the consequences of different approaches” and whether you can design solutions that other engineers can maintain long after you’ve moved on.

Q1: How do you design shell scripts for high-availability environments where failure isn’t an option?

After years of 3 AM calls, I’ve learned that HA scripts need multiple layers of protection. First, I implement circuit breakers - if a script fails repeatedly, it stops trying and alerts instead of hammering a broken system. I use distributed locking across nodes (often with Redis or etcd) to prevent split-brain scenarios. Here’s my approach:

Health checks before actions: Never assume a service is up. Always verify endpoints are responding correctly before sending traffic.

Idempotency everywhere: Every operation should be safe to run multiple times. I use checksums, version checks, and state files to ensure this.

Graceful degradation: If a non-critical component fails, the script continues with reduced functionality rather than failing completely.

Audit trails: Every action gets logged with who/what/when/why, often shipped to a central logging system for correlation.

The key insight I’ve gained is that HA isn’t about preventing failures - it’s about failing gracefully and recovering automatically. I’ve seen too many “clever” scripts that made things worse during outages.
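The circuit-breaker idea can be sketched in a few lines of bash. This is a hypothetical, minimal version: the file path, threshold, and function names are illustrative, and a production version would add a cool-down timer and alerting.

```shell
#!/bin/bash
# Hypothetical file-based circuit breaker: count consecutive failures in a
# state file and refuse to run once the limit is hit.
FAIL_FILE="${TMPDIR:-/tmp}/circuit.failures"
MAX_FAILS=3

record_failure() {
    local count
    count=$(cat "$FAIL_FILE" 2>/dev/null || echo 0)
    echo $((count + 1)) > "$FAIL_FILE"
}

record_success() { rm -f "$FAIL_FILE"; }

circuit_open() {
    local count
    count=$(cat "$FAIL_FILE" 2>/dev/null || echo 0)
    [ "$count" -ge "$MAX_FAILS" ]
}

run_guarded() {
    if circuit_open; then
        echo "circuit open: skipping run, alerting instead" >&2
        return 2
    fi
    if "$@"; then
        record_success       # any success resets the failure count
    else
        record_failure
        return 1
    fi
}
```

After `MAX_FAILS` consecutive failures, `run_guarded` stops invoking the command entirely, which is exactly the "stop hammering a broken system" behavior described above.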

Q2: Explain your approach to handling sensitive data in shell scripts

This is where experience really shows. Never, ever put secrets in scripts. I’ve cleaned up too many messes where passwords were in version control. My approach:

Environment variables for local development only, never in production

Secrets management tools like HashiCorp Vault, AWS Secrets Manager, or Kubernetes secrets

Temporary credentials with short TTLs whenever possible

Audit logging for every secret access

I also use set +x before any line that might expose secrets in debug output, and I’m paranoid about temporary files - always created with mktemp and proper permissions, cleaned up in trap handlers. I’ve seen scripts that wrote database passwords to /tmp with world-readable permissions. That’s a career-limiting move.
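The temp-file hygiene described above boils down to a few habits that fit in one sketch (the token value here is a placeholder, not a real secret source):

```shell
#!/bin/bash
# Secret-safe temp file handling: restrictive permissions before any data
# is written, tracing off around secret handling, cleanup guaranteed by trap.
umask 077                       # new files default to owner-only (600)
secret_file=$(mktemp) || exit 1
trap 'rm -f "$secret_file"' EXIT INT TERM

set +x                          # never trace lines that touch secrets
printf '%s\n' "placeholder-token" > "$secret_file"

# Pass the *file* (not the value) to whatever needs the credential,
# so the secret never appears in `ps` output or shell history.
wc -c < "$secret_file" > /dev/null
```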

Q3: How do you optimize shell scripts that process millions of records?

The biggest lesson I’ve learned: the shell isn’t always the answer. But when it is, here’s what works:

Streaming over loading: Never load large datasets into memory. Use pipes and process line by line.

GNU Parallel for CPU-bound tasks. It’s incredible how much faster things run with proper parallelization.

Sort/join over nested loops: Unix tools are optimized for this. A sort | join pipeline beats nested loops every time.

Minimize process spawning: That for loop calling sed 1000 times? Rewrite it as a single awk script.

Real example: I once replaced a 6-hour customer data processing script with a sort | join | awk pipeline that ran in 12 minutes. The original developer was reading a 2GB file into an associative array. Sometimes the old Unix philosophy of small, focused tools really shines.
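The sort-then-join pattern can be sketched like this (file names and data are made up for the example). Both inputs are sorted on the key and joined as streams, so memory use stays flat regardless of input size:

```shell
#!/bin/bash
# Stream two delimited datasets through sort/join instead of building an
# in-memory lookup table. Files and contents are illustrative only.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'u1,alice\nu2,bob\n'              > users.csv
printf 'u2,2024-01-05\nu1,2024-01-02\n'  > logins.csv

# join requires both inputs sorted on the join field
sort -t, -k1,1 users.csv  > users.sorted
sort -t, -k1,1 logins.csv > logins.sorted

join -t, users.sorted logins.sorted > joined.csv
cat joined.csv
```

`sort` spills to temp files on disk for large inputs and `join` reads line by line, which is why this scales where an associative-array lookup does not.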

Q4: Describe your most complex debugging experience with shell scripts

The worst one that comes to mind involved a script that worked perfectly for 3 years, then started randomly failing on Tuesdays. Turned out a log rotation job was creating a race condition, but only when the log file exceeded 2GB, which only happened on our busiest day.

My debugging toolkit has evolved from painful experiences:

strace/dtrace to see actual system calls

bash -x is just the starting point; I often add custom debug functions that can be toggled

Reproducing environments exactly - same shell version, same locale settings, same everything

Binary search debugging - comment out half the script, see if it still fails

The real skill is knowing when to stop debugging and rewrite. I’ve learned that if I’m spending more than a day debugging a complex script, it’s probably too complex and needs to be simplified or rewritten in a proper programming language.
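One of those toggleable debug helpers might look like this (the names are illustrative); the point is that the instrumentation can stay in the script permanently and costs nothing when disabled:

```shell
#!/bin/bash
# Debug logging switched on per-run with DEBUG=1, written to stderr so it
# never pollutes the script's real output.
DEBUG="${DEBUG:-0}"

debug() {
    [ "$DEBUG" = "1" ] || return 0
    echo "DEBUG $(date '+%H:%M:%S') $*" >&2
}

debug "only visible when DEBUG=1"
```

Run normally and nothing prints; run with `DEBUG=1 ./script.sh` and every `debug` call becomes a timestamped trace line.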

Q5: How do you handle shell script deployment and versioning across hundreds of servers?

Configuration management is crucial here. I’ve used Puppet, Ansible, and Chef, but the principles remain the same:

Immutable deployments: Scripts are versioned packages, not edited in place

Canary deployments: Roll out to 1%, then 10%, then 50%, monitoring metrics at each stage

Feature flags: Even in shell scripts, I implement toggles for risky changes

Rollback procedures: Always have a quick way back. I version with symlinks for instant rollback

One hard lesson: never trust system package managers alone. I’ve been burned by different versions of bash, different coreutils implementations, even different versions of basic commands like ‘date’. Now I always include compatibility checks and sometimes ship specific binary versions with my scripts.
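The symlink rollback pattern is simple enough to sketch (the directory layout is illustrative): each version lives in its own directory and a `current` symlink points at the active one, so rollback is a single link change rather than a redeploy.

```shell
#!/bin/bash
# Release switching via a symlink; -n prevents ln from descending into the
# existing symlinked directory when replacing it.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir -p releases/v1 releases/v2
echo "one" > releases/v1/app.conf
echo "two" > releases/v2/app.conf

activate() { ln -sfn "releases/$1" current; }

activate v2     # deploy
activate v1     # instant rollback
```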

Q6: What’s your philosophy on when to use shell scripts versus a “real” programming language?

This is the question that separates experienced engineers from script kiddies. My rule of thumb:

Shell scripts are perfect for:

Gluing systems together

System administration tasks

Quick prototypes

Build and deployment automation

But I switch to Python/Go/Ruby when:

The script exceeds 200-300 lines

I need complex data structures

Error handling becomes more complex than the actual logic

I’m parsing anything more complex than simple delimited data

Performance is critical

I’ve maintained 2000-line bash scripts. It’s not fun. The maintenance cost grows exponentially with complexity. Now I’m quick to recognize when bash has served its purpose and it’s time to rewrite.

Q7: How do you ensure shell script security in a zero-trust environment?

Security has evolved way beyond just checking inputs. In zero-trust environments:

No permanent credentials: Everything uses temporary tokens with the minimum required scope

Mutual TLS for script-to-service communication

Signed scripts: We GPG sign critical scripts and verify signatures before execution

Minimal attack surface: Scripts run in containers or VMs with only required tools installed

Security scanning: All scripts go through static analysis tools looking for common vulnerabilities

I’ve also learned to be paranoid about seemingly innocent things. That curl command downloading a script from GitHub? What if someone compromises the repo? Now I verify checksums and use pinned versions for everything.
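The verify-before-execute habit can be sketched as follows. Here the "download" and its published checksum are faked locally so the pattern is self-contained; in practice the expected hash would come from a trusted, separately-fetched source.

```shell
#!/bin/bash
# Verify a fetched script against a known SHA-256 before running it.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo 'echo hello-from-verified-script' > fetched.sh
expected_sha=$(sha256sum fetched.sh | awk '{print $1}')

verify_and_run() {
    local file=$1 expected=$2 actual
    actual=$(sha256sum "$file" | awk '{print $1}')
    if [ "$actual" != "$expected" ]; then
        echo "checksum mismatch for $file - refusing to execute" >&2
        return 1
    fi
    bash "$file"
}

verify_and_run fetched.sh "$expected_sha"
```

If anyone tampers with the file after the checksum was published, the hash no longer matches and the script refuses to run it.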

Q8: Describe your approach to making shell scripts cloud-agnostic

After migrating between AWS, GCP, and Azure multiple times, I’ve learned:

Abstract provider-specific calls: Create wrapper functions for cloud operations

Use cloud SDK tools carefully: They change. Always pin versions and have fallbacks

Metadata services are different: Each cloud has its own way. Abstract this early

Storage is never just storage: S3, GCS, and Azure Blob have subtle differences

Example approach:

cloud_provider_detect() {
    if curl -s -f -m 1 http://169.254.169.254/latest/meta-data/instance-id >/dev/null 2>&1; then
        echo "aws"
    elif curl -s -f -m 1 -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/id >/dev/null 2>&1; then
        echo "gcp"
    else
        echo "azure"  # Or on-prem; needs more logic
    fi
}

The key is planning for portability from day one. Retrofitting cloud-agnostic behavior is painful.

Q9: How do you handle backwards compatibility when system tools get updated?

This is a constant battle. GNU vs BSD tools, different bash versions, changing command options. My strategies:

Feature detection over version detection: Don’t check if it’s bash 4.3, check if it supports associative arrays

Wrapper functions: Abstract system commands behind functions that handle differences

Comprehensive testing matrix: Test on minimum supported versions of everything

Document requirements explicitly: Every script starts with a requirements block
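Feature detection can be as simple as probing in a subshell; this is a sketch of the "don't check the version, check the capability" rule, with illustrative names:

```shell
#!/bin/bash
# Probe for a capability instead of parsing version strings: try declaring
# an associative array in a subshell and branch on whether it worked.
has_assoc_arrays() {
    ( declare -A _probe ) >/dev/null 2>&1
}

if has_assoc_arrays; then
    STRATEGY="assoc-array"
else
    STRATEGY="flat-file"      # fallback for shells without the feature
fi
echo "lookup strategy: $STRATEGY"
```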

I maintain compatibility libraries for common issues:

portable_readlink() {
    if readlink -f "$1" 2>/dev/null; then
        return
    elif command -v greadlink >/dev/null; then
        greadlink -f "$1"
    else
        python -c "import os; print(os.path.realpath('$1'))"
    fi
}

Q10: What are the most important lessons you’ve learned about shell scripting in production?

The scars teach the best lessons:

Observability beats cleverness: A simple script with great logging beats a clever script you can’t debug

Plan for failure from line 1: Not just error handling, but operational failure - what happens when your script runs during a datacenter failover?

Other people will maintain your code: Write for them, not to show off. Comment the “why”, not the “what”

Test the unhappy paths: Everyone tests success. Test what happens when that API returns 500, when the disk is full, when DNS is flaking

Know when to stop: Some problems shouldn’t be solved in bash. Recognizing this saves weeks of pain

The meta-lesson: shell scripting in production is 20% writing code and 80% thinking about what could go wrong. Every senior engineer has war stories about simple scripts causing major outages. The difference is we’ve learned to be paranoid in productive ways.

Shell Scripting Coding Interview Questions and Answers For Experienced (5+ years Exp)

Senior-level advanced shell scripting interview questions go beyond algorithms to test battle-hardened production experience - can you write code that survives server crashes, handles race conditions, and scales across distributed systems? These coding challenges reflect real scenarios you’ve probably debugged at 3 AM: service orchestration failures, data corruption recovery, and the kind of edge cases that only show up after months in production. The solutions here demonstrate not just working code, but the defensive programming and architectural thinking that comes from years of learning things the hard way.

Q1: Write a distributed lock manager for coordinating jobs across multiple servers

This solves the classic problem of preventing duplicate cron jobs in clustered environments:

#!/bin/bash

# Distributed lock implementation using a shared filesystem or Redis.
# Handles stale locks, network partitions, and crash recovery.

LOCK_DIR="/shared/locks"
REDIS_HOST="${REDIS_HOST:-localhost}"
REDIS_PORT="${REDIS_PORT:-6379}"
BACKEND="${LOCK_BACKEND:-filesystem}"  # or "redis"

readonly SCRIPT_NAME=$(basename "$0")
readonly HOSTNAME=$(hostname -f)
readonly PID=$$
readonly LOCK_TIMEOUT=${LOCK_TIMEOUT:-300}  # 5 minutes default
readonly LOCK_RETRY_INTERVAL=2
readonly STALE_LOCK_THRESHOLD=600  # 10 minutes

generate_lock_id() {
    echo "${HOSTNAME}-${PID}-$(date +%s%N)"
}

fs_acquire_lock() {
    local lock_name=$1
    local lock_file="$LOCK_DIR/$lock_name.lock"
    local lock_id=$(generate_lock_id)
    local temp_file=$(mktemp)
    cat > "$temp_file" << EOF
{
    "holder": "$lock_id",
    "hostname": "$HOSTNAME",
    "pid": $PID,
    "acquired": $(date +%s),
    "timeout": $LOCK_TIMEOUT,
    "command": "$0 $*"
}
EOF
    # Hard-linking is atomic: exactly one process can create the lock file
    if ln "$temp_file" "$lock_file" 2>/dev/null; then
        rm -f "$temp_file"
        echo "$lock_id"
        return 0
    fi
    if [ -f "$lock_file" ]; then
        local lock_age=$(($(date +%s) - $(stat -c %Y "$lock_file" 2>/dev/null || stat -f %m "$lock_file")))
        local lock_data=$(cat "$lock_file" 2>/dev/null)
        local lock_pid=$(echo "$lock_data" | grep -o '"pid": [0-9]*' | grep -o '[0-9]*$')
        local lock_host=$(echo "$lock_data" | grep -o '"hostname": "[^"]*"' | cut -d'"' -f4)
        if [ $lock_age -gt $STALE_LOCK_THRESHOLD ]; then
            echo "Removing stale lock (age: ${lock_age}s)" >&2
            rm -f "$lock_file"
            fs_acquire_lock "$lock_name"
            return $?
        fi
        # Check if the holding process still exists (same host only)
        if [ "$lock_host" = "$HOSTNAME" ] && [ -n "$lock_pid" ]; then
            if ! kill -0 "$lock_pid" 2>/dev/null; then
                echo "Removing lock from dead process $lock_pid" >&2
                rm -f "$lock_file"
                fs_acquire_lock "$lock_name"
                return $?
            fi
        fi
    fi
    rm -f "$temp_file"
    return 1
}

fs_release_lock() {
    local lock_name=$1
    local lock_id=$2
    local lock_file="$LOCK_DIR/$lock_name.lock"
    if [ -f "$lock_file" ]; then
        local current_holder=$(grep -o '"holder": "[^"]*"' "$lock_file" 2>/dev/null | cut -d'"' -f4)
        if [ "$current_holder" = "$lock_id" ]; then
            rm -f "$lock_file"
            return 0
        else
            echo "Cannot release lock owned by $current_holder" >&2
            return 1
        fi
    fi
    return 0
}

redis_acquire_lock() {
    local lock_name=$1
    local lock_id=$(generate_lock_id)
    local lock_key="dlock:$lock_name"
    # Try to acquire the lock with NX (only if not exists) and EX (expiry)
    local result=$(redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" \
        SET "$lock_key" "$lock_id" NX EX "$LOCK_TIMEOUT" 2>/dev/null)
    if [ "$result" = "OK" ]; then
        redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" >/dev/null 2>&1 <<EOF
HSET "${lock_key}:meta" holder "$lock_id"
HSET "${lock_key}:meta" hostname "$HOSTNAME"
HSET "${lock_key}:meta" pid "$PID"
HSET "${lock_key}:meta" acquired "$(date +%s)"
EXPIRE "${lock_key}:meta" "$LOCK_TIMEOUT"
EOF
        echo "$lock_id"
        return 0
    fi
    # Check if the lock is held by a dead process on the same host
    local lock_host=$(redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" \
        HGET "${lock_key}:meta" hostname 2>/dev/null)
    local lock_pid=$(redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" \
        HGET "${lock_key}:meta" pid 2>/dev/null)
    if [ "$lock_host" = "$HOSTNAME" ] && [ -n "$lock_pid" ]; then
        if ! kill -0 "$lock_pid" 2>/dev/null; then
            echo "Removing lock from dead process $lock_pid" >&2
            redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" DEL "$lock_key" "${lock_key}:meta" >/dev/null
            redis_acquire_lock "$lock_name"
            return $?
        fi
    fi
    return 1
}

redis_release_lock() {
    local lock_name=$1
    local lock_id=$2
    local lock_key="dlock:$lock_name"
    # Use a Lua script for atomic check-and-delete (--eval takes keys, then
    # a comma separator, then args)
    redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" --eval - "$lock_key" , "$lock_id" <<'EOF' >/dev/null
if redis.call("GET", KEYS[1]) == ARGV[1] then
    redis.call("DEL", KEYS[1], KEYS[1] .. ":meta")
    return 1
else
    return 0
end
EOF
}

acquire_lock() {
    local lock_name=$1
    local max_wait=${2:-0}
    local waited=0
    while true; do
        local lock_id
        case "$BACKEND" in
            filesystem)
                lock_id=$(fs_acquire_lock "$lock_name")
                ;;
            redis)
                lock_id=$(redis_acquire_lock "$lock_name")
                ;;
            *)
                echo "Unknown backend: $BACKEND" >&2
                return 1
                ;;
        esac
        if [ -n "$lock_id" ]; then
            echo "$lock_id"
            return 0
        fi
        if [ $max_wait -gt 0 ] && [ $waited -ge $max_wait ]; then
            return 1
        fi
        sleep $LOCK_RETRY_INTERVAL
        waited=$((waited + LOCK_RETRY_INTERVAL))
    done
}

release_lock() {
    local lock_name=$1
    local lock_id=$2
    case "$BACKEND" in
        filesystem)
            fs_release_lock "$lock_name" "$lock_id"
            ;;
        redis)
            redis_release_lock "$lock_name" "$lock_id"
            ;;
    esac
}

with_lock() {
    local lock_name=$1
    shift
    local lock_id
    lock_id=$(acquire_lock "$lock_name" 30)  # Wait up to 30 seconds
    if [ -z "$lock_id" ]; then
        echo "Failed to acquire lock: $lock_name" >&2
        return 1
    fi
    trap "release_lock '$lock_name' '$lock_id'" EXIT INT TERM
    echo "Lock acquired: $lock_name (id: $lock_id)" >&2
    "$@"
    local exit_code=$?
    release_lock "$lock_name" "$lock_id"
    trap - EXIT INT TERM
    return $exit_code
}

if [ "${BASH_SOURCE[0]}" = "${0}" ]; then
    critical_task() {
        echo "Starting critical task on $HOSTNAME..."
        sleep 10
        echo "Critical task completed"
    }
    with_lock "my-critical-task" critical_task
fi

This implementation handles real-world edge cases: stale locks from crashed processes, network partitions, and provides both filesystem and Redis backends for different infrastructure setups.

Q2: Create a self-healing service monitor that automatically restarts failed services with exponential backoff

This handles the complexity of service management in production:

#!/bin/bash

# Self-healing service monitor with intelligent restart logic.
# Prevents restart loops and implements the circuit breaker pattern.

readonly CONFIG_DIR="/etc/service-monitor"
readonly STATE_DIR="/var/lib/service-monitor"
readonly LOG_FILE="/var/log/service-monitor.log"

mkdir -p "$CONFIG_DIR" "$STATE_DIR"

# Example config (JSON), one file per service in $CONFIG_DIR:
# {
#   "start_cmd": "systemctl start web-app",
#   "stop_cmd": "systemctl stop web-app",
#   "restart_cmd": "systemctl restart web-app",
#   ...
# }

log() {
    local level=$1
    shift
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] [$level] $*" | tee -a "$LOG_FILE"
    if command -v logger >/dev/null 2>&1; then
        logger -t "service-monitor" -p "daemon.$level" "$*"
    fi
}

load_service_config() {
    local service_name=$1
    local config_file="$CONFIG_DIR/${service_name}.json"
    if [ ! -f "$config_file" ]; then
        log "error" "Configuration not found for service: $service_name"
        return 1
    fi
    # Parse JSON (using jq if available, fall back to python)
    if command -v jq >/dev/null 2>&1; then
        jq -r 'to_entries | .[] | "\(.key)=\(.value)"' "$config_file"
    else
        python -c "
import json
with open('$config_file') as f:
    config = json.load(f)
for k, v in config.items():
    print(f'{k}={v}')
"
    fi
}

get_service_state() {
    local service_name=$1
    local state_file="$STATE_DIR/${service_name}.state"
    if [ ! -f "$state_file" ]; then
        cat > "$state_file" << EOF
restart_count=0
last_restart=0
consecutive_failures=0
backoff_seconds=1
status=unknown
last_check=0
total_restarts=0
circuit_breaker=closed
EOF
    fi
    source "$state_file"
}

update_service_state() {
    local service_name=$1
    local state_file="$STATE_DIR/${service_name}.state"
    shift
    local temp_file=$(mktemp)
    cp "$state_file" "$temp_file" 2>/dev/null || touch "$temp_file"
    while [ $# -gt 0 ]; do
        local key="${1%%=*}"
        local value="${1#*=}"
        if grep -q "^${key}=" "$temp_file"; then
            sed -i "s/^${key}=.*/${key}=${value}/" "$temp_file"
        else
            echo "${key}=${value}" >> "$temp_file"
        fi
        shift
    done
    mv "$temp_file" "$state_file"
}

calculate_backoff() {
    local base=$1
    local attempt=$2
    local max_backoff=$3
    local backoff=$((base ** attempt))
    if [ $backoff -gt $max_backoff ]; then
        backoff=$max_backoff
    fi
    # Add jitter (±25%) to prevent a thundering herd; guard against a zero
    # jitter range, which would make the modulo divide by zero
    local jitter=$((backoff / 4))
    if [ $jitter -gt 0 ]; then
        local random_jitter=$((RANDOM % (jitter * 2) - jitter))
        backoff=$((backoff + random_jitter))
    fi
    echo $backoff
}

check_service_health() {
    local check_cmd=$1
    timeout 30 bash -c "$check_cmd" >/dev/null 2>&1
}

restart_service() {
    local service_name=$1
    local restart_cmd=$2
    local backoff_base=$3
    local max_backoff=$4
    get_service_state "$service_name"
    local backoff=$(calculate_backoff "$backoff_base" "$consecutive_failures" "$max_backoff")
    log "warning" "Restarting $service_name (attempt $((consecutive_failures + 1)), waiting ${backoff}s)"
    sleep "$backoff"
    if $restart_cmd; then
        log "info" "Successfully restarted $service_name"
        update_service_state "$service_name" \
            "last_restart=$(date +%s)" \
            "backoff_seconds=1" \
            "total_restarts=$((total_restarts + 1))"
        return 0
    else
        log "error" "Failed to restart $service_name"
        return 1
    fi
}

monitor_service() {
    local service_name=$1
    eval "$(load_service_config "$service_name")"
    for required in check_cmd restart_cmd max_restarts restart_window; do
        if [ -z "${!required}" ]; then
            log "error" "Missing required config: $required for $service_name"
            return 1
        fi
    done
    backoff_base=${backoff_base:-2}
    max_backoff=${max_backoff:-300}
    log "info" "Starting monitor for $service_name"
    while true; do
        get_service_state "$service_name"
        if [ "$circuit_breaker" = "open" ]; then
            local circuit_break_duration=$(($(date +%s) - last_restart))
            if [ $circuit_break_duration -lt 3600 ]; then
                sleep 60
                continue
            else
                log "info" "Attempting to close circuit breaker for $service_name"
                update_service_state "$service_name" "circuit_breaker=half-open"
            fi
        fi
        if check_service_health "$check_cmd"; then
            if [ "$status" != "healthy" ] || [ "$consecutive_failures" -gt 0 ]; then
                log "info" "$service_name is healthy"
                update_service_state "$service_name" \
                    "status=healthy" \
                    "consecutive_failures=0" \
                    "circuit_breaker=closed"
            fi
        else
            log "error" "$service_name health check failed"
            consecutive_failures=$((consecutive_failures + 1))
            update_service_state "$service_name" \
                "status=unhealthy" \
                "consecutive_failures=$consecutive_failures" \
                "last_check=$(date +%s)"
            current_time=$(date +%s)
            window_start=$((current_time - restart_window))
            recent_restarts=0
            if [ -f "$STATE_DIR/${service_name}.history" ]; then
                recent_restarts=$(awk -v start="$window_start" '$1 > start' \
                    "$STATE_DIR/${service_name}.history" | wc -l)
            fi
            if [ $recent_restarts -ge $max_restarts ]; then
                if [ "$circuit_breaker" != "open" ]; then
                    log "error" "Circuit breaker OPEN for $service_name (too many restarts)"
                    update_service_state "$service_name" "circuit_breaker=open"
                    send_alert "$service_name" "Circuit breaker opened - manual intervention required"
                fi
            else
                if restart_service "$service_name" "$restart_cmd" "$backoff_base" "$max_backoff"; then
                    echo "$(date +%s) restart" >> "$STATE_DIR/${service_name}.history"
                    # Wait before the next check to let the service stabilize
                    sleep 30
                else
                    update_service_state "$service_name" \
                        "consecutive_failures=$consecutive_failures"
                fi
            fi
        fi
        sleep "${check_interval:-60}"
    done
}

# Send alerts (integrate with your alerting system)
send_alert() {
    local service=$1
    local message=$2
    if [ -n "$SLACK_WEBHOOK" ]; then
        curl -X POST "$SLACK_WEBHOOK" \
            -H "Content-Type: application/json" \
            -d "{\"text\": \"Service Monitor Alert: $service - $message\"}" \
            2>/dev/null || true
    fi
    if command -v mail >/dev/null 2>&1 && [ -n "$ALERT_EMAIL" ]; then
        echo "$message" | mail -s "Service Monitor: $service" "$ALERT_EMAIL"
    fi
    log "alert" "$service: $message"
}

main() {

for config_file in “$CONFIG_DIR”/*.json; do

[ -f “$config_file” ] || continue

service_name=$(basename “$config_file” .json)

monitor_service “$service_name” &

echo $! > “$STATE_DIR/${service_name}.pid”

done

wait

}

trap ‘log “info” “Shutting down service monitor”; pkill -P $$; exit 0’ TERM INT

if [ ”${BASH_SOURCE[0]}” = ”${0}” ]; then

main ”$@”

fi

Q3: Implement a log aggregation and analysis system that detects anomalies across distributed services

This demonstrates handling big data streams and pattern recognition:

#!/bin/bash
# Distributed log analysis system with anomaly detection
# Handles multiple log formats, performs statistical analysis, and alerts on anomalies

readonly AGGREGATOR_PORT=9514
readonly ANALYSIS_INTERVAL=60
readonly BASELINE_WINDOW=3600  # 1 hour
readonly ANOMALY_THRESHOLD=3   # Standard deviations
readonly STATE_DIR="/var/lib/log-analyzer"
readonly PATTERNS_FILE="/etc/log-analyzer/patterns.conf"

mkdir -p "$STATE_DIR"

# Pattern definitions for different log types
declare -A LOG_PATTERNS=(
    ["nginx"]='(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d+) (?P<size>\d+)'
    ["apache"]='(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d+) (?P<size>\d+)'
    ["syslog"]='(?P<time>\S+ \S+ \S+) (?P<host>\S+) (?P<process>[^\[]+)\[(?P<pid>\d+)\]: (?P<message>.*)'
    ["json"]='^\{.*\}$'
)

calculate_stats() {
    awk '
    {
        data[NR] = $1
        sum += $1
        if (NR == 1 || $1 < min) min = $1
        if (NR == 1 || $1 > max) max = $1
    }
    END {
        if (NR == 0) {
            print "0 0 0 0 0 0"
            exit
        }
        mean = sum / NR
        for (i = 1; i <= NR; i++) {
            variance += (data[i] - mean) ^ 2
        }
        variance = variance / NR
        stddev = sqrt(variance)

        # 95th percentile: sort the values, then pick the n-th
        n = int(NR * 0.95)
        if (n < 1) n = 1
        for (i = 1; i <= NR; i++) {
            for (j = i + 1; j <= NR; j++) {
                if (data[i] > data[j]) {
                    temp = data[i]
                    data[i] = data[j]
                    data[j] = temp
                }
            }
        }
        print NR, mean, stddev, min, max, data[n]
    }
    '
}

is_anomaly() {
    local value=$1
    local mean=$2
    local stddev=$3
    local threshold=${4:-$ANOMALY_THRESHOLD}

    if (( $(echo "$stddev == 0" | bc -l) )); then
        return 1
    fi

    local zscore=$(echo "scale=2; ($value - $mean) / $stddev" | bc -l)
    local abs_zscore=$(echo "scale=2; if ($zscore < 0) -$zscore else $zscore" | bc -l)

    if (( $(echo "$abs_zscore > $threshold" | bc -l) )); then
        return 0  # Is anomaly
    else
        return 1  # Not anomaly
    fi
}

parse_log_line() {
    local line=$1
    local log_type=$2

    case "$log_type" in
        json)
            if command -v jq >/dev/null 2>&1; then
                echo "$line" | jq -r '[
                    (.timestamp // (now | todate)),
                    (.level // "INFO"),
                    (.service // "unknown"),
                    (.message // "")
                ] | @tsv' 2>/dev/null
            else
                python3 -c "
import json, sys, datetime
try:
    data = json.loads('$line')
    timestamp = data.get('timestamp', datetime.datetime.now().isoformat())
    level = data.get('level', 'INFO')
    service = data.get('service', 'unknown')
    message = data.get('message', '')
    print(f'{timestamp}\t{level}\t{service}\t{message}')
except Exception:
    pass
"
            fi
            ;;
        nginx|apache)
            python3 -c "
import re
pattern = r'${LOG_PATTERNS[$log_type]}'
match = re.match(pattern, '''$line''')
if match:
    data = match.groupdict()
    print('\t'.join([data.get('time', ''), data.get('status', ''), data.get('method', ''), data.get('path', '')]))
"
            ;;
        *)
            echo "$line" | awk '{print $1, $2, $3, $0}'
            ;;
    esac
}

aggregate_logs() {
    local output_file=$1
    local duration=$2

    # Listen for logs forwarded over the network
    nc -l -k -p "$AGGREGATOR_PORT" > "$output_file" 2>/dev/null &
    local nc_pid=$!

    if command -v journalctl >/dev/null 2>&1; then
        journalctl -f --since="$duration seconds ago" >> "$output_file" 2>/dev/null &
        local journal_pid=$!
    fi

    sleep "$duration"

    kill $nc_pid 2>/dev/null
    [ -n "$journal_pid" ] && kill $journal_pid 2>/dev/null

    wc -l < "$output_file"
}

analyze_logs() {
    local log_file=$1
    local analysis_output="$STATE_DIR/analysis_$(date +%s).json"

    declare -A error_counts
    declare -A response_times
    declare -A status_codes
    declare -A service_errors

    while IFS= read -r line; do
        local log_type="generic"
        if [[ "$line" =~ ^\{ ]]; then
            log_type="json"
        elif [[ "$line" =~ \"[A-Z]+\ .*HTTP ]]; then
            log_type="nginx"
        fi

        local parsed=$(parse_log_line "$line" "$log_type")
        [ -z "$parsed" ] && continue

        case "$log_type" in
            json)
                IFS=$'\t' read -r timestamp level service message <<< "$parsed"
                if [[ "$level" =~ ERROR|CRITICAL|FATAL ]]; then
                    ((service_errors["$service"]++))
                fi
                ;;
            nginx|apache)
                IFS=$'\t' read -r timestamp status method path <<< "$parsed"
                ((status_codes["$status"]++))
                if [[ "$status" =~ ^[45] ]]; then
                    ((error_counts["http_errors"]++))
                fi
                ;;
        esac
    done < "$log_file"

    local baseline_file="$STATE_DIR/baseline.json"
    if [ -f "$baseline_file" ]; then
        for service in "${!service_errors[@]}"; do
            local current_errors=${service_errors[$service]}
            local baseline_mean=$(jq -r ".services.$service.error_rate.mean // 0" "$baseline_file")
            local baseline_stddev=$(jq -r ".services.$service.error_rate.stddev // 1" "$baseline_file")

            if is_anomaly "$current_errors" "$baseline_mean" "$baseline_stddev"; then
                log "alert" "Anomaly detected: $service error rate is $current_errors (baseline: $baseline_mean ± $baseline_stddev)"
                send_anomaly_alert "$service" "error_rate" "$current_errors" "$baseline_mean" "$baseline_stddev"
            fi
        done
    fi

    cat > "$analysis_output" << EOF
{
    "timestamp": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
    "total_lines": $(wc -l < "$log_file"),
    "services": {
EOF

    local first=true
    for service in "${!service_errors[@]}"; do
        [ "$first" = true ] && first=false || echo "," >> "$analysis_output"
        cat >> "$analysis_output" << EOF
        "$service": {
            "error_count": ${service_errors[$service]},
            "error_rate": $(echo "scale=2; ${service_errors[$service]} * 100 / $(wc -l < "$log_file")" | bc)
        }
EOF
    done

    echo -e "\n    },\n    \"status_codes\": {" >> "$analysis_output"
    first=true
    for status in "${!status_codes[@]}"; do
        [ "$first" = true ] && first=false || echo "," >> "$analysis_output"
        echo -n "        \"$status\": ${status_codes[$status]}" >> "$analysis_output"
    done
    echo -e "\n    }\n}" >> "$analysis_output"

    # Update baseline with exponential moving average
    update_baseline "$analysis_output"

    echo "$analysis_output"
}

update_baseline() {
    local current_analysis=$1
    local baseline_file="$STATE_DIR/baseline.json"
    local alpha=0.3  # EMA weight for new data

    if [ ! -f "$baseline_file" ]; then
        cp "$current_analysis" "$baseline_file"
        return
    fi

    # Update baseline using exponential moving average
    python3 - "$baseline_file" "$current_analysis" "$alpha" << 'EOF'
import json
import sys

baseline_file, current_file, alpha = sys.argv[1:4]
alpha = float(alpha)

with open(baseline_file) as f:
    baseline = json.load(f)
with open(current_file) as f:
    current = json.load(f)

for service, metrics in current.get('services', {}).items():
    if service not in baseline.get('services', {}):
        baseline.setdefault('services', {})[service] = {
            'error_rate': {'mean': 0, 'stddev': 1}
        }

    old_mean = baseline['services'][service]['error_rate']['mean']
    new_value = metrics['error_rate']
    new_mean = alpha * new_value + (1 - alpha) * old_mean

    old_var = baseline['services'][service]['error_rate'].get('variance', 1)
    new_var = alpha * ((new_value - new_mean) ** 2) + (1 - alpha) * old_var

    baseline['services'][service]['error_rate'] = {
        'mean': new_mean,
        'stddev': new_var ** 0.5,
        'variance': new_var
    }

with open(baseline_file, 'w') as f:
    json.dump(baseline, f, indent=2)
EOF
}

send_anomaly_alert() {
    local service=$1
    local metric=$2
    local current_value=$3
    local baseline_mean=$4
    local baseline_stddev=$5

    local zscore=$(echo "scale=2; ($current_value - $baseline_mean) / $baseline_stddev" | bc -l)

    local alert_data=$(cat << EOF
{
    "service": "$service",
    "metric": "$metric",
    "current_value": $current_value,
    "baseline_mean": $baseline_mean,
    "baseline_stddev": $baseline_stddev,
    "z_score": $zscore,
    "timestamp": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
    "severity": "$([ $(echo "$zscore > 5" | bc) -eq 1 ] && echo "critical" || echo "warning")"
}
EOF
)

    if [ -n "$WEBHOOK_URL" ]; then
        curl -X POST "$WEBHOOK_URL" \
            -H "Content-Type: application/json" \
            -d "$alert_data" \
            2>/dev/null || true
    fi

    echo "$alert_data" >> "$STATE_DIR/alerts.jsonl"

    if [ -x "/etc/log-analyzer/alert-handler.sh" ]; then
        echo "$alert_data" | /etc/log-analyzer/alert-handler.sh
    fi
}

main() {
    log "info" "Starting distributed log analyzer"

    while true; do
        local temp_log=$(mktemp)

        log "info" "Collecting logs for $ANALYSIS_INTERVAL seconds"
        local line_count=$(aggregate_logs "$temp_log" "$ANALYSIS_INTERVAL")
        log "info" "Collected $line_count log lines"

        if [ "$line_count" -gt 0 ]; then
            local analysis_file=$(analyze_logs "$temp_log")
            log "info" "Analysis complete: $analysis_file"

            # Clean up old analysis files (keep last 24 hours)
            find "$STATE_DIR" -name "analysis_*.json" -mtime +1 -delete
        fi

        rm -f "$temp_log"
        sleep 10
    done
}

log() {
    local level=$1
    shift
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] [$level] $*" >&2
}

if [ "${BASH_SOURCE[0]}" = "${0}" ]; then
    main "$@"
fi

Q4: Build a zero-downtime deployment system with automatic rollback on failure

This handles complex deployment orchestration:

#!/bin/bash
# Zero-downtime deployment system with health checks and automatic rollback
# Supports blue-green and rolling deployments across multiple servers

readonly DEPLOY_DIR="/opt/deployments"
readonly CONFIG_FILE="/etc/deploy/config.json"
readonly STATE_FILE="/var/lib/deploy/state.json"
readonly HEALTH_CHECK_RETRIES=10
readonly HEALTH_CHECK_INTERVAL=5
readonly CANARY_PERCENTAGE=10

declare -A STRATEGIES=(
    ["blue-green"]="deploy_blue_green"
    ["rolling"]="deploy_rolling"
    ["canary"]="deploy_canary"
)

mkdir -p "$(dirname "$STATE_FILE")" "$DEPLOY_DIR"

update_state() {
    local temp_file=$(mktemp)
    if [ -f "$STATE_FILE" ]; then
        cp "$STATE_FILE" "$temp_file"
    else
        echo "{}" > "$temp_file"
    fi

    local key=$1
    local value=$2
    jq --arg k "$key" --arg v "$value" '.[$k] = $v' "$temp_file" > "${temp_file}.new"
    mv "${temp_file}.new" "$STATE_FILE"
    rm -f "$temp_file"
}

get_state() {
    local key=$1
    if [ -f "$STATE_FILE" ]; then
        jq -r --arg k "$key" '.[$k] // empty' "$STATE_FILE"
    fi
}

# Load balancer management (HAProxy example)
update_load_balancer() {
    local action=$1
    local server=$2
    local backend=${3:-"webservers"}

    case "$action" in
        drain)
            echo "set server $backend/$server state drain" | \
                socat stdio /var/run/haproxy.sock
            ;;
        ready)
            echo "set server $backend/$server state ready" | \
                socat stdio /var/run/haproxy.sock
            ;;
        maint)
            echo "disable server $backend/$server" | \
                socat stdio /var/run/haproxy.sock
            ;;
        enable)
            echo "enable server $backend/$server" | \
                socat stdio /var/run/haproxy.sock
            ;;
    esac
}

wait_for_drain() {
    local server=$1
    local backend=${2:-"webservers"}
    local max_wait=60
    local waited=0

    log "info" "Waiting for connections to drain from $server"

    while [ $waited -lt $max_wait ]; do
        local conn_count=$(echo "show servers state $backend" | \
            socat stdio /var/run/haproxy.sock | \
            grep "$server" | awk '{print $7}')

        if [ "$conn_count" -eq 0 ]; then
            log "info" "Server $server drained successfully"
            return 0
        fi

        sleep 2
        waited=$((waited + 2))
    done

    log "warning" "Timeout waiting for $server to drain (still $conn_count connections)"
    return 1
}

perform_health_check() {
    local server=$1
    local health_endpoint=$2
    local expected_response=${3:-"200"}
    local attempt=1

    while [ $attempt -le $HEALTH_CHECK_RETRIES ]; do
        log "debug" "Health check attempt $attempt for $server"

        local response=$(curl -s -o /dev/null -w "%{http_code}" \
            --connect-timeout 5 \
            --max-time 10 \
            "http://$server$health_endpoint")

        if [[ "$response" =~ $expected_response ]]; then
            log "info" "Health check passed for $server"
            return 0
        fi

        log "warning" "Health check failed for $server (got $response, expected $expected_response)"
        sleep $HEALTH_CHECK_INTERVAL
        attempt=$((attempt + 1))
    done

    log "error" "Health check failed for $server after $HEALTH_CHECK_RETRIES attempts"
    return 1
}

deploy_to_server() {
    local server=$1
    local artifact=$2
    local deploy_user=${3:-"deploy"}
    local deploy_path=${4:-"/var/www/app"}

    log "info" "Deploying $artifact to $server"

    # Create deployment directory with timestamp
    local timestamp=$(date +%Y%m%d_%H%M%S)
    local new_release_path="${deploy_path}/releases/${timestamp}"

    ssh "$deploy_user@$server" "mkdir -p ${deploy_path}/releases"

    if ! scp "$artifact" "$deploy_user@$server:/tmp/deploy_${timestamp}.tar.gz"; then
        log "error" "Failed to transfer artifact to $server"
        return 1
    fi

    if ! ssh "$deploy_user@$server" "
        set -e
        cd ${deploy_path}/releases
        mkdir -p ${timestamp}
        tar -xzf /tmp/deploy_${timestamp}.tar.gz -C ${timestamp}
        rm -f /tmp/deploy_${timestamp}.tar.gz
        if [ -x ${timestamp}/deploy/pre-deploy.sh ]; then
            cd ${timestamp}
            ./deploy/pre-deploy.sh
        fi
    "; then
        log "error" "Failed to prepare deployment on $server"
        return 1
    fi

    local current_link=$(ssh "$deploy_user@$server" "readlink ${deploy_path}/current" 2>/dev/null || echo "")
    if [ -n "$current_link" ]; then
        update_state "previous_release_${server}" "$current_link"
    fi

    if ! ssh "$deploy_user@$server" "
        ln -sfn ${new_release_path} ${deploy_path}/current
        if [ -x ${new_release_path}/deploy/post-deploy.sh ]; then
            cd ${new_release_path}
            ./deploy/post-deploy.sh
        fi
        sudo systemctl reload nginx || true
        sudo systemctl restart app || true
    "; then
        log "error" "Failed to activate deployment on $server"
        return 1
    fi

    log "info" "Successfully deployed to $server"
    return 0
}

rollback_server() {
    local server=$1
    local deploy_user=${2:-"deploy"}
    local deploy_path=${3:-"/var/www/app"}

    local previous_release=$(get_state "previous_release_${server}")
    if [ -z "$previous_release" ]; then
        log "error" "No previous release found for $server"
        return 1
    fi

    log "warning" "Rolling back $server to $previous_release"

    ssh "$deploy_user@$server" "
        ln -sfn $previous_release ${deploy_path}/current
        sudo systemctl reload nginx || true
        sudo systemctl restart app || true
    "
}

deploy_blue_green() {
    local artifact=$1
    local config=$2

    local blue_servers=($(echo "$config" | jq -r '.blue_servers[]'))
    local green_servers=($(echo "$config" | jq -r '.green_servers[]'))
    local health_endpoint=$(echo "$config" | jq -r '.health_check.endpoint // "/health"')

    # Determine which environment is currently live
    local live_env=$(get_state "live_environment")
    local target_env
    local target_servers

    if [ "$live_env" = "blue" ]; then
        target_env="green"
        target_servers=("${green_servers[@]}")
    else
        target_env="blue"
        target_servers=("${blue_servers[@]}")
    fi

    log "info" "Starting blue-green deployment to $target_env environment"
    update_state "deployment_id" "$(date +%s)"
    update_state "deployment_status" "in_progress"

    local failed_servers=()
    for server in "${target_servers[@]}"; do
        if ! deploy_to_server "$server" "$artifact"; then
            failed_servers+=("$server")
        fi
    done

    if [ ${#failed_servers[@]} -gt 0 ]; then
        log "error" "Deployment failed on servers: ${failed_servers[*]}"
        update_state "deployment_status" "failed"
        return 1
    fi

    log "info" "Running health checks on $target_env environment"
    for server in "${target_servers[@]}"; do
        if ! perform_health_check "$server" "$health_endpoint"; then
            log "error" "Health check failed for $server"
            update_state "deployment_status" "failed"
            # Rollback is not needed in blue-green as we haven't switched traffic
            return 1
        fi
    done

    log "info" "Switching traffic to $target_env environment"
    for server in "${target_servers[@]}"; do
        update_load_balancer "enable" "$server"
    done

    local old_servers
    if [ "$target_env" = "blue" ]; then
        old_servers=("${green_servers[@]}")
    else
        old_servers=("${blue_servers[@]}")
    fi

    for server in "${old_servers[@]}"; do
        update_load_balancer "drain" "$server"
    done

    for server in "${old_servers[@]}"; do
        wait_for_drain "$server"
        update_load_balancer "maint" "$server"
    done

    update_state "live_environment" "$target_env"
    update_state "deployment_status" "completed"
    update_state "deployment_end" "$(date +%s)"

    log "info" "Blue-green deployment completed successfully"
}

deploy_rolling() {
    local artifact=$1
    local config=$2

    local servers=($(echo "$config" | jq -r '.servers[]'))
    local batch_size=$(echo "$config" | jq -r '.batch_size // 1')
    local health_endpoint=$(echo "$config" | jq -r '.health_check.endpoint // "/health"')
    local pause_between_batches=$(echo "$config" | jq -r '.pause_seconds // 30')

    log "info" "Starting rolling deployment (batch size: $batch_size)"
    update_state "deployment_id" "$(date +%s)"
    update_state "deployment_status" "in_progress"

    local total_servers=${#servers[@]}
    local deployed=0

    while [ $deployed -lt $total_servers ]; do
        local batch_servers=()
        local batch_end=$((deployed + batch_size))
        for ((i=deployed; i<batch_end && i<total_servers; i++)); do
            batch_servers+=("${servers[$i]}")
        done

        log "info" "Processing batch: ${batch_servers[*]}"

        for server in "${batch_servers[@]}"; do
            update_load_balancer "drain" "$server"
        done
        for server in "${batch_servers[@]}"; do
            wait_for_drain "$server"
        done

        local batch_failed=false
        for server in "${batch_servers[@]}"; do
            if ! deploy_to_server "$server" "$artifact"; then
                log "error" "Deployment failed on $server"
                batch_failed=true
                break
            fi
            if ! perform_health_check "$server" "$health_endpoint"; then
                log "error" "Health check failed for $server"
                batch_failed=true
                break
            fi
            update_load_balancer "ready" "$server"
        done

        if [ "$batch_failed" = true ]; then
            log "error" "Batch deployment failed, initiating rollback"
            for ((i=0; i<deployed+${#batch_servers[@]}; i++)); do
                rollback_server "${servers[$i]}"
                update_load_balancer "ready" "${servers[$i]}"
            done
            update_state "deployment_status" "failed_rollback_completed"
            return 1
        fi

        deployed=$((deployed + ${#batch_servers[@]}))

        # Pause between batches if not the last batch
        if [ $deployed -lt $total_servers ]; then
            log "info" "Pausing $pause_between_batches seconds before next batch"
            sleep "$pause_between_batches"
        fi
    done

    update_state "deployment_status" "completed"
    update_state "deployment_end" "$(date +%s)"
    log "info" "Rolling deployment completed successfully"
}

deploy_canary() {
    local artifact=$1
    local config=$2

    local servers=($(echo "$config" | jq -r '.servers[]'))
    local canary_duration=$(echo "$config" | jq -r '.canary_duration // 300')
    local success_rate_threshold=$(echo "$config" | jq -r '.success_rate_threshold // 99.5')
    local health_endpoint=$(echo "$config" | jq -r '.health_check.endpoint // "/health"')

    local total_servers=${#servers[@]}
    local canary_count=$((total_servers * CANARY_PERCENTAGE / 100))
    [ $canary_count -lt 1 ] && canary_count=1

    log "info" "Starting canary deployment ($canary_count of $total_servers servers)"

    local canary_servers=("${servers[@]:0:$canary_count}")

    for server in "${canary_servers[@]}"; do
        update_load_balancer "drain" "$server"
        wait_for_drain "$server"

        if ! deploy_to_server "$server" "$artifact"; then
            log "error" "Canary deployment failed on $server"
            rollback_server "$server"
            update_load_balancer "ready" "$server"
            return 1
        fi

        if ! perform_health_check "$server" "$health_endpoint"; then
            log "error" "Canary health check failed for $server"
            rollback_server "$server"
            update_load_balancer "ready" "$server"
            return 1
        fi

        update_load_balancer "ready" "$server"
    done

    log "info" "Monitoring canary servers for $canary_duration seconds"
    local start_time=$(date +%s)
    local end_time=$((start_time + canary_duration))

    while [ $(date +%s) -lt $end_time ]; do
        local error_rate=$(calculate_error_rate "${canary_servers[@]}")
        local success_rate=$(echo "100 - $error_rate" | bc)

        if (( $(echo "$success_rate < $success_rate_threshold" | bc -l) )); then
            log "error" "Canary success rate ($success_rate%) below threshold ($success_rate_threshold%)"
            for server in "${canary_servers[@]}"; do
                update_load_balancer "drain" "$server"
            done
            for server in "${canary_servers[@]}"; do
                wait_for_drain "$server"
                rollback_server "$server"
                update_load_balancer "ready" "$server"
            done
            return 1
        fi

        sleep 10
    done

    log "info" "Canary phase successful, proceeding with full deployment"

    # Deploy to remaining servers using rolling strategy
    local remaining_servers=("${servers[@]:$canary_count}")
    local remaining_config=$(echo "$config" | jq --argjson servers "$(printf '%s\n' "${remaining_servers[@]}" | jq -R . | jq -s .)" '.servers = $servers')
    deploy_rolling "$artifact" "$remaining_config"
}

calculate_error_rate() {
    local servers=("$@")
    # This would integrate with your metrics system
    echo "0.5"
}

log() {
    local level=$1
    shift
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] [$level] $*" | tee -a /var/log/deploy.log
}

main() {
    local artifact=$1
    local strategy=${2:-"rolling"}

    if [ ! -f "$artifact" ]; then
        log "error" "Artifact not found: $artifact"
        exit 1
    fi

    if [ ! -f "$CONFIG_FILE" ]; then
        log "error" "Configuration not found: $CONFIG_FILE"
        exit 1
    fi

    local config=$(cat "$CONFIG_FILE")

    if [ -z "${STRATEGIES[$strategy]}" ]; then
        log "error" "Unknown deployment strategy: $strategy"
        exit 1
    fi

    ${STRATEGIES[$strategy]} "$artifact" "$config"
}

if [ "${BASH_SOURCE[0]}" = "${0}" ]; then
    main "$@"
fi

Q5: Create a disaster recovery automation system that handles failover between regions

This demonstrates complex orchestration and state management:

#!/bin/bash
# Multi-region disaster recovery automation system
# Handles database replication, DNS failover, and application state synchronization

readonly DR_CONFIG="/etc/disaster-recovery/config.json"
readonly STATE_DIR="/var/lib/dr-automation"
readonly HEALTH_CHECK_INTERVAL=30
readonly FAILOVER_THRESHOLD=3  # Consecutive failures before failover
readonly RPO_WARNING_THRESHOLD=300  # 5 minutes

mkdir -p "$STATE_DIR"

declare -A REGION_HEALTH
declare -A REGION_FAILURES
declare -A LAST_REPLICATION_LAG

load_config() {
    if [ ! -f "$DR_CONFIG" ]; then
        log "error" "DR configuration not found"
        exit 1
    fi

    # Export configuration as environment variables
    eval "$(jq -r '
        .regions[] |
        "REGION_\(.name | ascii_upcase)_ENDPOINT=\"\(.endpoint)\"",
        "REGION_\(.name | ascii_upcase)_DB_HOST=\"\(.database.host)\"",
        "REGION_\(.name | ascii_upcase)_DB_PORT=\"\(.database.port // 5432)\""
    ' "$DR_CONFIG")"

    REGIONS=($(jq -r '.regions[].name' "$DR_CONFIG"))
    PRIMARY_REGION=$(jq -r '.primary_region' "$DR_CONFIG")
}

check_region_health() {
    local region=$1
    local endpoint_var="REGION_${region^^}_ENDPOINT"
    local endpoint="${!endpoint_var}"

    log "debug" "Checking health of region: $region"

    local app_health=false
    local http_status=$(curl -s -o /dev/null -w "%{http_code}" \
        --connect-timeout 5 --max-time 10 \
        "https://$endpoint/health" || echo "000")
    if [ "$http_status" = "200" ]; then
        app_health=true
    fi

    local db_health=false
    local db_host_var="REGION_${region^^}_DB_HOST"
    local db_port_var="REGION_${region^^}_DB_PORT"
    local db_host="${!db_host_var}"
    local db_port="${!db_port_var}"
    if pg_isready -h "$db_host" -p "$db_port" -t 5 >/dev/null 2>&1; then
        db_health=true
    fi

    local replication_ok=true
    if [ "$region" != "$PRIMARY_REGION" ]; then
        local lag=$(check_replication_lag "$region")
        LAST_REPLICATION_LAG[$region]=$lag
        if [ "$lag" -gt "$RPO_WARNING_THRESHOLD" ]; then
            log "warning" "High replication lag for $region: ${lag}s"
            replication_ok=false
        fi
    fi

    if [ "$app_health" = true ] && [ "$db_health" = true ] && [ "$replication_ok" = true ]; then
        REGION_HEALTH[$region]="healthy"
        REGION_FAILURES[$region]=0
        return 0
    else
        REGION_HEALTH[$region]="unhealthy"
        REGION_FAILURES[$region]=$((${REGION_FAILURES[$region]:-0} + 1))
        log "error" "Region $region health check failed - App: $app_health, DB: $db_health, Replication: $replication_ok"
        return 1
    fi
}

check_replication_lag() {
    local region=$1
    local db_host_var="REGION_${region^^}_DB_HOST"
    local db_host="${!db_host_var}"

    local lag=$(psql -h "$db_host" -U postgres -t -c "
        SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))::int
        WHERE pg_is_in_recovery();" 2>/dev/null | tr -d ' ')

    if [ -z "$lag" ] || [ "$lag" = "NULL" ]; then
        echo "0"
    else
        echo "$lag"
    fi
}

perform_dns_failover() {
    local from_region=$1
    local to_region=$2
    local domain=$(jq -r '.domain' "$DR_CONFIG")
    local dns_provider=$(jq -r '.dns_provider' "$DR_CONFIG")

    log "warning" "Initiating DNS failover from $from_region to $to_region"

    case "$dns_provider" in
        route53)
            local hosted_zone=$(aws route53 list-hosted-zones-by-name \
                --query "HostedZones[?Name=='${domain}.'].Id" \
                --output text | cut -d'/' -f3)

            local to_endpoint_var="REGION_${to_region^^}_ENDPOINT"
            local to_endpoint="${!to_endpoint_var}"

            aws route53 change-resource-record-sets \
                --hosted-zone-id "$hosted_zone" \
                --change-batch "{
                    \"Changes\": [{
                        \"Action\": \"UPSERT\",
                        \"ResourceRecordSet\": {
                            \"Name\": \"${domain}\",
                            \"Type\": \"CNAME\",
                            \"TTL\": 60,
                            \"ResourceRecords\": [{\"Value\": \"${to_endpoint}\"}]
                        }
                    }]
                }"
            ;;
        cloudflare)
            local zone_id=$(curl -s -X GET "https://api.cloudflare.com/client/v4/zones?name=$domain" \
                -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" | \
                jq -r '.result[0].id')
            local record_id=$(curl -s -X GET \
                "https://api.cloudflare.com/client/v4/zones/$zone_id/dns_records?name=$domain" \
                -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" | \
                jq -r '.result[0].id')

            local to_endpoint_var="REGION_${to_region^^}_ENDPOINT"
            local to_endpoint="${!to_endpoint_var}"

            curl -X PUT "https://api.cloudflare.com/client/v4/zones/$zone_id/dns_records/$record_id" \
                -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
                -H "Content-Type: application/json" \
                --data "{\"type\":\"CNAME\",\"name\":\"$domain\",\"content\":\"$to_endpoint\",\"ttl\":60}"
            ;;
    esac

    log "info" "DNS failover initiated, propagation may take up to 60 seconds"
}

promote_database() {
    local region=$1
    local db_host_var="REGION_${region^^}_DB_HOST"
    local db_host="${!db_host_var}"

    log "warning" "Promoting database in region $region to primary"
    ssh "postgres@$db_host" "pg_ctl promote -D /var/lib/postgresql/data"

    # Update application configuration to point to new primary
    local app_servers=($(jq -r ".regions[] | select(.name==\"$region\") | .app_servers[]" "$DR_CONFIG"))
    for server in "${app_servers[@]}"; do
        ssh "deploy@$server" "
            sed -i 's/^DB_HOST=.*/DB_HOST=$db_host/' /etc/app/config
            sudo systemctl restart app
        "
    done
}

sync_application_state() {
    local from_region=$1
    local to_region=$2

    log "info" "Syncing application state from $from_region to $to_region"

    local from_redis=$(jq -r ".regions[] | select(.name==\"$from_region\") | .redis.host" "$DR_CONFIG")
    local to_redis=$(jq -r ".regions[] | select(.name==\"$to_region\") | .redis.host" "$DR_CONFIG")

    ssh "redis@$from_redis" "redis-cli BGSAVE && sleep 2"
    ssh "redis@$from_redis" "cat /var/lib/redis/dump.rdb" | \
        ssh "redis@$to_redis" "cat > /var/lib/redis/dump.rdb"

    # Sync session data if using file-based sessions
    local from_servers=($(jq -r ".regions[] | select(.name==\"$from_region\") | .app_servers[]" "$DR_CONFIG"))
    local to_servers=($(jq -r ".regions[] | select(.name==\"$to_region\") | .app_servers[]" "$DR_CONFIG"))

    printf '%s\n' "${from_servers[@]}" | parallel -j 4 "
        rsync -av

Bash Shell Scripting Interview Questions And Answers

Whether you’re just starting out or have years of experience, bash shell interview questions test your understanding of the shell’s unique features, syntax quirks, and best practices that make bash the go-to choice for system automation. We’ll cover the essential bash-specific concepts that interviewers love to explore - from array manipulation and string operations to process substitution and advanced parameter expansion. These questions focus on bash’s distinctive capabilities that set it apart from basic POSIX shell scripting.

Q1: What’s the difference between [ ] and [[ ]] in bash?

This is a classic that trips up many people. The single bracket [ ] is actually a command (same as test), while double brackets [[ ]] are a bash keyword with enhanced functionality:

[[ ]] supports pattern matching: [[ $string == pa* ]] works, but [ $string == pa* ] doesn’t

[[ ]] handles empty variables better: [[ -z $empty ]] works fine, while [ -z $empty ] needs quotes

[[ ]] supports regex matching with =~ operator

[[ ]] allows && and || inside: [[ $a -eq 1 && $b -eq 2 ]]

The key insight: always use [[ ]] in bash unless you need POSIX compatibility. It’s safer and more powerful.
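These differences are easy to verify directly; a small self-contained demo (not from the original answer, just an illustration):

```shell
#!/usr/bin/env bash
string="pattern"

# Glob-style matching only works inside [[ ]]
[[ $string == pa* ]] && echo "glob match"

# Regex matching with =~ (keep the pattern unquoted)
[[ $string =~ ^pa.+n$ ]] && echo "regex match"

# Empty variables are safe unquoted inside [[ ]]
empty=""
[[ -z $empty ]] && echo "empty detected"
```

Running the same tests with single brackets would require careful quoting, and the glob comparison would simply not match.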

Q2: Explain bash arrays and how they differ from regular variables

Bash supports both indexed and associative arrays, which many people don’t fully utilize:

Indexed arrays:

arr=(apple banana cherry)

echo ${arr[0]}  # apple

echo ${arr[@]}  # all elements

echo ${#arr[@]} # array length (3)

Associative arrays (bash 4+):

declare -A hash

hash[name]="John"

hash[age]=30

echo ${hash[name]}  # John

The tricky part is that bash arrays are sparse - you can have arr[0] and arr[10] without arr[1-9]. Also, array indices start at 0, not 1 like in some shells.
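A quick way to see the sparse behavior in action:

```shell
#!/usr/bin/env bash
# Indices 0 and 10 exist; 1 through 9 were never set
arr[0]="first"
arr[10]="last"

echo "${#arr[@]}"   # counts only the elements that are set: 2
echo "${!arr[@]}"   # the indices actually in use: 0 10
```

`${!arr[@]}` (the list of set indices) is the reliable way to iterate a sparse array; looping from 0 to `${#arr[@]}` would miss elements.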

Q3: How does command substitution work and what's the difference between backticks and $()?

Both capture command output, but $() is strongly preferred:

$() nests easily: echo $(echo $(date))

Backticks require escaping for nesting: echo `echo \`date\``

$() is clearer to read, especially with syntax highlighting

$() handles quotes more predictably

Example showing the difference:

result=$(echo "hello $(echo "world")")   # nested $() and quotes just work

result=`echo "hello \`echo world\`"`     # backticks need backslash-escaping to nest

Q4: What are bash parameter expansions and give some advanced examples?

Parameter expansion is bash’s Swiss Army knife for string manipulation:

${var:-default} - Use default if var is unset/null

${var:=default} - Set var to default if unset/null

${var:?error} - Exit with error if var is unset/null

${var:+alternate} - Use alternate if var is set

Advanced string manipulation:

path="/home/user/documents/file.txt"

echo ${path##*/}  # file.txt (basename)

echo ${path%/*}   # /home/user/documents (dirname)

echo ${path/documents/archive}  # String replacement

echo ${path^^}    # Convert to uppercase

echo ${path,,}    # Convert to lowercase

The pattern: # removes from beginning, % from end. Double it (##, %%) for greedy matching.
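A short demo makes the greedy vs. non-greedy distinction concrete:

```shell
#!/usr/bin/env bash
file="archive.tar.gz"

echo "${file#*.}"    # tar.gz      (shortest match stripped from the front)
echo "${file##*.}"   # gz          (longest match stripped from the front)
echo "${file%.*}"    # archive.tar (shortest match stripped from the end)
echo "${file%%.*}"   # archive     (longest match stripped from the end)
```

This is why `${file##*.}` is the idiomatic way to grab a file extension and `${file%%.*}` the way to grab the bare name.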

Q5: Explain the different types of expansions in bash and their order

Bash performs expansions in a specific order, and understanding this prevents many bugs:

Brace expansion: {a,b,c} or {1..10}

Tilde expansion: ~ becomes home directory

Parameter/variable expansion: $var or ${var}

Command substitution: $(command) or `command`

Arithmetic expansion: $((2 + 2))

Word splitting: Based on IFS

Pathname expansion: *.txt (globbing)

Why this matters:

var="*.txt"

ls $var  # Word splitting happens after parameter expansion

ls *.txt  # Pathname expansion happens directly
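A self-contained way to see the ordering in action, run in a scratch directory:

```shell
#!/usr/bin/env bash
# Quoting stops the expansion pipeline after parameter expansion.
cd "$(mktemp -d)"
touch a.txt b.txt

pattern="*.txt"
echo $pattern     # unquoted: pathname expansion runs after the variable
                  # expands, so this prints: a.txt b.txt
echo "$pattern"   # quoted: no later expansions, so this prints: *.txt
```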

Q6: What is process substitution and when would you use it?

Process substitution <(command) creates a temporary file descriptor that acts like a file:

diff <(ls dir1) <(ls dir2)

Read from multiple commands simultaneously

while read -u 3 line1 && read -u 4 line2; do

echo "File1: $line1 | File2: $line2"

done 3< <(cat file1) 4< <(cat file2)

grep "pattern" <(curl -s https://example.com)

It’s incredibly useful when you need to treat command output as a file without creating temporary files. The key limitation: it’s not POSIX, so it’s bash/zsh specific.
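The examples above all use the input form <(); there is also an output form >(command), where whatever you write to the descriptor becomes the command’s stdin. A small sketch:

```shell
#!/usr/bin/env bash
# tee keeps the stream on stdout while a second copy is compressed by a
# background gzip via the output form of process substitution.
log=$(mktemp).gz
echo "important log line" | tee >(gzip -c > "$log")
sleep 0.5         # crude: give the background gzip a moment to flush
                  # (bash 4.4+ scripts can 'wait "$!"' instead)
gunzip -c "$log"  # prints the same line again
rm -f "$log"
```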

Q7: How do you handle signals and traps in bash?

Traps let you intercept signals and run cleanup code:

trap 'echo "Ctrl+C pressed"' INT

trap 'cleanup_function' EXIT

trap 'echo "Error on line $LINENO"' ERR

trap - INT

trap -l

cleanup() {

rm -f /tmp/tempfile.$$

echo "Cleaned up"

}

trap cleanup EXIT INT TERM

Pro tip: Always use single quotes in trap commands to prevent premature expansion. The ERR trap is especially useful with set -e for debugging.
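The cleanup pattern above, fleshed out into a minimal runnable sketch with a real temporary file:

```shell
#!/usr/bin/env bash
# The EXIT trap fires on normal exit, Ctrl+C, or kill, so the temp file
# is removed no matter how the script ends.
tmpfile=$(mktemp)

cleanup() {
    rm -f "$tmpfile"
    echo "cleaned up"
}
trap cleanup EXIT INT TERM

echo "working data" > "$tmpfile"
[ -f "$tmpfile" ] && echo "temp file in use"
# falling off the end triggers the EXIT trap, which removes the file
```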

Q8: Explain bash’s set options and their impact on script behavior

The set command changes bash’s behavior fundamentally:

set -e (errexit): Exit on any command failure

set -u (nounset): Exit on undefined variable

set -o pipefail: Pipe fails if any command fails

set -x (xtrace): Print commands before execution

Best practice combo:

set -euo pipefail  # Strict mode

set -E  # ERR trap inherits to functions

set -T  # DEBUG/RETURN traps inherit to functions

You can also use set +e to temporarily disable, and $- contains current options. The gotcha: set -e doesn’t work in certain contexts like if conditions.
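The if-condition gotcha is easy to demonstrate; a short sketch:

```shell
#!/usr/bin/env bash
set -euo pipefail

fails() { return 1; }

# set -e is suspended while a command is the tested condition of an if,
# while, ||, or &&, so this failure does NOT terminate the script:
if fails; then
    echo "never printed"
else
    echo "handled the failure, still running"
fi

# A bare failing command would exit here; '|| true' opts out explicitly.
fails || true
echo "reached the end"
```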

Q9: What are coprocesses in bash and how do you use them?

Coprocesses (bash 4+) let you run background processes with two-way communication:

coproc BC { bc; }

echo "2+2" >&${BC[1]}

read result <&${BC[0]}

echo $result  # 4

coproc GREP { grep --line-buffered "error"; }

echo "this is fine" >&${GREP[1]}

echo "error occurred" >&${GREP[1]}

read -t 1 line <&${GREP[0]} && echo "Got: $line"

Coprocesses are useful for maintaining persistent connections to programs like bc, awk, or database clients without the overhead of starting new processes.

Q10: How does bash handle arithmetic and what are the different ways to perform calculations?

Bash offers multiple arithmetic methods, each with different capabilities:

result=$((5 + 3))

((count++))  # C-style increment

let "result = 5 + 3"

result=$(expr 5 + 3)

result=$(echo "scale=2; 5/3" | bc)

if ((5 > 3)); then echo “yes”; fi

for ((i=0; i<10; i++)); do

echo $i

done

Bash only supports integer arithmetic natively. For floating-point, use bc or awk. Note that ((…)) exits with status 0 when the expression evaluates to nonzero (true) and status 1 when it is zero, so the exit status is the inverse of the arithmetic value — ((0)) actually fails.
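A quick side-by-side of the floating-point options mentioned above:

```shell
#!/usr/bin/env bash
echo $((5 / 3))                         # 1 -- bash truncates integer division

# bc truncates to the requested scale:
echo "scale=2; 5/3" | bc                # 1.66

# awk does real floating point; printf rounds:
awk 'BEGIN { printf "%.2f\n", 5 / 3 }'  # 1.67
```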

Q11: What’s the difference between source, ., and executing a script?

These three methods have crucial differences:

./script.sh - Runs in a separate child process, can’t modify parent environment

source script.sh or . script.sh - Runs in current shell, can modify environment

exec script.sh - Replaces current shell process entirely

Example showing the difference:

export MY_VAR="hello"

./script.sh

echo $MY_VAR  # Empty

source script.sh

echo $MY_VAR  # hello

exec script.sh

echo "Never printed"

Use source for configuration files, ./ for independent scripts, and exec for wrapper scripts.
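The example above assumes an external script.sh that sets MY_VAR; a self-contained version that generates the child script itself:

```shell
#!/usr/bin/env bash
# Generate a tiny script that sets a variable, then run it both ways.
demo=$(mktemp)
echo 'CHILD_VAR="set by script"' > "$demo"

bash "$demo"                               # child process: no effect here
echo "after bash:   ${CHILD_VAR:-unset}"   # unset

source "$demo"                             # current shell: variable sticks
echo "after source: ${CHILD_VAR:-unset}"   # set by script

rm -f "$demo"
```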

Q12: How do you properly handle spaces and special characters in bash?

This is where most scripts break. Key strategies:

bad:  rm $file

good: rm "$file"

files=("file 1.txt" "file 2.txt")

for f in "${files[@]}"; do

cat "$f"

done

while IFS= read -r line; do

echo "$line"

done < file.txt

find . -name "*.txt" -print0 | while IFS= read -r -d '' file; do

echo "Processing: $file"

done

cmd=(ls -la "/path with spaces/")

"${cmd[@]}"  # Executes correctly

The golden rule: If it’s not a numeric value you’re absolutely sure about, quote it. Use "$@" not $* for arguments, and always use -r with read to prevent backslash interpretation.