What is Shell Scripting

Shell scripting is basically writing a series of commands in a text file that your computer's terminal can run automatically. Think of it like creating a recipe where, instead of cooking steps, you're telling your computer what tasks to do in order. It's super handy for automating repetitive stuff like backing up files, managing system tasks, or running multiple programs with just one command. Most people use the Bash shell on Linux/macOS, but there's also PowerShell on Windows - either way, it saves tons of time once you get the hang of it.

How to prepare for a Shell Scripting Interview

Get your basics rock solid - Make sure you’re comfortable with fundamental commands like ls, grep, sed, awk, and pipes. Most interview questions for shell scripting start with these building blocks, so practice combining them in different ways.

Practice writing actual scripts - Don’t just memorize syntax. Write scripts for real tasks like file backups, log parsing, or system monitoring. Interviewers love seeing practical problem-solving skills.

Master the common shell scripting interview questions - You’ll definitely get asked about variables, loops, conditionals, and functions. Practice explaining the difference between $@ and $*, when to use double vs single quotes, and how exit codes work.
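Since the $@ versus $* distinction comes up so often, it's worth having a tiny demo ready. A minimal sketch you can paste into any terminal:

```shell
#!/bin/bash
# "$@" expands to each argument as a separate word;
# "$*" joins all arguments into a single word.
count_args() {
    echo "$#"
}

set -- "one two" three   # two positional parameters

count_args "$@"   # prints 2 - arguments stay separate
count_args "$*"   # prints 1 - everything collapses into one string
```

The difference only matters when the expansion is double-quoted, which is exactly why interviewers ask about it.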

Debug like crazy - Learn to use set -x for debugging and understand error handling with trap commands. Being able to troubleshoot broken scripts on the spot is a huge plus.

Know your shell differences - Understand what makes bash different from sh, ksh, or zsh. Some companies are picky about POSIX compliance.
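For interviews that probe POSIX compliance, it helps to show the same task written portably instead of with bashisms. An illustrative sketch (bash-only features like [[ ]], arrays, and ${var^^} are avoided here):

```shell
#!/bin/sh
# Portable POSIX sh: no [[ ]], no arrays, no ${var^^}.
name="world"

# Uppercase with tr instead of bash 4's ${name^^}
printf '%s\n' "$name" | tr '[:lower:]' '[:upper:]'

# Pattern matching with case instead of bash's [[ $name == w* ]]
case "$name" in
    w*) echo "starts with w" ;;
esac
```

Running this under dash or busybox sh behaves the same as under bash, which is the point of POSIX-compatible style.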

Study real-world scenarios - Look up common automation tasks like monitoring disk space, rotating logs, or deploying applications. These make great discussion points during interviews.

Practice explaining your code - The best script in the world won’t help if you can’t walk through your logic clearly. Practice talking through your problem-solving approach out loud.

Shell Scripting Interview Questions and Answers For Beginners

Starting your journey with shell scripting interviews? Don’t worry - most bash scripting interview questions for beginners focus on practical basics rather than complex theory. We’ll walk through the most common bash shell interview questions that entry-level positions typically cover, helping you understand not just what to answer, but why things work the way they do. These questions come up constantly because they test whether you can actually write useful scripts, not just memorize commands.

Q1: What exactly is a shell script?

Think of it as a text file filled with commands you’d normally type one by one in the terminal. Instead of typing them manually every time, you save them in a file and run them all at once. It’s like creating a macro for your terminal - super helpful when you need to do the same tasks repeatedly.

Q2: How do I create my first shell script?

Open any text editor (nano, vim, or even notepad), type your commands, and save it with a .sh extension. Here’s the simplest example:

bash

#!/bin/bash
echo "My first script!"

Save it as myscript.sh, make it executable with chmod +x myscript.sh, then run with ./myscript.sh.

Q3: Why do some variables have $ and others don’t?

You use $ when you want to get the value from a variable, but not when you're setting it. Setting: name="John" (no $). Using: echo "Hello $name" (with $). It's like the difference between writing on a nametag versus reading what's already written there.

Q4: What are those -f, -d, -e things I see in if statements?

These are test operators that check file conditions. -f checks if something is a regular file, -d checks for directories, -e checks if anything exists at that path. There are tons more: -r for readable, -w for writable, -x for executable. They’re your script’s way of looking before it leaps.
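A quick way to see these operators in action is a throwaway script against a temporary directory:

```shell
#!/bin/bash
# Exercise the common file test operators on a temporary directory.
tmp=$(mktemp -d)
touch "$tmp/data.txt"

[ -e "$tmp/data.txt" ] && echo "exists"
[ -f "$tmp/data.txt" ] && echo "regular file"
[ -d "$tmp" ] && echo "directory"
[ -r "$tmp/data.txt" ] && echo "readable"
[ -x "$tmp/data.txt" ] || echo "not executable"

rm -rf "$tmp"
```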

Q5: How do I do basic math in bash?

Bash is quirky with math - you need special syntax. Use $((expression)) for arithmetic: result=$((5 + 3)). For decimals, bash can't handle them natively - you'd need bc: echo "5.5 + 2.3" | bc. Remember, spaces don't matter inside $(( )), which is unusual for bash.
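A few lines you can run to confirm the integer-only behavior:

```shell
#!/bin/bash
# Integer arithmetic with $(( )); decimals need bc.
result=$((5 + 3))
echo "$result"        # 8

echo "$((10 / 3))"    # 3 - integer division truncates

# bc handles decimals; "scale" controls digits after the point
echo "5.5 + 2.3" | bc
echo "scale=2; 10 / 3" | bc
```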

Q6: What’s this 2>&1 thing I keep seeing?

This redirects error messages to wherever regular output is going. The 2 represents stderr (error messages), 1 represents stdout (normal output), and & tells bash “I mean the file descriptor, not a file named 1”. So command > output.txt 2>&1 sends both regular output and errors to output.txt.
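Here's a self-contained way to verify that both streams land in the same file:

```shell
#!/bin/bash
# Send stdout and stderr to one file. Order of redirections matters:
# "> file 2>&1" works; "2>&1 > file" leaves stderr on the terminal.
log=$(mktemp)

{ echo "normal output"; echo "error output" >&2; } > "$log" 2>&1

wc -l < "$log"   # 2 - both lines were captured
rm -f "$log"
```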

Q7: How do I check if a command succeeded?

Every command returns an exit code - 0 for success, anything else for failure. Check it right after the command:

bash

cp file1 file2
if [ $? -eq 0 ]; then
    echo "Copy worked!"
else
    echo "Copy failed!"
fi

Q8: What’s the deal with spaces in bash?

Bash is super picky about spaces, especially in conditions. if [ $a = $b ] needs spaces around the brackets and operators. But a=5 can’t have spaces around the =. It’s annoying at first, but you’ll develop muscle memory for it pretty quickly.

Q9: How do I loop through a list of items?

The for loop is your friend here. For a simple list:

bash

for fruit in apple banana orange; do
    echo "I like $fruit"
done

For files: for file in *.txt; do something; done. The loop automatically splits on spaces unless you mess with IFS.

Q10: Can I use variables from outside my script?

Yes! These are environment variables. Your script can read system variables like $HOME, $USER, $PATH. To pass your own, either export them first (export MYVAR="value") or set them inline: MYVAR="value" ./myscript.sh. Just remember the flow is one-way - changes the script makes to those variables won't affect the parent shell.
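A two-line experiment makes the one-way flow obvious:

```shell
#!/bin/bash
# Exported variables flow down to child processes,
# but a child's changes never flow back up.
export GREETING="hello"

bash -c 'echo "child sees: $GREETING"; GREETING="changed"'

echo "parent still has: $GREETING"   # still "hello"
```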

Q11: What happens if my script crashes midway?

By default, bash keeps going even if commands fail. To make it stop on errors, add set -e near the top. Want to see what’s happening? Add set -x for debug mode. Combine them as set -ex. Just remember these make your script stricter - sometimes you want to handle errors yourself instead.

Q12: How do I get input from users?

The read command is what you need. Basic usage: read username waits for input and stores it. Fancy it up with read -p "What's your name? " username for a prompt. Want a password? Use read -s to hide what they type. Set a timeout with read -t 10 to wait only 10 seconds.

Shell Scripting Coding Interview Questions and Answers For Beginners

Ready to tackle actual shell scripting coding interview questions? These hands-on problems test whether you can write working scripts, not just talk about concepts. We’ll cover the classic coding challenges that beginners face in interviews - the ones where you need to write real bash code to solve practical problems. Each solution includes the full script with explanations, so you can understand the logic and adapt it for similar questions.

Q1: Write a script to check if a number is even or odd

This tests your understanding of arithmetic operations and conditionals:

bash

#!/bin/bash
echo "Enter a number: "
read num

if ! [[ "$num" =~ ^[0-9]+$ ]]; then
    echo "Please enter a valid number"
    exit 1
fi

if [ $((num % 2)) -eq 0 ]; then
    echo "$num is even"
else
    echo "$num is odd"
fi

The key here is using the modulo operator (%) inside $(( )). If a number divided by 2 has no remainder, it’s even. The regex check prevents the script from breaking on non-numeric input.

Q2: Create a script that counts files in a directory by extension

This problem tests file handling and associative arrays:

bash

#!/bin/bash
if [ $# -eq 0 ]; then
    echo "Usage: $0 <directory>"
    exit 1
fi

directory="$1"

if [ ! -d "$directory" ]; then
    echo "Directory $directory doesn't exist"
    exit 1
fi

echo "File count by extension in $directory:"

for file in "$directory"/*; do
    [ -e "$file" ] || continue
    [ -f "$file" ] || continue
    extension="${file##*.}"
    if [ "$extension" = "$file" ]; then
        extension="no_extension"
    fi
    # Counting with an associative array would be ideal, but
    # for beginners a simple pipeline does the job:
    echo "$extension"
done | sort | uniq -c | sort -rn

This solution uses parameter expansion (${file##*.}) to extract extensions and pipes the output through sort and uniq to count occurrences. It’s a common pattern in shell scripting - using Unix tools together.
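The ${file##*.} idiom is one of a family of prefix/suffix strippers worth knowing cold:

```shell
#!/bin/bash
# Prefix/suffix stripping with parameter expansion.
file="report.backup.tar.gz"

echo "${file##*.}"   # gz                - longest *. match stripped from front
echo "${file#*.}"    # backup.tar.gz     - shortest match stripped from front
echo "${file%%.*}"   # report            - longest .* match stripped from back
echo "${file%.*}"    # report.backup.tar - shortest match stripped from back
```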

Q3: Write a script to backup files modified in the last 7 days

This tests your knowledge of the find command and date handling:

bash

#!/bin/bash
source_dir="${1:-.}"  # Use current directory if not specified
backup_dir="backup_$(date +%Y%m%d)"

mkdir -p "$backup_dir"

echo "Starting backup of files modified in last 7 days..."

# Find and copy files modified in the last 7 days.
# Process substitution (instead of a pipe) keeps the loop in the
# current shell, so $count is still set after the loop ends.
count=0
while read -r file; do
    [[ "$file" == *"$backup_dir"* ]] && continue   # skip our own backup dir
    rel_path="${file#$source_dir/}"
    backup_path="$backup_dir/$rel_path"
    backup_subdir="$(dirname "$backup_path")"
    mkdir -p "$backup_subdir"
    if cp -p "$file" "$backup_path" 2>/dev/null; then
        ((count++))
        echo "Backed up: $rel_path"
    fi
done < <(find "$source_dir" -type f -mtime -7)

echo "Backup complete! $count files copied to $backup_dir"

The script uses find with -mtime -7 to locate recent files. The while read loop processes filenames with spaces correctly. We preserve the directory structure in the backup and use cp -p to keep timestamps.

Q4: Create a script that monitors disk usage and alerts when it exceeds 80%

This demonstrates system monitoring and parsing command output:

bash

#!/bin/bash
threshold=80

echo "Checking disk usage..."
echo "========================"

df -h | grep -E '^/dev/' | while read -r line; do
    device=$(echo "$line" | awk '{print $1}')
    usage=$(echo "$line" | awk '{print $5}' | sed 's/%//')
    mount=$(echo "$line" | awk '{print $6}')
    if [ "$usage" -gt "$threshold" ]; then
        echo "WARNING: $mount ($device) is at $usage% capacity!"
    else
        echo "OK: $mount ($device) is at $usage% capacity"
    fi
done

# Also check if any partition is critically full (>90%)
critical=$(df -h | grep -E '^/dev/' | awk '{print $5}' | sed 's/%//' | sort -n | tail -1)
if [ "$critical" -gt 90 ]; then
    echo ""
    echo "CRITICAL: At least one partition is over 90% full!"
    exit 1
fi

This script parses df output using awk to extract specific fields. The percentage sign is stripped with sed so we can do numeric comparison. Real-world scripts might send emails or write to log files.

Q5: Write a script that validates and processes a CSV file

This tests file parsing and error handling:

bash

#!/bin/bash
if [ $# -eq 0 ]; then
    echo "Usage: $0 <csv_file>"
    exit 1
fi

csv_file="$1"

if [ ! -r "$csv_file" ]; then
    echo "Error: Cannot read file $csv_file"
    exit 1
fi

echo "Processing CSV file: $csv_file"
echo "================================"

header=$(head -1 "$csv_file")
num_fields=$(echo "$header" | awk -F',' '{print NF}')
echo "Header fields ($num_fields): $header"
echo ""

line_num=1
error_count=0
valid_count=0

# Process substitution keeps the counters in the current shell
while IFS= read -r line; do
    ((line_num++))

    # Check if the line has the expected number of fields
    current_fields=$(echo "$line" | awk -F',' '{print NF}')
    if [ "$current_fields" -ne "$num_fields" ]; then
        echo "Error on line $line_num: Expected $num_fields fields, got $current_fields"
        ((error_count++))
        continue
    fi

    # Split the line into named fields
    IFS=',' read -r field1 field2 field3 remainder <<< "$line"

    # Basic validation - check that required fields are not empty
    if [ -z "$field1" ] || [ -z "$field2" ] || [ -z "$field3" ]; then
        echo "Error on line $line_num: Empty required fields"
        ((error_count++))
        continue
    fi

    # Process the valid line (example: just echo it)
    echo "Processing: $field1 | $field2 | $field3"
    ((valid_count++))
done < <(tail -n +2 "$csv_file")

echo ""
echo "Summary: Processed $valid_count valid lines, found $error_count errors"

This script shows proper CSV handling using IFS (Internal Field Separator) and read. It validates the number of fields per line and checks for empty values. The tail -n +2 skips the header for processing.

Shell Scripting Interview Questions and Answers For Intermediate (2-4 years Exp)

Moving beyond the basics, intermediate shell scripting coding interview questions focus on real-world problem solving, performance optimization, and handling complex scenarios you’ve likely encountered in production. We’ll explore questions that test your experience with error handling, process management, and writing maintainable scripts that other developers can understand and modify. These questions reflect the challenges you face when scripts move from personal tools to team resources that need to be reliable and efficient.

Q1: How do you handle errors gracefully in production scripts?

At this level, you need multiple strategies working together. I use trap to catch errors and cleanup, set -euo pipefail to stop on failures, and custom error functions:

#!/bin/bash
set -euo pipefail

error_exit() {
    echo "Error on line $1: $2" >&2
    cleanup
    exit 1
}

cleanup() {
    rm -f /tmp/mylockfile
    echo "Cleanup completed"
}

trap 'error_exit $LINENO "Command failed"' ERR
trap cleanup EXIT

The key is combining these techniques - set -e stops on errors, -u catches undefined variables, -o pipefail catches errors in pipes, and trap ensures cleanup happens no matter what.

Q2: Explain process substitution and when you’d use it

Process substitution <(command) creates a temporary file descriptor that acts like a file. It’s brilliant for comparing outputs or feeding multiple processes:

diff <(ls dir1) <(ls dir2)

# Feed sorted data to a command expecting a file
join <(sort file1) <(sort file2)

paste <(cut -d: -f1 /etc/passwd) <(cut -d: -f3 /etc/passwd)

I use it when I need to avoid temporary files or when working with commands that only accept file arguments but I want to give them dynamic data.

Q3: How do you implement proper logging in shell scripts?

Production scripts need structured logging with timestamps, levels, and rotation. Here’s my go-to pattern:

LOG_FILE="/var/log/myscript.log"
LOG_LEVEL=${LOG_LEVEL:-"INFO"}  # Can override via environment

log() {
    local level=$1
    shift
    local message="$*"
    local timestamp=$(date '+%Y-%m-%d %H:%M:%S')

    case $LOG_LEVEL in
        ERROR) [[ $level =~ ^(ERROR)$ ]] || return ;;
        WARN)  [[ $level =~ ^(ERROR|WARN)$ ]] || return ;;
        INFO)  [[ $level =~ ^(ERROR|WARN|INFO)$ ]] || return ;;
        DEBUG) ;; # Log everything
    esac

    echo "[$timestamp] [$level] $message" | tee -a "$LOG_FILE"
}

log INFO "Script started"
log ERROR "Connection failed"
log DEBUG "Variable X = $x"

Don’t forget log rotation - either use logrotate or implement size checking in your script.
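When logrotate isn't available, a size check in the script itself can stand in. A minimal sketch (MAX_BYTES and the log path are placeholders, and only one old generation is kept):

```shell
#!/bin/bash
# Rotate the log once it grows past MAX_BYTES.
LOG_FILE="./app.log"
MAX_BYTES=1024

rotate_if_needed() {
    local size
    # stat -c%s is the GNU spelling; stat -f%z is the BSD/macOS one
    size=$(stat -c%s "$LOG_FILE" 2>/dev/null || stat -f%z "$LOG_FILE" 2>/dev/null || echo 0)
    if [ "$size" -ge "$MAX_BYTES" ]; then
        mv "$LOG_FILE" "$LOG_FILE.1"
        : > "$LOG_FILE"   # start a fresh, empty log
    fi
}

head -c 2048 /dev/zero > "$LOG_FILE"   # simulate an oversized log
rotate_if_needed
[ -f "$LOG_FILE.1" ] && echo "rotated"
rm -f "$LOG_FILE" "$LOG_FILE.1"
```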

Q4: How do you handle concurrent script execution?

Preventing race conditions is crucial. I typically use flock for reliable locking:

LOCK_FILE="/var/run/myscript.lock"
exec 200>"$LOCK_FILE"

# Non-blocking: give up immediately if another instance holds the lock
if ! flock -n 200; then
    echo "Another instance is running. Exiting."
    exit 1
fi

# Alternative: wait up to 10 seconds for the lock instead
# if ! flock -w 10 200; then
#     echo "Could not acquire lock after 10 seconds"
#     exit 1
# fi

For more complex scenarios, I might use mkdir as an atomic operation or implement a PID file system that checks if the process is actually still running.
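The mkdir trick works because directory creation is atomic: exactly one caller can succeed. A sketch (the lock path is a placeholder):

```shell
#!/bin/bash
# mkdir either creates the directory or fails - never half-creates it -
# so it doubles as a portable lock where flock is unavailable.
LOCK_DIR="/tmp/demo_lock_$$"

if mkdir "$LOCK_DIR" 2>/dev/null; then
    echo "lock acquired"
    # A competing instance would hit this branch:
    mkdir "$LOCK_DIR" 2>/dev/null || echo "already locked"
    rmdir "$LOCK_DIR"   # release the lock
else
    echo "already locked"
fi
```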

Q5: What’s your approach to parsing complex command line arguments?

For production scripts, getopts handles short options cleanly, but since it can't parse long options I usually fall back to a manual while/case loop that covers both:

usage() {
    cat << EOF
Usage: $0 [-h] [-v] [-f FILE] [--debug] [--config CONFIG]

Options:
  -h, --help      Show this help
  -v, --verbose   Enable verbose mode
  -f FILE         Input file
  --debug         Enable debug mode
  --config FILE   Configuration file
EOF
}

VERBOSE=0
DEBUG=0

while [[ $# -gt 0 ]]; do
    case $1 in
        -h|--help)
            usage
            exit 0
            ;;
        -v|--verbose)
            VERBOSE=1
            shift
            ;;
        -f)
            INPUT_FILE="$2"
            shift 2
            ;;
        --debug)
            DEBUG=1
            set -x  # Enable bash debugging
            shift
            ;;
        --config)
            CONFIG_FILE="$2"
            shift 2
            ;;
        --)
            shift
            break
            ;;
        -*)
            echo "Unknown option: $1" >&2
            usage
            exit 1
            ;;
        *)
            break
            ;;
    esac
done

Q6: How do you optimize scripts that process large files?

Never load entire files into memory. Instead, stream process them:

# Bad: command substitution reads the whole file and word-splits it
for line in $(cat hugefile.txt); do
    process "$line"
done

# Good: stream the file one line at a time
while IFS= read -r line; do
    process "$line"
done < hugefile.txt

# Better: parallel processing for CPU-bound tasks
cat hugefile.txt | parallel -j 4 process {}

# For simple transformations, use tools designed for it
awk '{sum += $3} END {print sum}' hugefile.txt  # Sum third column
sed -i 's/old/new/g' hugefile.txt  # In-place replacement

Also consider splitting large files and processing chunks in parallel, or using tools like split, sort -S for memory limits, and join instead of loading everything into associative arrays.

Q7: Explain your strategy for making scripts portable across different systems

Portability is tricky. I start with POSIX compliance when possible, but pragmatically handle differences:

#!/usr/bin/env bash  # More portable than hard-coding /bin/bash

if [[ "$OSTYPE" == "linux-gnu"* ]]; then
    OS="linux"
elif [[ "$OSTYPE" == "darwin"* ]]; then
    OS="macos"
elif [[ "$OSTYPE" == "freebsd"* ]]; then
    OS="freebsd"
fi

if command -v gsed &> /dev/null; then
    SED=gsed  # GNU sed installed on macOS
else
    SED=sed
fi

for cmd in awk grep "$SED"; do
    if ! command -v "$cmd" &> /dev/null; then
        echo "Required command '$cmd' not found" >&2
        exit 1
    fi
done

find . -type f -name "*.txt" -print0 | xargs -0 grep "pattern"  # Works everywhere

Q8: How do you implement timeout functionality for long-running commands?

The timeout command is great when available, but here’s a portable solution:

if command -v timeout &> /dev/null; then
    timeout 30s long_running_command
else
    long_running_command &
    pid=$!
    count=0
    while kill -0 $pid 2>/dev/null && [ $count -lt 30 ]; do
        sleep 1
        ((count++))
    done
    if kill -0 $pid 2>/dev/null; then
        echo "Command timed out after 30 seconds"
        kill -TERM $pid
        sleep 2
        kill -0 $pid 2>/dev/null && kill -KILL $pid
    fi
fi

# Alternative: using read with a timeout for user input
if read -t 10 -p "Enter value (10 sec timeout): " value; then
    echo "You entered: $value"
else
    echo "Timeout or cancelled"
fi

Q9: What’s your approach to debugging complex shell scripts?

Beyond set -x, I use structured debugging:

DEBUG_LEVEL=${DEBUG_LEVEL:-0}

debug() {
    local level=$1
    shift
    if [ "$DEBUG_LEVEL" -ge "$level" ]; then
        echo "[DEBUG$level] $*" >&2
    fi
}

trace() {
    local func=$1
    shift
    debug 1 "Entering $func with args: $*"
    local start=$(date +%s.%N)
    "$func" "$@"
    local ret=$?
    local end=$(date +%s.%N)
    local duration=$(echo "$end - $start" | bc)
    debug 1 "Exiting $func (duration: ${duration}s, return: $ret)"
    return $ret
}

export PS4='+(${BASH_SOURCE}:${LINENO}): ${FUNCNAME[0]:+${FUNCNAME[0]}(): }'
[ "$DEBUG" = "1" ] && set -x

I also use ShellCheck religiously, and for really tough bugs, I’ll use bashdb or add extensive logging.

Q10: How do you manage configuration in shell scripts?

I layer configurations from multiple sources with clear precedence:

declare -A CONFIG=(
    [db_host]="localhost"
    [db_port]="5432"
    [log_level]="INFO"
)

# Layer in overrides: system-wide, then per-user, then per-project
[ -f /etc/myapp/config ] && source /etc/myapp/config
[ -f ~/.myapp/config ] && source ~/.myapp/config
[ -f ./config ] && source ./config

# Environment variables take highest precedence
[ -n "$MYAPP_DB_HOST" ] && CONFIG[db_host]=$MYAPP_DB_HOST
[ -n "$MYAPP_DB_PORT" ] && CONFIG[db_port]=$MYAPP_DB_PORT

# Validate required settings
for key in db_host db_port; do
    if [ -z "${CONFIG[$key]}" ]; then
        echo "Error: Missing required config: $key" >&2
        exit 1
    fi
done

export MYAPP_CONFIG_DB_HOST="${CONFIG[db_host]}"

This gives flexibility while maintaining predictable behavior - defaults, system-wide settings, user preferences, project overrides, and finally environment variables.

Shell Scripting Coding Interview Questions and Answers For Intermediate (2-4 years Exp)

At the intermediate level, shell scripting coding interview questions shift from basic syntax to solving real production challenges - the kind where your script needs to handle thousands of files, recover from failures, and play nice with other systems. We’ll tackle the practical problems that someone with a few years under their belt should handle confidently, focusing on efficiency, reliability, and maintainability. These are the scenarios where your experience shows through in how you structure your solution, not just whether it works.

Q1: Write a script that monitors a log file in real-time and sends alerts for specific error patterns

This tests your ability to handle continuous file monitoring and pattern matching:

#!/bin/bash
LOG_FILE="${1:-/var/log/application.log}"
ALERT_EMAIL="admin@company.com"
ERROR_PATTERNS=("ERROR" "CRITICAL" "FATAL" "OutOfMemory")
ALERT_THRESHOLD=5  # Alert after 5 errors in 60 seconds
WINDOW_SIZE=60

if [ ! -f "$LOG_FILE" ]; then
    echo "Error: Log file $LOG_FILE not found"
    exit 1
fi

declare -A error_counts
declare -A error_timestamps

send_alert() {
    local pattern=$1
    local count=$2
    local message="Alert: $count occurrences of '$pattern' in last $WINDOW_SIZE seconds"

    # In a real scenario, you'd send email or post to Slack:
    # echo "$message" | mail -s "Log Alert: $pattern" "$ALERT_EMAIL"
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $message"

    error_counts[$pattern]=0
    error_timestamps[$pattern]=""
}

clean_old_timestamps() {
    local pattern=$1
    local current_time=$(date +%s)
    local new_timestamps=""
    for ts in ${error_timestamps[$pattern]}; do
        if [ $((current_time - ts)) -lt $WINDOW_SIZE ]; then
            new_timestamps="$new_timestamps $ts"
        fi
    done
    error_timestamps[$pattern]="$new_timestamps"
    error_counts[$pattern]=$(echo "$new_timestamps" | wc -w)
}

echo "Monitoring $LOG_FILE for error patterns..."
echo "Patterns: ${ERROR_PATTERNS[*]}"
echo "Alert threshold: $ALERT_THRESHOLD errors in $WINDOW_SIZE seconds"
echo "Press Ctrl+C to stop"

tail -F "$LOG_FILE" | while read -r line; do
    for pattern in "${ERROR_PATTERNS[@]}"; do
        if [[ "$line" =~ $pattern ]]; then
            current_time=$(date +%s)
            error_timestamps[$pattern]="${error_timestamps[$pattern]} $current_time"
            clean_old_timestamps "$pattern"
            if [ "${error_counts[$pattern]}" -ge "$ALERT_THRESHOLD" ]; then
                send_alert "$pattern" "${error_counts[$pattern]}"
            fi
            echo "[$(date '+%H:%M:%S')] Found: $pattern (count: ${error_counts[$pattern]})"
        fi
    done
done

The script uses tail -F (capital F) to follow log rotation, tracks errors within a time window, and only alerts when thresholds are exceeded. This prevents alert spam while catching real issues.

Q2: Create a script that performs parallel backup of multiple directories with progress tracking

This demonstrates process management and parallel execution:

#!/bin/bash
SOURCE_DIRS=(
    "/home/user/documents"
    "/home/user/projects"
    "/var/www/html"
    "/etc"
)
BACKUP_ROOT="/backup/$(date +%Y%m%d_%H%M%S)"
MAX_PARALLEL=3
COMPRESSION="gzip"  # or "bzip2", "xz"

mkdir -p "$BACKUP_ROOT"

PROGRESS_FILE="/tmp/backup_progress_$$"
rm -f "$PROGRESS_FILE"
touch "$PROGRESS_FILE"

backup_directory() {
    local src_dir=$1
    local dir_name=$(basename "$src_dir")
    local backup_file="$BACKUP_ROOT/${dir_name}.tar.gz"
    local pid=$BASHPID  # $$ would give the parent's PID in a background job
    local start_time=$(date +%s)

    echo "$pid:$dir_name:0:STARTING" >> "$PROGRESS_FILE"

    local total_size=$(du -sb "$src_dir" 2>/dev/null | awk '{print $1}')

    # pv reports percentage progress on stderr (requires the pv package)
    tar cf - "$src_dir" 2>/dev/null | \
        pv -s "$total_size" -n 2>"/tmp/pv_progress_$pid" | \
        gzip > "$backup_file" &
    local tar_pid=$!

    while kill -0 $tar_pid 2>/dev/null; do
        if [ -f "/tmp/pv_progress_$pid" ]; then
            progress=$(tail -1 "/tmp/pv_progress_$pid" 2>/dev/null || echo "0")
            echo "$pid:$dir_name:$progress:RUNNING" >> "$PROGRESS_FILE"
        fi
        sleep 1
    done

    wait $tar_pid
    local exit_code=$?
    local end_time=$(date +%s)
    local duration=$((end_time - start_time))

    if [ $exit_code -eq 0 ]; then
        local final_size=$(stat -f%z "$backup_file" 2>/dev/null || stat -c%s "$backup_file")
        echo "$pid:$dir_name:100:COMPLETED:$duration:$final_size" >> "$PROGRESS_FILE"
    else
        echo "$pid:$dir_name:0:FAILED:$duration:0" >> "$PROGRESS_FILE"
    fi

    rm -f "/tmp/pv_progress_$pid"
}

show_progress() {
    while true; do
        clear
        echo "Backup Progress - $(date '+%Y-%m-%d %H:%M:%S')"
        echo "=================================================="
        while IFS=':' read -r pid dir progress status duration size; do
            case $status in
                STARTING)
                    printf "%-30s [%s]\n" "$dir" "Initializing..."
                    ;;
                RUNNING)
                    bar_length=30
                    filled=$((progress * bar_length / 100))
                    bar=$(printf '%*s' "$filled" | tr ' ' '=')
                    empty=$((bar_length - filled))
                    bar="${bar}$(printf '%*s' "$empty" | tr ' ' '-')"
                    printf "%-30s [%s] %3d%%\n" "$dir" "$bar" "$progress"
                    ;;
                COMPLETED)
                    size_mb=$((size / 1024 / 1024))
                    printf "%-30s [DONE] %dMB in %ds\n" "$dir" "$size_mb" "$duration"
                    ;;
                FAILED)
                    printf "%-30s [FAILED]\n" "$dir"
                    ;;
            esac
        done < "$PROGRESS_FILE"

        if ! grep -qE "RUNNING|STARTING" "$PROGRESS_FILE" 2>/dev/null; then
            echo ""
            echo "All backups completed!"
            break
        fi
        sleep 1
    done
}

show_progress &
progress_pid=$!

job_count=0
for dir in "${SOURCE_DIRS[@]}"; do
    while [ $(jobs -r | wc -l) -ge $MAX_PARALLEL ]; do
        sleep 0.5
    done
    backup_directory "$dir" &
    ((job_count++))
done

wait
kill $progress_pid 2>/dev/null
show_progress  # Show final status
rm -f "$PROGRESS_FILE"

echo ""
echo "Backup Summary"
echo "=============="
echo "Location: $BACKUP_ROOT"
echo "Total archives: $(ls -1 "$BACKUP_ROOT"/*.tar.gz 2>/dev/null | wc -l)"
echo "Total size: $(du -sh "$BACKUP_ROOT" | awk '{print $1}')"

ls -lh "$BACKUP_ROOT"/*.tar.gz 2>/dev/null

This script showcases parallel job management, real-time progress tracking with pv, and proper cleanup. It handles multiple directories simultaneously while keeping the user informed.

Q3: Write a script that synchronizes configuration files across multiple servers

This tests your understanding of remote execution and error handling:

#!/bin/bash
CONFIG_DIR="/etc/myapp"
SERVERS=("web01.example.com" "web02.example.com" "web03.example.com")
SSH_USER="deploy"
SSH_KEY="$HOME/.ssh/deploy_key"
BACKUP_BEFORE_SYNC=true
DRY_RUN=false

while getopts "dnh" opt; do
    case $opt in
        d) DRY_RUN=true ;;
        n) BACKUP_BEFORE_SYNC=false ;;
        h) echo "Usage: $0 [-d] [-n] [-h]"
           echo "  -d  Dry run (show what would be done)"
           echo "  -n  No backup before sync"
           exit 0 ;;
    esac
done

RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

log() {
    local level=$1
    shift
    local msg="$*"
    local timestamp=$(date '+%Y-%m-%d %H:%M:%S')
    case $level in
        ERROR) echo -e "${RED}[$timestamp] ERROR: $msg${NC}" >&2 ;;
        SUCCESS) echo -e "${GREEN}[$timestamp] SUCCESS: $msg${NC}" ;;
        INFO) echo -e "[$timestamp] INFO: $msg" ;;
        WARN) echo -e "${YELLOW}[$timestamp] WARN: $msg${NC}" ;;
    esac
    echo "[$timestamp] $level: $msg" >> "/var/log/config_sync.log"
}

check_server() {
    local server=$1
    ssh -q -o ConnectTimeout=5 -o BatchMode=yes \
        -i "$SSH_KEY" "$SSH_USER@$server" exit 2>/dev/null
}

# Backup configs on the remote server before overwriting them
backup_remote_configs() {
    local server=$1
    local backup_name="config_backup_$(date +%Y%m%d_%H%M%S).tar.gz"
    log INFO "Creating backup on $server"
    ssh -i "$SSH_KEY" "$SSH_USER@$server" "
        if [ -d '$CONFIG_DIR' ]; then
            sudo tar czf /tmp/$backup_name $CONFIG_DIR 2>/dev/null
            sudo mv /tmp/$backup_name /var/backups/
            echo 'Backup created: /var/backups/$backup_name'
        else
            echo 'Config directory not found, skipping backup'
        fi
    "
}

sync_file() {
    local server=$1
    local file=$2
    local rel_path=${file#$CONFIG_DIR/}
    local local_sum=$(md5sum "$file" | awk '{print $1}')
    # Escape \$1 so awk runs remotely instead of expanding locally
    local remote_sum=$(ssh -i "$SSH_KEY" "$SSH_USER@$server" \
        "sudo md5sum '$file' 2>/dev/null | awk '{print \$1}'")

    if [ "$local_sum" = "$remote_sum" ]; then
        echo "  ✓ $rel_path (unchanged)"
        return 0
    fi

    if [ "$DRY_RUN" = true ]; then
        echo "  → Would sync: $rel_path"
        return 0
    fi

    if scp -i "$SSH_KEY" "$file" "$SSH_USER@$server:/tmp/$(basename "$file")" >/dev/null 2>&1; then
        if ssh -i "$SSH_KEY" "$SSH_USER@$server" "
            sudo mkdir -p $(dirname "$file")
            sudo mv /tmp/$(basename "$file") $file
            sudo chown root:root $file
            sudo chmod 644 $file
        "; then
            echo "  ✓ $rel_path (updated)"
            return 0
        fi
    fi

    echo "  ✗ $rel_path (failed)"
    return 1
}

log INFO "Starting configuration sync"
[ "$DRY_RUN" = true ] && log WARN "Running in DRY RUN mode"

log INFO "Checking server connectivity..."
available_servers=()
for server in "${SERVERS[@]}"; do
    if check_server "$server"; then
        available_servers+=("$server")
        echo "  ✓ $server"
    else
        log ERROR "Cannot connect to $server"
        echo "  ✗ $server"
    fi
done

if [ ${#available_servers[@]} -eq 0 ]; then
    log ERROR "No servers available for sync"
    exit 1
fi

config_files=$(find "$CONFIG_DIR" -type f \( -name "*.conf" -o -name "*.yml" -o -name "*.json" \))
file_count=$(echo "$config_files" | wc -l)
log INFO "Found $file_count configuration files to sync"

for server in "${available_servers[@]}"; do
    log INFO "Syncing to $server"

    if [ "$BACKUP_BEFORE_SYNC" = true ] && [ "$DRY_RUN" = false ]; then
        backup_remote_configs "$server"
    fi

    success_count=0
    fail_count=0

    while IFS= read -r file; do
        if sync_file "$server" "$file"; then
            ((success_count++))
        else
            ((fail_count++))
        fi
    done <<< "$config_files"

    if [ $fail_count -eq 0 ] && [ "$DRY_RUN" = false ]; then
        log INFO "Reloading services on $server"
        ssh -i "$SSH_KEY" "$SSH_USER@$server" "
            sudo systemctl reload nginx 2>/dev/null || true
            sudo systemctl reload myapp 2>/dev/null || true
        "
    fi

    if [ $fail_count -eq 0 ]; then
        log SUCCESS "$server: $success_count files synced successfully"
    else
        log ERROR "$server: $fail_count files failed (${success_count} succeeded)"
    fi
done

log INFO "Configuration sync completed"

This script handles multiple servers, does checksums to avoid unnecessary transfers, creates backups, and includes proper error handling with dry-run support.

Q4: Create a script that analyzes system performance and generates an HTML report

This shows data collection, processing, and formatted output:

#!/bin/bash

REPORT_FILE="system_report_$(hostname)_$(date +%Y%m%d_%H%M%S).html"
TEMP_DIR="/tmp/sysreport_$$"
mkdir -p "$TEMP_DIR"

CPU_THRESHOLD=80
MEM_THRESHOLD=85
DISK_THRESHOLD=90

get_color() {
    local value=$1
    local threshold=$2
    if [ "$(echo "$value >= $threshold" | bc)" -eq 1 ]; then
        echo "#ff4444"  # Red
    elif [ "$(echo "$value >= $threshold * 0.8" | bc)" -eq 1 ]; then
        echo "#ff9944"  # Orange
    else
        echo "#44ff44"  # Green
    fi
}

echo "Collecting system information..."

HOSTNAME=$(hostname)
UPTIME=$(uptime -p)
KERNEL=$(uname -r)
CPU_MODEL=$(grep "model name" /proc/cpuinfo | head -1 | cut -d: -f2)
CPU_CORES=$(nproc)
TOTAL_MEM=$(free -h | awk '/^Mem:/ {print $2}')

echo "Analyzing CPU usage..."

sar -u 1 5 > "$TEMP_DIR/cpu_stats.txt" 2>/dev/null || {
    top -b -n 2 -d 5 | grep "Cpu(s)" | tail -1 > "$TEMP_DIR/cpu_stats.txt"
}

CPU_USAGE=$(awk '/Average:/ {print 100 - $NF}' "$TEMP_DIR/cpu_stats.txt" || \
    awk '{print $2}' "$TEMP_DIR/cpu_stats.txt" | tr -d '%id,' || echo "0")

MEM_STATS=$(free -m | awk '/^Mem:/ {printf "%.1f", ($3/$2) * 100}')

echo "Analyzing disk usage..."

DISK_DATA=""

while read -r line; do
    device=$(echo "$line" | awk '{print $1}')
    mount=$(echo "$line" | awk '{print $6}')
    usage=$(echo "$line" | awk '{print $5}' | tr -d '%')
    size=$(echo "$line" | awk '{print $2}')
    used=$(echo "$line" | awk '{print $3}')
    avail=$(echo "$line" | awk '{print $4}')
    color=$(get_color "$usage" "$DISK_THRESHOLD")
    DISK_DATA="${DISK_DATA}
    <tr>
        <td>$device</td>
        <td>$mount</td>
        <td>$size</td>
        <td>$used</td>
        <td>$avail</td>
        <td style='color: $color; font-weight: bold;'>$usage%</td>
    </tr>"
done < <(df -h | grep '^/dev/' | grep -v '/dev/loop')

echo "Analyzing processes..."

TOP_CPU=$(ps aux --sort=-%cpu | head -6 | tail -5)
TOP_MEM=$(ps aux --sort=-%mem | head -6 | tail -5)
NETSTAT=$(ss -s 2>/dev/null | grep -E "TCP:|UDP:" | head -4)
RECENT_ERRORS=$(journalctl -p err -n 10 --no-pager 2>/dev/null || \
    dmesg | grep -iE "error|fail|warn" | tail -10)

cat > "$REPORT_FILE" << 'EOF'
<!DOCTYPE html>
<html>
<head>
<title>System Performance Report</title>
<meta charset="utf-8">
<style>
body {
    font-family: Arial, sans-serif;
    margin: 20px;
    background-color: #f5f5f5;
}
.container {
    max-width: 1200px;
    margin: 0 auto;
    background-color: white;
    padding: 20px;
    box-shadow: 0 0 10px rgba(0,0,0,0.1);
}
h1, h2 {
    color: #333;
    border-bottom: 2px solid #4CAF50;
    padding-bottom: 10px;
}
.info-grid {
    display: grid;
    grid-template-columns: repeat(auto-fit, minmax(250px, 1fr));
    gap: 20px;
    margin: 20px 0;
}
.info-box {
    background-color: #f9f9f9;
    padding: 15px;
    border-radius: 5px;
    border-left: 4px solid #4CAF50;
}
table {
    width: 100%;
    border-collapse: collapse;
    margin: 20px 0;
}
th, td {
    text-align: left;
    padding: 12px;
    border-bottom: 1px solid #ddd;
}
th {
    background-color: #4CAF50;
    color: white;
}
tr:hover {
    background-color: #f5f5f5;
}
.metric {
    font-size: 24px;
    font-weight: bold;
    margin: 10px 0;
}
.warning {
    background-color: #fff3cd;
    border-color: #ffeaa7;
    color: #856404;
    padding: 10px;
    border-radius: 5px;
    margin: 10px 0;
}
.timestamp {
    text-align: right;
    color: #666;
    font-style: italic;
}
pre {
    background-color: #f4f4f4;
    padding: 10px;
    border-radius: 5px;
    overflow-x: auto;
}
</style>
</head>
<body>
<div class="container">
EOF

cat >> "$REPORT_FILE" << EOF
<h1>System Performance Report - $HOSTNAME</h1>
<p class="timestamp">Generated on $(date '+%Y-%m-%d %H:%M:%S')</p>

<div class="info-grid">
    <div class="info-box">
        <strong>Hostname:</strong> $HOSTNAME<br>
        <strong>Uptime:</strong> $UPTIME<br>
        <strong>Kernel:</strong> $KERNEL
    </div>
    <div class="info-box">
        <strong>CPU Model:</strong> $CPU_MODEL<br>
        <strong>CPU Cores:</strong> $CPU_CORES<br>
        <strong>Total Memory:</strong> $TOTAL_MEM
    </div>
    <div class="info-box">
        <strong>CPU Usage:</strong>
        <div class="metric" style="color: $(get_color $CPU_USAGE $CPU_THRESHOLD)">
            ${CPU_USAGE}%
        </div>
    </div>
    <div class="info-box">
        <strong>Memory Usage:</strong>
        <div class="metric" style="color: $(get_color $MEM_STATS $MEM_THRESHOLD)">
            ${MEM_STATS}%
        </div>
    </div>
</div>
EOF

if [ $(echo "$CPU_USAGE >= $CPU_THRESHOLD" | bc) -eq 1 ]; then
    echo '<div class="warning">⚠️ High CPU usage detected!</div>' >> "$REPORT_FILE"
fi

if [ $(echo "$MEM_STATS >= $MEM_THRESHOLD" | bc) -eq 1 ]; then
    echo '<div class="warning">⚠️ High memory usage detected!</div>' >> "$REPORT_FILE"
fi

cat >> "$REPORT_FILE" << EOF
<h2>Disk Usage</h2>
<table>
<tr>
    <th>Device</th>
    <th>Mount Point</th>
    <th>Size</th>
    <th>Used</th>
    <th>Available</th>
    <th>Usage %</th>
</tr>
$DISK_DATA
</table>

<h2>Top CPU Consuming Processes</h2>
<pre>$(echo "$TOP_CPU" | awk '{printf "%-8s %-8s %6s  %s\n", $1, $2, $9, $11}')</pre>

<h2>Top Memory Consuming Processes</h2>
<pre>$(echo "$TOP_MEM" | awk '{printf "%-8s %-8s %6s  %s\n", $1, $2, $10, $11}')</pre>

<h2>Network Statistics</h2>
<pre>$NETSTAT</pre>

<h2>Recent System Errors/Warnings</h2>
<pre>$(echo "$RECENT_ERRORS" | head -20)</pre>

</div>
</body>
</html>
EOF

rm -rf "$TEMP_DIR"

echo "Report generated: $REPORT_FILE"

# Open the report in a browser if one is available
if command -v xdg-open >/dev/null 2>&1; then
    xdg-open "$REPORT_FILE"
elif command -v open >/dev/null 2>&1; then
    open "$REPORT_FILE"
fi

This creates a professional-looking HTML report with color-coded metrics, responsive design, and comprehensive system analysis.

Q5: Write a script that implements a job queue system with worker processes

This demonstrates advanced process control and inter-process communication:

#!/bin/bash

QUEUE_DIR="/var/spool/job_queue"
WORKERS=4
PID_FILE="/var/run/job_queue.pid"
LOG_FILE="/var/log/job_queue.log"
WORKER_TIMEOUT=300  # 5 minutes max per job

mkdir -p "$QUEUE_DIR"/{pending,processing,completed,failed}

log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*" | tee -a "$LOG_FILE"
}

cleanup() {
    log "Shutting down job queue system..."
    touch "$QUEUE_DIR/.shutdown"
    for pid in $(jobs -p); do
        log "Waiting for worker $pid to finish..."
        wait $pid 2>/dev/null
    done
    rm -f "$PID_FILE" "$QUEUE_DIR/.shutdown"
    log "Shutdown complete"
    exit 0
}

trap cleanup SIGINT SIGTERM

add_job() {
    local job_type=$1
    local job_data=$2
    local priority=${3:-5}  # Default priority 5 (1=highest, 9=lowest)
    local job_id=$(date +%s%N)_$$
    local job_file="$QUEUE_DIR/pending/${priority}_${job_id}.job"
    cat > "$job_file" << EOF
JOB_ID=$job_id
JOB_TYPE=$job_type
JOB_DATA=$job_data
SUBMIT_TIME=$(date +%s)
SUBMIT_USER=$USER
STATUS=pending
EOF
    log "Job added: $job_id (type: $job_type, priority: $priority)"
    echo "$job_id"
}

worker_process() {
    local worker_id=$1
    log "Worker $worker_id started (PID: $$)"
    while [ ! -f "$QUEUE_DIR/.shutdown" ]; do
        # Find next job (sorted by priority and age)
        local job_file=$(ls -1 "$QUEUE_DIR/pending/"*.job 2>/dev/null | sort | head -1)
        if [ -z "$job_file" ]; then
            sleep 2
            continue
        fi
        # Claim the job atomically - mv fails if another worker grabbed it first
        local processing_file="$QUEUE_DIR/processing/$(basename "$job_file")"
        if ! mv "$job_file" "$processing_file" 2>/dev/null; then
            continue
        fi
        source "$processing_file"
        log "Worker $worker_id processing job $JOB_ID (type: $JOB_TYPE)"
        echo "WORKER=$worker_id" >> "$processing_file"
        echo "START_TIME=$(date +%s)" >> "$processing_file"
        echo "STATUS=processing" >> "$processing_file"
        local job_output="$QUEUE_DIR/processing/${JOB_ID}.out"
        local job_result=0
        case "$JOB_TYPE" in
            "compress")
                timeout $WORKER_TIMEOUT tar czf "${JOB_DATA}.tar.gz" "$JOB_DATA" \
                    > "$job_output" 2>&1
                job_result=$?
                ;;
            "backup")
                timeout $WORKER_TIMEOUT rsync -av "$JOB_DATA" "/backup/$JOB_DATA" \
                    > "$job_output" 2>&1
                job_result=$?
                ;;
            "report")
                timeout $WORKER_TIMEOUT /usr/local/bin/generate_report.sh "$JOB_DATA" \
                    > "$job_output" 2>&1
                job_result=$?
                ;;
            "custom")
                # Execute custom command (careful with security!)
                timeout $WORKER_TIMEOUT bash -c "$JOB_DATA" \
                    > "$job_output" 2>&1
                job_result=$?
                ;;
            *)
                echo "Unknown job type: $JOB_TYPE" > "$job_output"
                job_result=1
                ;;
        esac
        local end_time=$(date +%s)
        local duration=$((end_time - START_TIME))
        echo "END_TIME=$end_time" >> "$processing_file"
        echo "DURATION=$duration" >> "$processing_file"
        echo "EXIT_CODE=$job_result" >> "$processing_file"
        if [ $job_result -eq 0 ]; then
            echo "STATUS=completed" >> "$processing_file"
            mv "$processing_file" "$QUEUE_DIR/completed/"
            mv "$job_output" "$QUEUE_DIR/completed/"
            log "Worker $worker_id completed job $JOB_ID in ${duration}s"
        else
            echo "STATUS=failed" >> "$processing_file"
            mv "$processing_file" "$QUEUE_DIR/failed/"
            mv "$job_output" "$QUEUE_DIR/failed/"
            log "Worker $worker_id: job $JOB_ID failed (exit code: $job_result)"
        fi
    done
    log "Worker $worker_id stopped"
}

monitor_status() {
    while [ ! -f "$QUEUE_DIR/.shutdown" ]; do
        clear
        echo "Job Queue Status - $(date)"
        echo "================================"
        local pending=$(ls -1 "$QUEUE_DIR/pending/"*.job 2>/dev/null | wc -l)
        local processing=$(ls -1 "$QUEUE_DIR/processing/"*.job 2>/dev/null | wc -l)
        local completed=$(ls -1 "$QUEUE_DIR/completed/"*.job 2>/dev/null | wc -l)
        local failed=$(ls -1 "$QUEUE_DIR/failed/"*.job 2>/dev/null | wc -l)
        echo "Pending:    $pending"
        echo "Processing: $processing"
        echo "Completed:  $completed"
        echo "Failed:     $failed"
        echo ""
        if [ $processing -gt 0 ]; then
            echo "Currently Processing:"
            echo "--------------------"
            for job in "$QUEUE_DIR/processing/"*.job; do
                [ -f "$job" ] || continue
                source "$job"
                local runtime=$(($(date +%s) - START_TIME))
                printf "  Job %s (Worker %d): %s - %ds\n" \
                    "$JOB_ID" "$WORKER" "$JOB_TYPE" "$runtime"
            done
            echo ""
        fi
        echo "Workers:"
        echo "--------"
        for i in $(seq 1 $WORKERS); do
            if kill -0 ${WORKER_PIDS[$i]} 2>/dev/null; then
                echo "  Worker $i: Running (PID: ${WORKER_PIDS[$i]})"
            else
                echo "  Worker $i: Stopped"
            fi
        done
        sleep 5
    done
}

if [ "$1" = "add" ]; then
    shift
    add_job "$@"
    exit 0
fi

if [ "$1" = "status" ]; then
    monitor_status
    exit 0
fi

log "Starting job queue system with $WORKERS workers"
echo $$ > "$PID_FILE"

declare -a WORKER_PIDS

for i in $(seq 1 $WORKERS); do
    worker_process $i &
    WORKER_PIDS[$i]=$!
done

log "Job queue system running. Press Ctrl+C to stop."
wait

This implements a complete job queue with priority handling, multiple workers, atomic job claiming, timeout protection, and comprehensive logging. It’s the kind of system you’d build to handle background tasks in production.

Shell Scripting Interview Questions and Answers For Experienced (5+ years Exp)

At the senior level, advanced shell scripting interview questions dig deep into architectural decisions, performance at scale, and the wisdom that comes from maintaining production systems through their entire lifecycle. These questions explore how you handle the messy realities of enterprise environments - legacy system integration, cross-platform compatibility nightmares, and scripts that need to run reliably for years. The focus shifts from “can you write it” to “have you lived through the consequences of different approaches” and whether you can design solutions that other engineers can maintain long after you’ve moved on.

Q1: How do you design shell scripts for high-availability environments where failure isn’t an option?

After years of 3 AM calls, I’ve learned that HA scripts need multiple layers of protection. First, I implement circuit breakers - if a script fails repeatedly, it stops trying and alerts instead of hammering a broken system. I use distributed locking across nodes (often with Redis or etcd) to prevent split-brain scenarios. Here’s my approach:

Health checks before actions: Never assume a service is up. Always verify endpoints are responding correctly before sending traffic.

Idempotency everywhere: Every operation should be safe to run multiple times. I use checksums, version checks, and state files to ensure this.

Graceful degradation: If a non-critical component fails, the script continues with reduced functionality rather than failing completely.

Audit trails: Every action gets logged with who/what/when/why, often shipped to a central logging system for correlation.

The key insight I’ve gained is that HA isn’t about preventing failures - it’s about failing gracefully and recovering automatically. I’ve seen too many “clever” scripts that made things worse during outages.
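The circuit-breaker idea can be sketched in a few lines of bash. This is a hypothetical, minimal version: the file path, threshold, and function names are illustrative, and a production version would add a cool-down timer and alerting.

```shell
#!/bin/bash
# Hypothetical file-based circuit breaker: count consecutive failures in a
# state file and refuse to run once the limit is hit.
FAIL_FILE="${TMPDIR:-/tmp}/circuit.failures"
MAX_FAILS=3

record_failure() {
    local count
    count=$(cat "$FAIL_FILE" 2>/dev/null || echo 0)
    echo $((count + 1)) > "$FAIL_FILE"
}

record_success() { rm -f "$FAIL_FILE"; }

circuit_open() {
    local count
    count=$(cat "$FAIL_FILE" 2>/dev/null || echo 0)
    [ "$count" -ge "$MAX_FAILS" ]
}

run_guarded() {
    if circuit_open; then
        echo "circuit open: skipping run, alerting instead" >&2
        return 2
    fi
    if "$@"; then
        record_success       # any success resets the failure count
    else
        record_failure
        return 1
    fi
}
```

After `MAX_FAILS` consecutive failures, `run_guarded` stops invoking the command entirely, which is exactly the "stop hammering a broken system" behavior described above.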

Q2: Explain your approach to handling sensitive data in shell scripts

This is where experience really shows. Never, ever put secrets in scripts. I’ve cleaned up too many messes where passwords were in version control. My approach:

Environment variables for local development only, never in production

Secrets management tools like HashiCorp Vault, AWS Secrets Manager, or Kubernetes secrets

Temporary credentials with short TTLs whenever possible

Audit logging for every secret access

I also use set +x before any line that might expose secrets in debug output, and I’m paranoid about temporary files - always created with mktemp and proper permissions, cleaned up in trap handlers. I’ve seen scripts that wrote database passwords to /tmp with world-readable permissions. That’s a career-limiting move.
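The temp-file hygiene described above boils down to a few habits that fit in one sketch (the token value here is a placeholder, not a real secret source):

```shell
#!/bin/bash
# Secret-safe temp file handling: restrictive permissions before any data
# is written, tracing off around secret handling, cleanup guaranteed by trap.
umask 077                       # new files default to owner-only (600)
secret_file=$(mktemp) || exit 1
trap 'rm -f "$secret_file"' EXIT INT TERM

set +x                          # never trace lines that touch secrets
printf '%s\n' "placeholder-token" > "$secret_file"

# Pass the *file* (not the value) to whatever needs the credential,
# so the secret never appears in `ps` output or shell history.
wc -c < "$secret_file" > /dev/null
```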

Q3: How do you optimize shell scripts that process millions of records?

The biggest lesson I’ve learned: the shell isn’t always the answer. But when it is, here’s what works:

Streaming over loading: Never load large datasets into memory. Use pipes and process line by line.

GNU Parallel for CPU-bound tasks. It’s incredible how much faster things run with proper parallelization.

Sort/join over nested loops: Unix tools are optimized for this. A sort | join pipeline beats nested loops every time.

Minimize process spawning: That for loop calling sed 1000 times? Rewrite it as a single awk script.

Real example: I once replaced a 6-hour customer data processing script with a sort | join | awk pipeline that ran in 12 minutes. The original developer was reading a 2GB file into an associative array. Sometimes the old Unix philosophy of small, focused tools really shines.
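The sort-then-join pattern can be sketched like this (file names and data are made up for the example). Both inputs are sorted on the key and joined as streams, so memory use stays flat regardless of input size:

```shell
#!/bin/bash
# Stream two delimited datasets through sort/join instead of building an
# in-memory lookup table. Files and contents are illustrative only.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'u1,alice\nu2,bob\n'              > users.csv
printf 'u2,2024-01-05\nu1,2024-01-02\n'  > logins.csv

# join requires both inputs sorted on the join field
sort -t, -k1,1 users.csv  > users.sorted
sort -t, -k1,1 logins.csv > logins.sorted

join -t, users.sorted logins.sorted > joined.csv
cat joined.csv
```

`sort` spills to temp files on disk for large inputs and `join` reads line by line, which is why this scales where an associative-array lookup does not.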

Q4: Describe your most complex debugging experience with shell scripts

The worst one that comes to mind involved a script that worked perfectly for 3 years, then started randomly failing on Tuesdays. Turned out a log rotation job was creating a race condition, but only when the log file exceeded 2GB, which only happened on our busiest day.

My debugging toolkit has evolved from painful experiences:

strace/dtrace to see actual system calls

bash -x is just the starting point; I often add custom debug functions that can be toggled

Reproducing environments exactly - same shell version, same locale settings, same everything

Binary search debugging - comment out half the script, see if it still fails

The real skill is knowing when to stop debugging and rewrite. I’ve learned that if I’m spending more than a day debugging a complex script, it’s probably too complex and needs to be simplified or rewritten in a proper programming language.
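One of those toggleable debug helpers might look like this (the names are illustrative); the point is that the instrumentation can stay in the script permanently and costs nothing when disabled:

```shell
#!/bin/bash
# Debug logging switched on per-run with DEBUG=1, written to stderr so it
# never pollutes the script's real output.
DEBUG="${DEBUG:-0}"

debug() {
    [ "$DEBUG" = "1" ] || return 0
    echo "DEBUG $(date '+%H:%M:%S') $*" >&2
}

debug "only visible when DEBUG=1"
```

Run normally and nothing prints; run with `DEBUG=1 ./script.sh` and every `debug` call becomes a timestamped trace line.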

Q5: How do you handle shell script deployment and versioning across hundreds of servers?

Configuration management is crucial here. I’ve used Puppet, Ansible, and Chef, but the principles remain the same:

Immutable deployments: Scripts are versioned packages, not edited in place

Canary deployments: Roll out to 1%, then 10%, then 50%, monitoring metrics at each stage

Feature flags: Even in shell scripts, I implement toggles for risky changes

Rollback procedures: Always have a quick way back. I version with symlinks for instant rollback

One hard lesson: never trust system package managers alone. I’ve been burned by different versions of bash, different coreutils implementations, even different versions of basic commands like ‘date’. Now I always include compatibility checks and sometimes ship specific binary versions with my scripts.
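The symlink rollback pattern is simple enough to sketch (the directory layout is illustrative): each version lives in its own directory and a `current` symlink points at the active one, so rollback is a single link change rather than a redeploy.

```shell
#!/bin/bash
# Release switching via a symlink; -n prevents ln from descending into the
# existing symlinked directory when replacing it.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

mkdir -p releases/v1 releases/v2
echo "one" > releases/v1/app.conf
echo "two" > releases/v2/app.conf

activate() { ln -sfn "releases/$1" current; }

activate v2     # deploy
activate v1     # instant rollback
```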

Q6: What’s your philosophy on when to use shell scripts versus a “real” programming language?

This is the question that separates experienced engineers from script kiddies. My rule of thumb:

Shell scripts are perfect for:

Gluing systems together

System administration tasks

Quick prototypes

Build and deployment automation

But I switch to Python/Go/Ruby when:

The script exceeds 200-300 lines

I need complex data structures

Error handling becomes more complex than the actual logic

I’m parsing anything more complex than simple delimited data

Performance is critical

I’ve maintained 2000-line bash scripts. It’s not fun. The maintenance cost grows exponentially with complexity. Now I’m quick to recognize when bash has served its purpose and it’s time to rewrite.

Q7: How do you ensure shell script security in a zero-trust environment?

Security has evolved way beyond just checking inputs. In zero-trust environments:

No permanent credentials: Everything uses temporary tokens with the minimum required scope

Mutual TLS for script-to-service communication

Signed scripts: We GPG sign critical scripts and verify signatures before execution

Minimal attack surface: Scripts run in containers or VMs with only required tools installed

Security scanning: All scripts go through static analysis tools looking for common vulnerabilities

I’ve also learned to be paranoid about seemingly innocent things. That curl command downloading a script from GitHub? What if someone compromises the repo? Now I verify checksums and use pinned versions for everything.
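The verify-before-execute habit can be sketched as follows. Here the "download" and its published checksum are faked locally so the pattern is self-contained; in practice the expected hash would come from a trusted, separately-fetched source.

```shell
#!/bin/bash
# Verify a fetched script against a known SHA-256 before running it.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

echo 'echo hello-from-verified-script' > fetched.sh
expected_sha=$(sha256sum fetched.sh | awk '{print $1}')

verify_and_run() {
    local file=$1 expected=$2 actual
    actual=$(sha256sum "$file" | awk '{print $1}')
    if [ "$actual" != "$expected" ]; then
        echo "checksum mismatch for $file - refusing to execute" >&2
        return 1
    fi
    bash "$file"
}

verify_and_run fetched.sh "$expected_sha"
```

If anyone tampers with the file after the checksum was published, the hash no longer matches and the script refuses to run it.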

Q8: Describe your approach to making shell scripts cloud-agnostic

After migrating between AWS, GCP, and Azure multiple times, I’ve learned:

Abstract provider-specific calls: Create wrapper functions for cloud operations

Use cloud SDK tools carefully: They change. Always pin versions and have fallbacks

Metadata services are different: Each cloud has its own way. Abstract this early

Storage is never just storage: S3, GCS, and Azure Blob have subtle differences

Example approach:

cloud_provider_detect() {
    if curl -s -f -m 1 http://169.254.169.254/latest/meta-data/instance-id >/dev/null 2>&1; then
        echo "aws"
    elif curl -s -f -m 1 -H "Metadata-Flavor: Google" http://metadata.google.internal/computeMetadata/v1/instance/id >/dev/null 2>&1; then
        echo "gcp"
    else
        echo "azure"  # Or on-prem; needs more logic
    fi
}

The key is planning for portability from day one. Retrofitting cloud-agnostic behavior is painful.

Q9: How do you handle backwards compatibility when system tools get updated?

This is a constant battle. GNU vs BSD tools, different bash versions, changing command options. My strategies:

Feature detection over version detection: Don’t check if it’s bash 4.3, check if it supports associative arrays

Wrapper functions: Abstract system commands behind functions that handle differences

Comprehensive testing matrix: Test on minimum supported versions of everything

Document requirements explicitly: Every script starts with a requirements block
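Feature detection can be as simple as probing in a subshell; this is a sketch of the "don't check the version, check the capability" rule, with illustrative names:

```shell
#!/bin/bash
# Probe for a capability instead of parsing version strings: try declaring
# an associative array in a subshell and branch on whether it worked.
has_assoc_arrays() {
    ( declare -A _probe ) >/dev/null 2>&1
}

if has_assoc_arrays; then
    STRATEGY="assoc-array"
else
    STRATEGY="flat-file"      # fallback for shells without the feature
fi
echo "lookup strategy: $STRATEGY"
```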

I maintain compatibility libraries for common issues:

portable_readlink() {
    if readlink -f "$1" 2>/dev/null; then
        return
    elif command -v greadlink >/dev/null; then
        greadlink -f "$1"
    else
        python -c "import os; print(os.path.realpath('$1'))"
    fi
}

Q10: What are the most important lessons you’ve learned about shell scripting in production?

The scars teach the best lessons:

Observability beats cleverness: A simple script with great logging beats a clever script you can’t debug

Plan for failure from line 1: Not just error handling, but operational failure - what happens when your script runs during a datacenter failover?

Other people will maintain your code: Write for them, not to show off. Comment the “why”, not the “what”

Test the unhappy paths: Everyone tests success. Test what happens when that API returns 500, when the disk is full, when DNS is flaking

Know when to stop: Some problems shouldn’t be solved in bash. Recognizing this saves weeks of pain

The meta-lesson: shell scripting in production is 20% writing code and 80% thinking about what could go wrong. Every senior engineer has war stories about simple scripts causing major outages. The difference is we’ve learned to be paranoid in productive ways.

Shell Scripting Coding Interview Questions and Answers For Experienced (5+ years Exp)

Senior-level advanced shell scripting interview questions go beyond algorithms to test battle-hardened production experience - can you write code that survives server crashes, handles race conditions, and scales across distributed systems? These coding challenges reflect real scenarios you’ve probably debugged at 3 AM: service orchestration failures, data corruption recovery, and the kind of edge cases that only show up after months in production. The solutions here demonstrate not just working code, but the defensive programming and architectural thinking that comes from years of learning things the hard way.

Q1: Write a distributed lock manager for coordinating jobs across multiple servers

This solves the classic problem of preventing duplicate cron jobs in clustered environments:

#!/bin/bash

# Distributed lock implementation using a shared filesystem or Redis.
# Handles stale locks, network partitions, and crash recovery.

LOCK_DIR="/shared/locks"
REDIS_HOST="${REDIS_HOST:-localhost}"
REDIS_PORT="${REDIS_PORT:-6379}"
BACKEND="${LOCK_BACKEND:-filesystem}"  # or "redis"

readonly SCRIPT_NAME=$(basename "$0")
readonly HOSTNAME=$(hostname -f)
readonly PID=$$
readonly LOCK_TIMEOUT=${LOCK_TIMEOUT:-300}  # 5 minutes default
readonly LOCK_RETRY_INTERVAL=2
readonly STALE_LOCK_THRESHOLD=600  # 10 minutes

generate_lock_id() {
    echo "${HOSTNAME}-${PID}-$(date +%s%N)"
}

fs_acquire_lock() {
    local lock_name=$1
    local lock_file="$LOCK_DIR/$lock_name.lock"
    local lock_id=$(generate_lock_id)
    local temp_file=$(mktemp)
    cat > "$temp_file" << EOF
{
    "holder": "$lock_id",
    "hostname": "$HOSTNAME",
    "pid": $PID,
    "acquired": $(date +%s),
    "timeout": $LOCK_TIMEOUT,
    "command": "$0 $*"
}
EOF
    # Hard-linking is atomic: exactly one process can create the lock file
    if ln "$temp_file" "$lock_file" 2>/dev/null; then
        rm -f "$temp_file"
        echo "$lock_id"
        return 0
    fi
    if [ -f "$lock_file" ]; then
        local lock_age=$(($(date +%s) - $(stat -c %Y "$lock_file" 2>/dev/null || stat -f %m "$lock_file")))
        local lock_data=$(cat "$lock_file" 2>/dev/null)
        local lock_pid=$(echo "$lock_data" | grep -o '"pid": [0-9]*' | grep -o '[0-9]*$')
        local lock_host=$(echo "$lock_data" | grep -o '"hostname": "[^"]*"' | cut -d'"' -f4)
        if [ $lock_age -gt $STALE_LOCK_THRESHOLD ]; then
            echo "Removing stale lock (age: ${lock_age}s)" >&2
            rm -f "$lock_file"
            fs_acquire_lock "$lock_name"
            return $?
        fi
        # Check if the holding process still exists (same host only)
        if [ "$lock_host" = "$HOSTNAME" ] && [ -n "$lock_pid" ]; then
            if ! kill -0 "$lock_pid" 2>/dev/null; then
                echo "Removing lock from dead process $lock_pid" >&2
                rm -f "$lock_file"
                fs_acquire_lock "$lock_name"
                return $?
            fi
        fi
    fi
    rm -f "$temp_file"
    return 1
}

fs_release_lock() {
    local lock_name=$1
    local lock_id=$2
    local lock_file="$LOCK_DIR/$lock_name.lock"
    if [ -f "$lock_file" ]; then
        local current_holder=$(grep -o '"holder": "[^"]*"' "$lock_file" 2>/dev/null | cut -d'"' -f4)
        if [ "$current_holder" = "$lock_id" ]; then
            rm -f "$lock_file"
            return 0
        else
            echo "Cannot release lock owned by $current_holder" >&2
            return 1
        fi
    fi
    return 0
}

redis_acquire_lock() {
    local lock_name=$1
    local lock_id=$(generate_lock_id)
    local lock_key="dlock:$lock_name"
    # Try to acquire the lock with NX (only if not exists) and EX (expiry)
    local result=$(redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" \
        SET "$lock_key" "$lock_id" NX EX "$LOCK_TIMEOUT" 2>/dev/null)
    if [ "$result" = "OK" ]; then
        redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" >/dev/null 2>&1 <<EOF
HSET "${lock_key}:meta" holder "$lock_id"
HSET "${lock_key}:meta" hostname "$HOSTNAME"
HSET "${lock_key}:meta" pid "$PID"
HSET "${lock_key}:meta" acquired "$(date +%s)"
EXPIRE "${lock_key}:meta" "$LOCK_TIMEOUT"
EOF
        echo "$lock_id"
        return 0
    fi
    # Check if the lock is held by a dead process on the same host
    local lock_host=$(redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" \
        HGET "${lock_key}:meta" hostname 2>/dev/null)
    local lock_pid=$(redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" \
        HGET "${lock_key}:meta" pid 2>/dev/null)
    if [ "$lock_host" = "$HOSTNAME" ] && [ -n "$lock_pid" ]; then
        if ! kill -0 "$lock_pid" 2>/dev/null; then
            echo "Removing lock from dead process $lock_pid" >&2
            redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" DEL "$lock_key" "${lock_key}:meta" >/dev/null
            redis_acquire_lock "$lock_name"
            return $?
        fi
    fi
    return 1
}

redis_release_lock() {
    local lock_name=$1
    local lock_id=$2
    local lock_key="dlock:$lock_name"
    # Use a Lua script for atomic check-and-delete (--eval takes keys, then
    # a comma separator, then args)
    redis-cli -h "$REDIS_HOST" -p "$REDIS_PORT" --eval - "$lock_key" , "$lock_id" <<'EOF' >/dev/null
if redis.call("GET", KEYS[1]) == ARGV[1] then
    redis.call("DEL", KEYS[1], KEYS[1] .. ":meta")
    return 1
else
    return 0
end
EOF
}

acquire_lock() {
    local lock_name=$1
    local max_wait=${2:-0}
    local waited=0
    while true; do
        local lock_id
        case "$BACKEND" in
            filesystem)
                lock_id=$(fs_acquire_lock "$lock_name")
                ;;
            redis)
                lock_id=$(redis_acquire_lock "$lock_name")
                ;;
            *)
                echo "Unknown backend: $BACKEND" >&2
                return 1
                ;;
        esac
        if [ -n "$lock_id" ]; then
            echo "$lock_id"
            return 0
        fi
        if [ $max_wait -gt 0 ] && [ $waited -ge $max_wait ]; then
            return 1
        fi
        sleep $LOCK_RETRY_INTERVAL
        waited=$((waited + LOCK_RETRY_INTERVAL))
    done
}

release_lock() {
    local lock_name=$1
    local lock_id=$2
    case "$BACKEND" in
        filesystem)
            fs_release_lock "$lock_name" "$lock_id"
            ;;
        redis)
            redis_release_lock "$lock_name" "$lock_id"
            ;;
    esac
}

with_lock() {
    local lock_name=$1
    shift
    local lock_id
    lock_id=$(acquire_lock "$lock_name" 30)  # Wait up to 30 seconds
    if [ -z "$lock_id" ]; then
        echo "Failed to acquire lock: $lock_name" >&2
        return 1
    fi
    trap "release_lock '$lock_name' '$lock_id'" EXIT INT TERM
    echo "Lock acquired: $lock_name (id: $lock_id)" >&2
    "$@"
    local exit_code=$?
    release_lock "$lock_name" "$lock_id"
    trap - EXIT INT TERM
    return $exit_code
}

if [ "${BASH_SOURCE[0]}" = "${0}" ]; then
    critical_task() {
        echo "Starting critical task on $HOSTNAME..."
        sleep 10
        echo "Critical task completed"
    }
    with_lock "my-critical-task" critical_task
fi

This implementation handles real-world edge cases: stale locks from crashed processes, network partitions, and provides both filesystem and Redis backends for different infrastructure setups.

Q2: Create a self-healing service monitor that automatically restarts failed services with exponential backoff

This handles the complexity of service management in production:

#!/bin/bash

# Self-healing service monitor with intelligent restart logic.
# Prevents restart loops and implements the circuit breaker pattern.

readonly CONFIG_DIR="/etc/service-monitor"
readonly STATE_DIR="/var/lib/service-monitor"
readonly LOG_FILE="/var/log/service-monitor.log"

mkdir -p "$CONFIG_DIR" "$STATE_DIR"

# Example config (JSON), one file per service in $CONFIG_DIR:
# {
#   "start_cmd": "systemctl start web-app",
#   "stop_cmd": "systemctl stop web-app",
#   "restart_cmd": "systemctl restart web-app",
#   ...
# }

log() {
    local level=$1
    shift
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] [$level] $*" | tee -a "$LOG_FILE"
    if command -v logger >/dev/null 2>&1; then
        logger -t "service-monitor" -p "daemon.$level" "$*"
    fi
}

load_service_config() {
    local service_name=$1
    local config_file="$CONFIG_DIR/${service_name}.json"
    if [ ! -f "$config_file" ]; then
        log "error" "Configuration not found for service: $service_name"
        return 1
    fi
    # Parse JSON (using jq if available, fall back to python)
    if command -v jq >/dev/null 2>&1; then
        jq -r 'to_entries | .[] | "\(.key)=\(.value)"' "$config_file"
    else
        python -c "
import json
with open('$config_file') as f:
    config = json.load(f)
for k, v in config.items():
    print(f'{k}={v}')
"
    fi
}

get_service_state() {
    local service_name=$1
    local state_file="$STATE_DIR/${service_name}.state"
    if [ ! -f "$state_file" ]; then
        cat > "$state_file" << EOF
restart_count=0
last_restart=0
consecutive_failures=0
backoff_seconds=1
status=unknown
last_check=0
total_restarts=0
circuit_breaker=closed
EOF
    fi
    source "$state_file"
}

update_service_state() {
    local service_name=$1
    local state_file="$STATE_DIR/${service_name}.state"
    shift
    local temp_file=$(mktemp)
    cp "$state_file" "$temp_file" 2>/dev/null || touch "$temp_file"
    while [ $# -gt 0 ]; do
        local key="${1%%=*}"
        local value="${1#*=}"
        if grep -q "^${key}=" "$temp_file"; then
            sed -i "s/^${key}=.*/${key}=${value}/" "$temp_file"
        else
            echo "${key}=${value}" >> "$temp_file"
        fi
        shift
    done
    mv "$temp_file" "$state_file"
}

calculate_backoff() {
    local base=$1
    local attempt=$2
    local max_backoff=$3
    local backoff=$((base ** attempt))
    if [ $backoff -gt $max_backoff ]; then
        backoff=$max_backoff
    fi
    # Add jitter (±25%) to prevent a thundering herd; guard against a zero
    # jitter range, which would make the modulo divide by zero
    local jitter=$((backoff / 4))
    if [ $jitter -gt 0 ]; then
        local random_jitter=$((RANDOM % (jitter * 2) - jitter))
        backoff=$((backoff + random_jitter))
    fi
    echo $backoff
}

check_service_health() {
    local check_cmd=$1
    timeout 30 bash -c "$check_cmd" >/dev/null 2>&1
}

restart_service() {
    local service_name=$1
    local restart_cmd=$2
    local backoff_base=$3
    local max_backoff=$4
    get_service_state "$service_name"
    local backoff=$(calculate_backoff "$backoff_base" "$consecutive_failures" "$max_backoff")
    log "warning" "Restarting $service_name (attempt $((consecutive_failures + 1)), waiting ${backoff}s)"
    sleep "$backoff"
    if $restart_cmd; then
        log "info" "Successfully restarted $service_name"
        update_service_state "$service_name" \
            "last_restart=$(date +%s)" \
            "backoff_seconds=1" \
            "total_restarts=$((total_restarts + 1))"
        return 0
    else
        log "error" "Failed to restart $service_name"
        return 1
    fi
}

monitor_service() {
    local service_name=$1
    eval "$(load_service_config "$service_name")"
    for required in check_cmd restart_cmd max_restarts restart_window; do
        if [ -z "${!required}" ]; then
            log "error" "Missing required config: $required for $service_name"
            return 1
        fi
    done
    backoff_base=${backoff_base:-2}
    max_backoff=${max_backoff:-300}
    log "info" "Starting monitor for $service_name"
    while true; do
        get_service_state "$service_name"
        if [ "$circuit_breaker" = "open" ]; then
            local circuit_break_duration=$(($(date +%s) - last_restart))
            if [ $circuit_break_duration -lt 3600 ]; then
                sleep 60
                continue
            else
                log "info" "Attempting to close circuit breaker for $service_name"
                update_service_state "$service_name" "circuit_breaker=half-open"
            fi
        fi
        if check_service_health "$check_cmd"; then
            if [ "$status" != "healthy" ] || [ "$consecutive_failures" -gt 0 ]; then
                log "info" "$service_name is healthy"
                update_service_state "$service_name" \
                    "status=healthy" \
                    "consecutive_failures=0" \
                    "circuit_breaker=closed"
            fi
        else
            log "error" "$service_name health check failed"
            consecutive_failures=$((consecutive_failures + 1))
            update_service_state "$service_name" \
                "status=unhealthy" \
                "consecutive_failures=$consecutive_failures" \
                "last_check=$(date +%s)"
            current_time=$(date +%s)
            window_start=$((current_time - restart_window))
            recent_restarts=0
            if [ -f "$STATE_DIR/${service_name}.history" ]; then
                recent_restarts=$(awk -v start="$window_start" '$1 > start' \
                    "$STATE_DIR/${service_name}.history" | wc -l)
            fi
            if [ $recent_restarts -ge $max_restarts ]; then
                if [ "$circuit_breaker" != "open" ]; then
                    log "error" "Circuit breaker OPEN for $service_name (too many restarts)"
                    update_service_state "$service_name" "circuit_breaker=open"
                    send_alert "$service_name" "Circuit breaker opened - manual intervention required"
                fi
            else
                if restart_service "$service_name" "$restart_cmd" "$backoff_base" "$max_backoff"; then
                    echo "$(date +%s) restart" >> "$STATE_DIR/${service_name}.history"
                    # Wait before the next check to let the service stabilize
                    sleep 30
                else
                    update_service_state "$service_name" \
                        "consecutive_failures=$consecutive_failures"
                fi
            fi
        fi
        sleep "${check_interval:-60}"
    done
}

# Send alerts (integrate with your alerting system)
send_alert() {
    local service=$1
    local message=$2
    if [ -n "$SLACK_WEBHOOK" ]; then
        curl -X POST "$SLACK_WEBHOOK" \
            -H "Content-Type: application/json" \
            -d "{\"text\": \"Service Monitor Alert: $service - $message\"}" \
            2>/dev/null || true
    fi
    if command -v mail >/dev/null 2>&1 && [ -n "$ALERT_EMAIL" ]; then
        echo "$message" | mail -s "Service Monitor: $service" "$ALERT_EMAIL"
    fi
    log "alert" "$service: $message"
}

main() {

for config_file in “$CONFIG_DIR”/*.json; do

[ -f “$config_file” ] || continue

service_name=$(basename “$config_file” .json)

monitor_service “$service_name” &

echo $! > “$STATE_DIR/${service_name}.pid”

done

wait

}

trap ‘log “info” “Shutting down service monitor”; pkill -P $$; exit 0’ TERM INT

if [ ”${BASH_SOURCE[0]}” = ”${0}” ]; then

main ”$@”

fi

Q3: Implement a log aggregation and analysis system that detects anomalies across distributed services

This demonstrates handling big data streams and pattern recognition:

#!/bin/bash
# Distributed log analysis system with anomaly detection
# Handles multiple log formats, performs statistical analysis, and alerts on anomalies

readonly AGGREGATOR_PORT=9514
readonly ANALYSIS_INTERVAL=60
readonly BASELINE_WINDOW=3600  # 1 hour
readonly ANOMALY_THRESHOLD=3   # Standard deviations
readonly STATE_DIR="/var/lib/log-analyzer"
readonly PATTERNS_FILE="/etc/log-analyzer/patterns.conf"

mkdir -p "$STATE_DIR"

# Pattern definitions for different log types
declare -A LOG_PATTERNS=(
    ["nginx"]='(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d+) (?P<size>\d+)'
    ["apache"]='(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d+) (?P<size>\d+)'
    ["syslog"]='(?P<time>\S+ \S+ \S+) (?P<host>\S+) (?P<process>[^\[]+)\[(?P<pid>\d+)\]: (?P<message>.*)'
    ["json"]='^\{.*\}$'
)

calculate_stats() {
    awk '
    {
        data[NR] = $1
        sum += $1
        if (NR == 1 || $1 < min) min = $1
        if (NR == 1 || $1 > max) max = $1
    }
    END {
        if (NR == 0) {
            print "0 0 0 0 0 0"
            exit
        }
        mean = sum / NR
        for (i = 1; i <= NR; i++) {
            variance += (data[i] - mean) ^ 2
        }
        variance = variance / NR
        stddev = sqrt(variance)

        # 95th percentile: sort the values, then pick the n-th
        n = int(NR * 0.95)
        if (n < 1) n = 1
        for (i = 1; i <= NR; i++) {
            for (j = i + 1; j <= NR; j++) {
                if (data[i] > data[j]) {
                    temp = data[i]
                    data[i] = data[j]
                    data[j] = temp
                }
            }
        }
        print NR, mean, stddev, min, max, data[n]
    }
    '
}

is_anomaly() {
    local value=$1
    local mean=$2
    local stddev=$3
    local threshold=${4:-$ANOMALY_THRESHOLD}

    if (( $(echo "$stddev == 0" | bc -l) )); then
        return 1
    fi

    local zscore=$(echo "scale=2; ($value - $mean) / $stddev" | bc -l)
    local abs_zscore=$(echo "scale=2; if ($zscore < 0) -$zscore else $zscore" | bc -l)

    if (( $(echo "$abs_zscore > $threshold" | bc -l) )); then
        return 0  # Is anomaly
    else
        return 1  # Not anomaly
    fi
}

parse_log_line() {
    local line=$1
    local log_type=$2

    case "$log_type" in
        json)
            if command -v jq >/dev/null 2>&1; then
                echo "$line" | jq -r '[
                    (.timestamp // (now | todate)),
                    (.level // "INFO"),
                    (.service // "unknown"),
                    (.message // "")
                ] | @tsv' 2>/dev/null
            else
                python3 -c "
import json, sys, datetime
try:
    data = json.loads('$line')
    timestamp = data.get('timestamp', datetime.datetime.now().isoformat())
    level = data.get('level', 'INFO')
    service = data.get('service', 'unknown')
    message = data.get('message', '')
    print(f'{timestamp}\t{level}\t{service}\t{message}')
except Exception:
    pass
"
            fi
            ;;
        nginx|apache)
            python3 -c "
import re
pattern = r'${LOG_PATTERNS[$log_type]}'
match = re.match(pattern, '''$line''')
if match:
    data = match.groupdict()
    print('\t'.join([data.get('time', ''), data.get('status', ''), data.get('method', ''), data.get('path', '')]))
"
            ;;
        *)
            echo "$line" | awk '{print $1, $2, $3, $0}'
            ;;
    esac
}

aggregate_logs() {
    local output_file=$1
    local duration=$2

    # Listen for logs forwarded over the network
    nc -l -k -p "$AGGREGATOR_PORT" > "$output_file" 2>/dev/null &
    local nc_pid=$!

    if command -v journalctl >/dev/null 2>&1; then
        journalctl -f --since="$duration seconds ago" >> "$output_file" 2>/dev/null &
        local journal_pid=$!
    fi

    sleep "$duration"

    kill $nc_pid 2>/dev/null
    [ -n "$journal_pid" ] && kill $journal_pid 2>/dev/null

    wc -l < "$output_file"
}

analyze_logs() {
    local log_file=$1
    local analysis_output="$STATE_DIR/analysis_$(date +%s).json"

    declare -A error_counts
    declare -A response_times
    declare -A status_codes
    declare -A service_errors

    while IFS= read -r line; do
        local log_type="generic"
        if [[ "$line" =~ ^\{ ]]; then
            log_type="json"
        elif [[ "$line" =~ \"[A-Z]+\ .*HTTP ]]; then
            log_type="nginx"
        fi

        local parsed=$(parse_log_line "$line" "$log_type")
        [ -z "$parsed" ] && continue

        case "$log_type" in
            json)
                IFS=$'\t' read -r timestamp level service message <<< "$parsed"
                if [[ "$level" =~ ERROR|CRITICAL|FATAL ]]; then
                    ((service_errors["$service"]++))
                fi
                ;;
            nginx|apache)
                IFS=$'\t' read -r timestamp status method path <<< "$parsed"
                ((status_codes["$status"]++))
                if [[ "$status" =~ ^[45] ]]; then
                    ((error_counts["http_errors"]++))
                fi
                ;;
        esac
    done < "$log_file"

    local baseline_file="$STATE_DIR/baseline.json"
    if [ -f "$baseline_file" ]; then
        for service in "${!service_errors[@]}"; do
            local current_errors=${service_errors[$service]}
            local baseline_mean=$(jq -r ".services.$service.error_rate.mean // 0" "$baseline_file")
            local baseline_stddev=$(jq -r ".services.$service.error_rate.stddev // 1" "$baseline_file")

            if is_anomaly "$current_errors" "$baseline_mean" "$baseline_stddev"; then
                log "alert" "Anomaly detected: $service error rate is $current_errors (baseline: $baseline_mean ± $baseline_stddev)"
                send_anomaly_alert "$service" "error_rate" "$current_errors" "$baseline_mean" "$baseline_stddev"
            fi
        done
    fi

    cat > "$analysis_output" << EOF
{
    "timestamp": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
    "total_lines": $(wc -l < "$log_file"),
    "services": {
EOF

    local first=true
    for service in "${!service_errors[@]}"; do
        [ "$first" = true ] && first=false || echo "," >> "$analysis_output"
        cat >> "$analysis_output" << EOF
        "$service": {
            "error_count": ${service_errors[$service]},
            "error_rate": $(echo "scale=2; ${service_errors[$service]} * 100 / $(wc -l < "$log_file")" | bc)
        }
EOF
    done

    echo -e "\n    },\n    \"status_codes\": {" >> "$analysis_output"
    first=true
    for status in "${!status_codes[@]}"; do
        [ "$first" = true ] && first=false || echo "," >> "$analysis_output"
        echo -n "        \"$status\": ${status_codes[$status]}" >> "$analysis_output"
    done
    echo -e "\n    }\n}" >> "$analysis_output"

    # Update baseline with exponential moving average
    update_baseline "$analysis_output"

    echo "$analysis_output"
}

update_baseline() {
    local current_analysis=$1
    local baseline_file="$STATE_DIR/baseline.json"
    local alpha=0.3  # EMA weight for new data

    if [ ! -f "$baseline_file" ]; then
        cp "$current_analysis" "$baseline_file"
        return
    fi

    # Update baseline using exponential moving average
    python3 - "$baseline_file" "$current_analysis" "$alpha" << 'EOF'
import json
import sys

baseline_file, current_file, alpha = sys.argv[1:4]
alpha = float(alpha)

with open(baseline_file) as f:
    baseline = json.load(f)
with open(current_file) as f:
    current = json.load(f)

for service, metrics in current.get('services', {}).items():
    if service not in baseline.get('services', {}):
        baseline.setdefault('services', {})[service] = {
            'error_rate': {'mean': 0, 'stddev': 1}
        }

    old_mean = baseline['services'][service]['error_rate']['mean']
    new_value = metrics['error_rate']
    new_mean = alpha * new_value + (1 - alpha) * old_mean

    old_var = baseline['services'][service]['error_rate'].get('variance', 1)
    new_var = alpha * ((new_value - new_mean) ** 2) + (1 - alpha) * old_var

    baseline['services'][service]['error_rate'] = {
        'mean': new_mean,
        'stddev': new_var ** 0.5,
        'variance': new_var
    }

with open(baseline_file, 'w') as f:
    json.dump(baseline, f, indent=2)
EOF
}

send_anomaly_alert() {
    local service=$1
    local metric=$2
    local current_value=$3
    local baseline_mean=$4
    local baseline_stddev=$5

    local zscore=$(echo "scale=2; ($current_value - $baseline_mean) / $baseline_stddev" | bc -l)

    local alert_data=$(cat << EOF
{
    "service": "$service",
    "metric": "$metric",
    "current_value": $current_value,
    "baseline_mean": $baseline_mean,
    "baseline_stddev": $baseline_stddev,
    "z_score": $zscore,
    "timestamp": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
    "severity": "$([ $(echo "$zscore > 5" | bc) -eq 1 ] && echo "critical" || echo "warning")"
}
EOF
)

    if [ -n "$WEBHOOK_URL" ]; then
        curl -X POST "$WEBHOOK_URL" \
            -H "Content-Type: application/json" \
            -d "$alert_data" \
            2>/dev/null || true
    fi

    echo "$alert_data" >> "$STATE_DIR/alerts.jsonl"

    if [ -x "/etc/log-analyzer/alert-handler.sh" ]; then
        echo "$alert_data" | /etc/log-analyzer/alert-handler.sh
    fi
}

main() {
    log "info" "Starting distributed log analyzer"

    while true; do
        local temp_log=$(mktemp)

        log "info" "Collecting logs for $ANALYSIS_INTERVAL seconds"
        local line_count=$(aggregate_logs "$temp_log" "$ANALYSIS_INTERVAL")
        log "info" "Collected $line_count log lines"

        if [ "$line_count" -gt 0 ]; then
            local analysis_file=$(analyze_logs "$temp_log")
            log "info" "Analysis complete: $analysis_file"

            # Clean up old analysis files (keep last 24 hours)
            find "$STATE_DIR" -name "analysis_*.json" -mtime +1 -delete
        fi

        rm -f "$temp_log"
        sleep 10
    done
}

log() {
    local level=$1
    shift
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] [$level] $*" >&2
}

if [ "${BASH_SOURCE[0]}" = "${0}" ]; then
    main "$@"
fi

Q4: Build a zero-downtime deployment system with automatic rollback on failure

This handles complex deployment orchestration:

#!/bin/bash
# Zero-downtime deployment system with health checks and automatic rollback
# Supports blue-green and rolling deployments across multiple servers

readonly DEPLOY_DIR="/opt/deployments"
readonly CONFIG_FILE="/etc/deploy/config.json"
readonly STATE_FILE="/var/lib/deploy/state.json"
readonly HEALTH_CHECK_RETRIES=10
readonly HEALTH_CHECK_INTERVAL=5
readonly CANARY_PERCENTAGE=10

declare -A STRATEGIES=(
    ["blue-green"]="deploy_blue_green"
    ["rolling"]="deploy_rolling"
    ["canary"]="deploy_canary"
)

mkdir -p "$(dirname "$STATE_FILE")" "$DEPLOY_DIR"

update_state() {
    local temp_file=$(mktemp)
    if [ -f "$STATE_FILE" ]; then
        cp "$STATE_FILE" "$temp_file"
    else
        echo "{}" > "$temp_file"
    fi

    local key=$1
    local value=$2
    jq --arg k "$key" --arg v "$value" '.[$k] = $v' "$temp_file" > "${temp_file}.new"
    mv "${temp_file}.new" "$STATE_FILE"
    rm -f "$temp_file"
}

get_state() {
    local key=$1
    if [ -f "$STATE_FILE" ]; then
        jq -r --arg k "$key" '.[$k] // empty' "$STATE_FILE"
    fi
}

# Load balancer management (HAProxy example)
update_load_balancer() {
    local action=$1
    local server=$2
    local backend=${3:-"webservers"}

    case "$action" in
        drain)
            echo "set server $backend/$server state drain" | \
                socat stdio /var/run/haproxy.sock
            ;;
        ready)
            echo "set server $backend/$server state ready" | \
                socat stdio /var/run/haproxy.sock
            ;;
        maint)
            echo "disable server $backend/$server" | \
                socat stdio /var/run/haproxy.sock
            ;;
        enable)
            echo "enable server $backend/$server" | \
                socat stdio /var/run/haproxy.sock
            ;;
    esac
}

wait_for_drain() {
    local server=$1
    local backend=${2:-"webservers"}
    local max_wait=60
    local waited=0

    log "info" "Waiting for connections to drain from $server"

    while [ $waited -lt $max_wait ]; do
        local conn_count=$(echo "show servers state $backend" | \
            socat stdio /var/run/haproxy.sock | \
            grep "$server" | awk '{print $7}')

        if [ "$conn_count" -eq 0 ]; then
            log "info" "Server $server drained successfully"
            return 0
        fi

        sleep 2
        waited=$((waited + 2))
    done

    log "warning" "Timeout waiting for $server to drain (still $conn_count connections)"
    return 1
}

perform_health_check() {
    local server=$1
    local health_endpoint=$2
    local expected_response=${3:-"200"}
    local attempt=1

    while [ $attempt -le $HEALTH_CHECK_RETRIES ]; do
        log "debug" "Health check attempt $attempt for $server"

        local response=$(curl -s -o /dev/null -w "%{http_code}" \
            --connect-timeout 5 \
            --max-time 10 \
            "http://$server$health_endpoint")

        if [[ "$response" =~ $expected_response ]]; then
            log "info" "Health check passed for $server"
            return 0
        fi

        log "warning" "Health check failed for $server (got $response, expected $expected_response)"
        sleep $HEALTH_CHECK_INTERVAL
        attempt=$((attempt + 1))
    done

    log "error" "Health check failed for $server after $HEALTH_CHECK_RETRIES attempts"
    return 1
}

deploy_to_server() {
    local server=$1
    local artifact=$2
    local deploy_user=${3:-"deploy"}
    local deploy_path=${4:-"/var/www/app"}

    log "info" "Deploying $artifact to $server"

    # Create deployment directory with timestamp
    local timestamp=$(date +%Y%m%d_%H%M%S)
    local new_release_path="${deploy_path}/releases/${timestamp}"

    ssh "$deploy_user@$server" "mkdir -p ${deploy_path}/releases"

    if ! scp "$artifact" "$deploy_user@$server:/tmp/deploy_${timestamp}.tar.gz"; then
        log "error" "Failed to transfer artifact to $server"
        return 1
    fi

    if ! ssh "$deploy_user@$server" "
        set -e
        cd ${deploy_path}/releases
        mkdir -p ${timestamp}
        tar -xzf /tmp/deploy_${timestamp}.tar.gz -C ${timestamp}
        rm -f /tmp/deploy_${timestamp}.tar.gz
        if [ -x ${timestamp}/deploy/pre-deploy.sh ]; then
            cd ${timestamp}
            ./deploy/pre-deploy.sh
        fi
    "; then
        log "error" "Failed to prepare deployment on $server"
        return 1
    fi

    local current_link=$(ssh "$deploy_user@$server" "readlink ${deploy_path}/current" 2>/dev/null || echo "")
    if [ -n "$current_link" ]; then
        update_state "previous_release_${server}" "$current_link"
    fi

    if ! ssh "$deploy_user@$server" "
        ln -sfn ${new_release_path} ${deploy_path}/current
        if [ -x ${new_release_path}/deploy/post-deploy.sh ]; then
            cd ${new_release_path}
            ./deploy/post-deploy.sh
        fi
        sudo systemctl reload nginx || true
        sudo systemctl restart app || true
    "; then
        log "error" "Failed to activate deployment on $server"
        return 1
    fi

    log "info" "Successfully deployed to $server"
    return 0
}

rollback_server() {
    local server=$1
    local deploy_user=${2:-"deploy"}
    local deploy_path=${3:-"/var/www/app"}

    local previous_release=$(get_state "previous_release_${server}")
    if [ -z "$previous_release" ]; then
        log "error" "No previous release found for $server"
        return 1
    fi

    log "warning" "Rolling back $server to $previous_release"

    ssh "$deploy_user@$server" "
        ln -sfn $previous_release ${deploy_path}/current
        sudo systemctl reload nginx || true
        sudo systemctl restart app || true
    "
}

deploy_blue_green() {
    local artifact=$1
    local config=$2

    local blue_servers=($(echo "$config" | jq -r '.blue_servers[]'))
    local green_servers=($(echo "$config" | jq -r '.green_servers[]'))
    local health_endpoint=$(echo "$config" | jq -r '.health_check.endpoint // "/health"')

    # Determine which environment is currently live
    local live_env=$(get_state "live_environment")
    local target_env
    local target_servers

    if [ "$live_env" = "blue" ]; then
        target_env="green"
        target_servers=("${green_servers[@]}")
    else
        target_env="blue"
        target_servers=("${blue_servers[@]}")
    fi

    log "info" "Starting blue-green deployment to $target_env environment"
    update_state "deployment_id" "$(date +%s)"
    update_state "deployment_status" "in_progress"

    local failed_servers=()
    for server in "${target_servers[@]}"; do
        if ! deploy_to_server "$server" "$artifact"; then
            failed_servers+=("$server")
        fi
    done

    if [ ${#failed_servers[@]} -gt 0 ]; then
        log "error" "Deployment failed on servers: ${failed_servers[*]}"
        update_state "deployment_status" "failed"
        return 1
    fi

    log "info" "Running health checks on $target_env environment"
    for server in "${target_servers[@]}"; do
        if ! perform_health_check "$server" "$health_endpoint"; then
            log "error" "Health check failed for $server"
            update_state "deployment_status" "failed"
            # Rollback is not needed in blue-green as we haven't switched traffic
            return 1
        fi
    done

    log "info" "Switching traffic to $target_env environment"
    for server in "${target_servers[@]}"; do
        update_load_balancer "enable" "$server"
    done

    local old_servers
    if [ "$target_env" = "blue" ]; then
        old_servers=("${green_servers[@]}")
    else
        old_servers=("${blue_servers[@]}")
    fi

    for server in "${old_servers[@]}"; do
        update_load_balancer "drain" "$server"
    done

    for server in "${old_servers[@]}"; do
        wait_for_drain "$server"
        update_load_balancer "maint" "$server"
    done

    update_state "live_environment" "$target_env"
    update_state "deployment_status" "completed"
    update_state "deployment_end" "$(date +%s)"

    log "info" "Blue-green deployment completed successfully"
}

deploy_rolling() {
    local artifact=$1
    local config=$2

    local servers=($(echo "$config" | jq -r '.servers[]'))
    local batch_size=$(echo "$config" | jq -r '.batch_size // 1')
    local health_endpoint=$(echo "$config" | jq -r '.health_check.endpoint // "/health"')
    local pause_between_batches=$(echo "$config" | jq -r '.pause_seconds // 30')

    log "info" "Starting rolling deployment (batch size: $batch_size)"
    update_state "deployment_id" "$(date +%s)"
    update_state "deployment_status" "in_progress"

    local total_servers=${#servers[@]}
    local deployed=0

    while [ $deployed -lt $total_servers ]; do
        local batch_servers=()
        local batch_end=$((deployed + batch_size))
        for ((i=deployed; i<batch_end && i<total_servers; i++)); do
            batch_servers+=("${servers[$i]}")
        done

        log "info" "Processing batch: ${batch_servers[*]}"

        for server in "${batch_servers[@]}"; do
            update_load_balancer "drain" "$server"
        done
        for server in "${batch_servers[@]}"; do
            wait_for_drain "$server"
        done

        local batch_failed=false
        for server in "${batch_servers[@]}"; do
            if ! deploy_to_server "$server" "$artifact"; then
                log "error" "Deployment failed on $server"
                batch_failed=true
                break
            fi
            if ! perform_health_check "$server" "$health_endpoint"; then
                log "error" "Health check failed for $server"
                batch_failed=true
                break
            fi
            update_load_balancer "ready" "$server"
        done

        if [ "$batch_failed" = true ]; then
            log "error" "Batch deployment failed, initiating rollback"
            for ((i=0; i<deployed+${#batch_servers[@]}; i++)); do
                rollback_server "${servers[$i]}"
                update_load_balancer "ready" "${servers[$i]}"
            done
            update_state "deployment_status" "failed_rollback_completed"
            return 1
        fi

        deployed=$((deployed + ${#batch_servers[@]}))

        # Pause between batches if not the last batch
        if [ $deployed -lt $total_servers ]; then
            log "info" "Pausing $pause_between_batches seconds before next batch"
            sleep "$pause_between_batches"
        fi
    done

    update_state "deployment_status" "completed"
    update_state "deployment_end" "$(date +%s)"
    log "info" "Rolling deployment completed successfully"
}

deploy_canary() {
    local artifact=$1
    local config=$2

    local servers=($(echo "$config" | jq -r '.servers[]'))
    local canary_duration=$(echo "$config" | jq -r '.canary_duration // 300')
    local success_rate_threshold=$(echo "$config" | jq -r '.success_rate_threshold // 99.5')
    local health_endpoint=$(echo "$config" | jq -r '.health_check.endpoint // "/health"')

    local total_servers=${#servers[@]}
    local canary_count=$((total_servers * CANARY_PERCENTAGE / 100))
    [ $canary_count -lt 1 ] && canary_count=1

    log "info" "Starting canary deployment ($canary_count of $total_servers servers)"

    local canary_servers=("${servers[@]:0:$canary_count}")

    for server in "${canary_servers[@]}"; do
        update_load_balancer "drain" "$server"
        wait_for_drain "$server"

        if ! deploy_to_server "$server" "$artifact"; then
            log "error" "Canary deployment failed on $server"
            rollback_server "$server"
            update_load_balancer "ready" "$server"
            return 1
        fi

        if ! perform_health_check "$server" "$health_endpoint"; then
            log "error" "Canary health check failed for $server"
            rollback_server "$server"
            update_load_balancer "ready" "$server"
            return 1
        fi

        update_load_balancer "ready" "$server"
    done

    log "info" "Monitoring canary servers for $canary_duration seconds"
    local start_time=$(date +%s)
    local end_time=$((start_time + canary_duration))

    while [ $(date +%s) -lt $end_time ]; do
        local error_rate=$(calculate_error_rate "${canary_servers[@]}")
        local success_rate=$(echo "100 - $error_rate" | bc)

        if (( $(echo "$success_rate < $success_rate_threshold" | bc -l) )); then
            log "error" "Canary success rate ($success_rate%) below threshold ($success_rate_threshold%)"
            for server in "${canary_servers[@]}"; do
                update_load_balancer "drain" "$server"
            done
            for server in "${canary_servers[@]}"; do
                wait_for_drain "$server"
                rollback_server "$server"
                update_load_balancer "ready" "$server"
            done
            return 1
        fi

        sleep 10
    done

    log "info" "Canary phase successful, proceeding with full deployment"

    # Deploy to remaining servers using rolling strategy
    local remaining_servers=("${servers[@]:$canary_count}")
    local remaining_config=$(echo "$config" | jq --argjson servers "$(printf '%s\n' "${remaining_servers[@]}" | jq -R . | jq -s .)" '.servers = $servers')
    deploy_rolling "$artifact" "$remaining_config"
}

calculate_error_rate() {
    local servers=("$@")
    # This would integrate with your metrics system
    echo "0.5"
}

log() {
    local level=$1
    shift
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] [$level] $*" | tee -a /var/log/deploy.log
}

main() {
    local artifact=$1
    local strategy=${2:-"rolling"}

    if [ ! -f "$artifact" ]; then
        log "error" "Artifact not found: $artifact"
        exit 1
    fi

    if [ ! -f "$CONFIG_FILE" ]; then
        log "error" "Configuration not found: $CONFIG_FILE"
        exit 1
    fi

    local config=$(cat "$CONFIG_FILE")

    if [ -z "${STRATEGIES[$strategy]}" ]; then
        log "error" "Unknown deployment strategy: $strategy"
        exit 1
    fi

    ${STRATEGIES[$strategy]} "$artifact" "$config"
}

if [ "${BASH_SOURCE[0]}" = "${0}" ]; then
    main "$@"
fi

Q5: Create a disaster recovery automation system that handles failover between regions

This demonstrates complex orchestration and state management:

#!/bin/bash
# Multi-region disaster recovery automation system
# Handles database replication, DNS failover, and application state synchronization

readonly DR_CONFIG="/etc/disaster-recovery/config.json"
readonly STATE_DIR="/var/lib/dr-automation"
readonly HEALTH_CHECK_INTERVAL=30
readonly FAILOVER_THRESHOLD=3  # Consecutive failures before failover
readonly RPO_WARNING_THRESHOLD=300  # 5 minutes

mkdir -p "$STATE_DIR"

declare -A REGION_HEALTH
declare -A REGION_FAILURES
declare -A LAST_REPLICATION_LAG

load_config() {
    if [ ! -f "$DR_CONFIG" ]; then
        log "error" "DR configuration not found"
        exit 1
    fi

    # Export configuration as environment variables
    eval "$(jq -r '
        .regions[] |
        "REGION_\(.name | ascii_upcase)_ENDPOINT=\"\(.endpoint)\"",
        "REGION_\(.name | ascii_upcase)_DB_HOST=\"\(.database.host)\"",
        "REGION_\(.name | ascii_upcase)_DB_PORT=\"\(.database.port // 5432)\""
    ' "$DR_CONFIG")"

    REGIONS=($(jq -r '.regions[].name' "$DR_CONFIG"))
    PRIMARY_REGION=$(jq -r '.primary_region' "$DR_CONFIG")
}

check_region_health() {
    local region=$1
    local endpoint_var="REGION_${region^^}_ENDPOINT"
    local endpoint="${!endpoint_var}"

    log "debug" "Checking health of region: $region"

    local app_health=false
    local http_status=$(curl -s -o /dev/null -w "%{http_code}" \
        --connect-timeout 5 --max-time 10 \
        "https://$endpoint/health" || echo "000")
    if [ "$http_status" = "200" ]; then
        app_health=true
    fi

    local db_health=false
    local db_host_var="REGION_${region^^}_DB_HOST"
    local db_port_var="REGION_${region^^}_DB_PORT"
    local db_host="${!db_host_var}"
    local db_port="${!db_port_var}"
    if pg_isready -h "$db_host" -p "$db_port" -t 5 >/dev/null 2>&1; then
        db_health=true
    fi

    local replication_ok=true
    if [ "$region" != "$PRIMARY_REGION" ]; then
        local lag=$(check_replication_lag "$region")
        LAST_REPLICATION_LAG[$region]=$lag
        if [ "$lag" -gt "$RPO_WARNING_THRESHOLD" ]; then
            log "warning" "High replication lag for $region: ${lag}s"
            replication_ok=false
        fi
    fi

    if [ "$app_health" = true ] && [ "$db_health" = true ] && [ "$replication_ok" = true ]; then
        REGION_HEALTH[$region]="healthy"
        REGION_FAILURES[$region]=0
        return 0
    else
        REGION_HEALTH[$region]="unhealthy"
        REGION_FAILURES[$region]=$((${REGION_FAILURES[$region]:-0} + 1))
        log "error" "Region $region health check failed - App: $app_health, DB: $db_health, Replication: $replication_ok"
        return 1
    fi
}

check_replication_lag() {
    local region=$1
    local db_host_var="REGION_${region^^}_DB_HOST"
    local db_host="${!db_host_var}"

    local lag=$(psql -h "$db_host" -U postgres -t -c "
        SELECT EXTRACT(EPOCH FROM (now() - pg_last_xact_replay_timestamp()))::int
        WHERE pg_is_in_recovery();" 2>/dev/null | tr -d ' ')

    if [ -z "$lag" ] || [ "$lag" = "NULL" ]; then
        echo "0"
    else
        echo "$lag"
    fi
}

perform_dns_failover() {
    local from_region=$1
    local to_region=$2
    local domain=$(jq -r '.domain' "$DR_CONFIG")
    local dns_provider=$(jq -r '.dns_provider' "$DR_CONFIG")

    log "warning" "Initiating DNS failover from $from_region to $to_region"

    case "$dns_provider" in
        route53)
            local hosted_zone=$(aws route53 list-hosted-zones-by-name \
                --query "HostedZones[?Name=='${domain}.'].Id" \
                --output text | cut -d'/' -f3)

            local to_endpoint_var="REGION_${to_region^^}_ENDPOINT"
            local to_endpoint="${!to_endpoint_var}"

            aws route53 change-resource-record-sets \
                --hosted-zone-id "$hosted_zone" \
                --change-batch "{
                    \"Changes\": [{
                        \"Action\": \"UPSERT\",
                        \"ResourceRecordSet\": {
                            \"Name\": \"${domain}\",
                            \"Type\": \"CNAME\",
                            \"TTL\": 60,
                            \"ResourceRecords\": [{\"Value\": \"${to_endpoint}\"}]
                        }
                    }]
                }"
            ;;
        cloudflare)
            local zone_id=$(curl -s -X GET "https://api.cloudflare.com/client/v4/zones?name=$domain" \
                -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" | \
                jq -r '.result[0].id')
            local record_id=$(curl -s -X GET \
                "https://api.cloudflare.com/client/v4/zones/$zone_id/dns_records?name=$domain" \
                -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" | \
                jq -r '.result[0].id')

            local to_endpoint_var="REGION_${to_region^^}_ENDPOINT"
            local to_endpoint="${!to_endpoint_var}"

            curl -X PUT "https://api.cloudflare.com/client/v4/zones/$zone_id/dns_records/$record_id" \
                -H "Authorization: Bearer $CLOUDFLARE_API_TOKEN" \
                -H "Content-Type: application/json" \
                --data "{\"type\":\"CNAME\",\"name\":\"$domain\",\"content\":\"$to_endpoint\",\"ttl\":60}"
            ;;
    esac

    log "info" "DNS failover initiated, propagation may take up to 60 seconds"
}

promote_database() {
    local region=$1
    local db_host_var="REGION_${region^^}_DB_HOST"
    local db_host="${!db_host_var}"

    log "warning" "Promoting database in region $region to primary"
    ssh "postgres@$db_host" "pg_ctl promote -D /var/lib/postgresql/data"

    # Update application configuration to point to new primary
    local app_servers=($(jq -r ".regions[] | select(.name==\"$region\") | .app_servers[]" "$DR_CONFIG"))
    for server in "${app_servers[@]}"; do
        ssh "deploy@$server" "
            sed -i 's/^DB_HOST=.*/DB_HOST=$db_host/' /etc/app/config
            sudo systemctl restart app
        "
    done
}

sync_application_state() {
    local from_region=$1
    local to_region=$2

    log "info" "Syncing application state from $from_region to $to_region"

    local from_redis=$(jq -r ".regions[] | select(.name==\"$from_region\") | .redis.host" "$DR_CONFIG")
    local to_redis=$(jq -r ".regions[] | select(.name==\"$to_region\") | .redis.host" "$DR_CONFIG")

    ssh "redis@$from_redis" "redis-cli BGSAVE && sleep 2"
    ssh "redis@$from_redis" "cat /var/lib/redis/dump.rdb" | \
        ssh "redis@$to_redis" "cat > /var/lib/redis/dump.rdb"

    # Sync session data if using file-based sessions
    local from_servers=($(jq -r ".regions[] | select(.name==\"$from_region\") | .app_servers[]" "$DR_CONFIG"))
    local to_servers=($(jq -r ".regions[] | select(.name==\"$to_region\") | .app_servers[]" "$DR_CONFIG"))

    printf '%s\n' "${from_servers[@]}" | parallel -j 4 "
        rsync -av

Bash Shell Scripting Interview Questions And Answers

Whether you’re just starting out or have years of experience, bash shell interview questions test your understanding of the shell’s unique features, syntax quirks, and best practices that make bash the go-to choice for system automation. We’ll cover the essential bash-specific concepts that interviewers love to explore - from array manipulation and string operations to process substitution and advanced parameter expansion. These questions focus on bash’s distinctive capabilities that set it apart from basic POSIX shell scripting.

Q1: What’s the difference between [ ] and [[ ]] in bash?

This is a classic that trips up many people. The single bracket [ ] is actually a command (same as test), while double brackets [[ ]] are a bash keyword with enhanced functionality:

[[ ]] supports pattern matching: [[ $string == pa* ]] works, but [ $string == pa* ] doesn’t

[[ ]] handles empty variables better: [[ -z $empty ]] works fine, while [ -z $empty ] needs quotes

[[ ]] supports regex matching with =~ operator

[[ ]] allows && and || inside: [[ $a -eq 1 && $b -eq 2 ]]

The key insight: always use [[ ]] in bash unless you need POSIX compatibility. It’s safer and more powerful.
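These differences are easy to verify directly; a small self-contained demo (not from the original answer, just an illustration):

```shell
#!/usr/bin/env bash
string="pattern"

# Glob-style matching only works inside [[ ]]
[[ $string == pa* ]] && echo "glob match"

# Regex matching with =~ (keep the pattern unquoted)
[[ $string =~ ^pa.+n$ ]] && echo "regex match"

# Empty variables are safe unquoted inside [[ ]]
empty=""
[[ -z $empty ]] && echo "empty detected"
```

Running the same tests with single brackets would require careful quoting, and the glob comparison would simply not match.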

Q2: Explain bash arrays and how they differ from regular variables

Bash supports both indexed and associative arrays, which many people don’t fully utilize:

Indexed arrays:

arr=(apple banana cherry)

echo ${arr[0]}  # apple

echo ${arr[@]}  # all elements

echo ${#arr[@]} # array length (3)

Associative arrays (bash 4+):

declare -A hash

hash[name]="John"

hash[age]=30

echo ${hash[name]}  # John

The tricky part is that bash arrays are sparse - you can have arr[0] and arr[10] without arr[1-9]. Also, array indices start at 0, not 1 like in some shells.
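A quick way to see the sparse behavior in action:

```shell
#!/usr/bin/env bash
# Indices 0 and 10 exist; 1 through 9 were never set
arr[0]="first"
arr[10]="last"

echo "${#arr[@]}"   # counts only the elements that are set: 2
echo "${!arr[@]}"   # the indices actually in use: 0 10
```

`${!arr[@]}` (the list of set indices) is the reliable way to iterate a sparse array; looping from 0 to `${#arr[@]}` would miss elements.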

Q3: How does command substitution work and what's the difference between backticks and $()?

Both capture command output, but $() is strongly preferred:

$() nests easily: echo $(echo $(date))

Backticks require escaping for nesting: echo `echo \`date\``

$() is clearer to read, especially with syntax highlighting

$() handles quotes more predictably

Example showing the difference:

result=$(echo "hello $(echo "world")")   # nested $() and quotes just work

result=`echo "hello \`echo world\`"`     # backticks need backslash-escaping to nest

Q4: What are bash parameter expansions and give some advanced examples?

Parameter expansion is bash’s Swiss Army knife for string manipulation:

${var:-default} - Use default if var is unset/null

${var:=default} - Set var to default if unset/null

${var:?error} - Exit with error if var is unset/null

${var:+alternate} - Use alternate if var is set

Advanced string manipulation:

path="/home/user/documents/file.txt"

echo ${path##*/}  # file.txt (basename)

echo ${path%/*}   # /home/user/documents (dirname)

echo ${path/documents/archive}  # String replacement

echo ${path^^}    # Convert to uppercase

echo ${path,,}    # Convert to lowercase

The pattern: # removes from beginning, % from end. Double it (##, %%) for greedy matching.
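A short demo makes the greedy vs. non-greedy distinction concrete:

```shell
#!/usr/bin/env bash
file="archive.tar.gz"

echo "${file#*.}"    # tar.gz      (shortest match stripped from the front)
echo "${file##*.}"   # gz          (longest match stripped from the front)
echo "${file%.*}"    # archive.tar (shortest match stripped from the end)
echo "${file%%.*}"   # archive     (longest match stripped from the end)
```

This is why `${file##*.}` is the idiomatic way to grab a file extension and `${file%%.*}` the way to grab the bare name.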

Q5: Explain the different types of expansions in bash and their order

Bash performs expansions in a specific order, and understanding this prevents many bugs:

Brace expansion: {a,b,c} or {1..10}

Tilde expansion: ~ becomes home directory

Parameter/variable expansion: $var or ${var}

Command substitution: $(command) or `command`

Arithmetic expansion: $((2 + 2))

Word splitting: Based on IFS

Pathname expansion: *.txt (globbing)

Why this matters:

var="*.txt"

ls $var  # Word splitting happens after parameter expansion

ls *.txt  # Pathname expansion happens directly
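A self-contained way to see the ordering in action, run in a scratch directory:

```shell
#!/usr/bin/env bash
# Quoting stops the expansion pipeline after parameter expansion.
cd "$(mktemp -d)"
touch a.txt b.txt

pattern="*.txt"
echo $pattern     # unquoted: pathname expansion runs after the variable
                  # expands, so this prints: a.txt b.txt
echo "$pattern"   # quoted: no later expansions, so this prints: *.txt
```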

Q6: What is process substitution and when would you use it?

Process substitution <(command) creates a temporary file descriptor that acts like a file:

diff <(ls dir1) <(ls dir2)

Read from multiple commands simultaneously

while read -u 3 line1 && read -u 4 line2; do

echo "File1: $line1 | File2: $line2"

done 3< <(cat file1) 4< <(cat file2)

grep "pattern" <(curl -s https://example.com)

It’s incredibly useful when you need to treat command output as a file without creating temporary files. The key limitation: it’s not POSIX, so it’s bash/zsh specific.
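The examples above all use the input form <(); there is also an output form >(command), where whatever you write to the descriptor becomes the command’s stdin. A small sketch:

```shell
#!/usr/bin/env bash
# tee keeps the stream on stdout while a second copy is compressed by a
# background gzip via the output form of process substitution.
log=$(mktemp).gz
echo "important log line" | tee >(gzip -c > "$log")
sleep 0.5         # crude: give the background gzip a moment to flush
                  # (bash 4.4+ scripts can 'wait "$!"' instead)
gunzip -c "$log"  # prints the same line again
rm -f "$log"
```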

Q7: How do you handle signals and traps in bash?

Traps let you intercept signals and run cleanup code:

trap 'echo "Ctrl+C pressed"' INT

trap 'cleanup_function' EXIT

trap 'echo "Error on line $LINENO"' ERR

trap - INT

trap -l

cleanup() {

rm -f /tmp/tempfile.$$

echo "Cleaned up"

}

trap cleanup EXIT INT TERM

Pro tip: Always use single quotes in trap commands to prevent premature expansion. The ERR trap is especially useful with set -e for debugging.
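The cleanup pattern above, fleshed out into a minimal runnable sketch with a real temporary file:

```shell
#!/usr/bin/env bash
# The EXIT trap fires on normal exit, Ctrl+C, or kill, so the temp file
# is removed no matter how the script ends.
tmpfile=$(mktemp)

cleanup() {
    rm -f "$tmpfile"
    echo "cleaned up"
}
trap cleanup EXIT INT TERM

echo "working data" > "$tmpfile"
[ -f "$tmpfile" ] && echo "temp file in use"
# falling off the end triggers the EXIT trap, which removes the file
```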

Q8: Explain bash’s set options and their impact on script behavior

The set command changes bash’s behavior fundamentally:

set -e (errexit): Exit on any command failure

set -u (nounset): Exit on undefined variable

set -o pipefail: Pipe fails if any command fails

set -x (xtrace): Print commands before execution

Best practice combo:

set -euo pipefail  # Strict mode

set -E  # ERR trap inherits to functions

set -T  # DEBUG/RETURN traps inherit to functions

You can also use set +e to temporarily disable, and $- contains current options. The gotcha: set -e doesn’t work in certain contexts like if conditions.
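The if-condition gotcha is easy to demonstrate; a short sketch:

```shell
#!/usr/bin/env bash
set -euo pipefail

fails() { return 1; }

# set -e is suspended while a command is the tested condition of an if,
# while, ||, or &&, so this failure does NOT terminate the script:
if fails; then
    echo "never printed"
else
    echo "handled the failure, still running"
fi

# A bare failing command would exit here; '|| true' opts out explicitly.
fails || true
echo "reached the end"
```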

Q9: What are coprocesses in bash and how do you use them?

Coprocesses (bash 4+) let you run background processes with two-way communication:

coproc BC { bc; }

echo "2+2" >&${BC[1]}

read result <&${BC[0]}

echo $result  # 4

coproc GREP { grep --line-buffered "error"; }

echo "this is fine" >&${GREP[1]}

echo "error occurred" >&${GREP[1]}

read -t 1 line <&${GREP[0]} && echo "Got: $line"

Coprocesses are useful for maintaining persistent connections to programs like bc, awk, or database clients without the overhead of starting new processes.

Q10: How does bash handle arithmetic and what are the different ways to perform calculations?

Bash offers multiple arithmetic methods, each with different capabilities:

result=$((5 + 3))

((count++))  # C-style increment

let "result = 5 + 3"

result=$(expr 5 + 3)

result=$(echo "scale=2; 5/3" | bc)

if ((5 > 3)); then echo “yes”; fi

for ((i=0; i<10; i++)); do

echo $i

done

Bash only supports integer arithmetic natively. For floating-point, use bc or awk. Note that ((…)) exits with status 0 when the expression evaluates to nonzero (true) and status 1 when it is zero, so the exit status is the inverse of the arithmetic value — ((0)) actually fails.
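A quick side-by-side of the floating-point options mentioned above:

```shell
#!/usr/bin/env bash
echo $((5 / 3))                         # 1 -- bash truncates integer division

# bc truncates to the requested scale:
echo "scale=2; 5/3" | bc                # 1.66

# awk does real floating point; printf rounds:
awk 'BEGIN { printf "%.2f\n", 5 / 3 }'  # 1.67
```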

Q11: What’s the difference between source, ., and executing a script?

These three methods have crucial differences:

./script.sh - Runs in a separate child process, can’t modify parent environment

source script.sh or . script.sh - Runs in current shell, can modify environment

exec script.sh - Replaces current shell process entirely

Example showing the difference:

export MY_VAR="hello"

./script.sh

echo $MY_VAR  # Empty

source script.sh

echo $MY_VAR  # hello

exec script.sh

echo "Never printed"

Use source for configuration files, ./ for independent scripts, and exec for wrapper scripts.
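The example above assumes an external script.sh that sets MY_VAR; a self-contained version that generates the child script itself:

```shell
#!/usr/bin/env bash
# Generate a tiny script that sets a variable, then run it both ways.
demo=$(mktemp)
echo 'CHILD_VAR="set by script"' > "$demo"

bash "$demo"                               # child process: no effect here
echo "after bash:   ${CHILD_VAR:-unset}"   # unset

source "$demo"                             # current shell: variable sticks
echo "after source: ${CHILD_VAR:-unset}"   # set by script

rm -f "$demo"
```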

Q12: How do you properly handle spaces and special characters in bash?

This is where most scripts break. Key strategies:

bad:  rm $file

good: rm "$file"

files=("file 1.txt" "file 2.txt")

for f in "${files[@]}"; do

cat "$f"

done

while IFS= read -r line; do

echo "$line"

done < file.txt

find . -name "*.txt" -print0 | while IFS= read -r -d '' file; do

echo "Processing: $file"

done

cmd=(ls -la "/path with spaces/")

"${cmd[@]}"  # Executes correctly

The golden rule: If it’s not a numeric value you’re absolutely sure about, quote it. Use "$@" not $* for arguments, and always use -r with read to prevent backslash interpretation.