
Text Processing in Bash: grep, awk, and sed for DevOps

Learn text processing in Bash with grep, awk, and sed. Includes DevOps examples for filtering Nginx logs, counting unique IPs, and safely updating config files.

Text Processing in Bash for DevOps

In the previous article we handled files: reading, writing, redirection, tee, find, and xargs. But text files become truly useful when you know how to filter, extract, and transform data inside them.

In DevOps, grep, awk, and sed show up everywhere: filtering error logs, extracting status codes from access logs, counting the most active IP addresses, changing config before deployment, or creating quick reports from command output. This article walks through the three tools in a practical way, with examples close to daily operations work.


grep: find lines that match a pattern

grep reads input and prints lines that match a pattern. It is often the first tool you reach for when you need to find information quickly in logs or config files.

grep "ERROR" app.log
grep "server_name" /etc/nginx/conf.d/*.conf

Common options:

grep -n "ERROR" app.log          # print line numbers
grep -i "timeout" app.log        # ignore case
grep -v "healthcheck" app.log    # exclude matching lines
grep -r "DATABASE_URL" ./config  # search recursively in a directory
grep -E "ERROR|WARN" app.log     # extended regex
grep -F "[literal]" app.log      # fixed string, not regex

According to the GNU grep manual, -E interprets the pattern as an extended regular expression, while -F treats it as a fixed string. When the pattern is a simple literal string, grep -F avoids surprises from characters like ., [, or * being interpreted as regex syntax.
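
A small sketch makes the difference visible. The file versions.txt and its two lines are made up for illustration; the pattern 1.2.3 is treated as a regex by default, so each dot matches any character.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical sample data, created in a temporary directory.
tmp_dir="$(mktemp -d)"
trap 'rm -rf "${tmp_dir}"' EXIT
printf '%s\n' 'release 1.2.3' 'build 1x2y3' > "${tmp_dir}/versions.txt"

# As a regex, each "." matches any single character, so both lines match.
regex_count="$(grep -c '1.2.3' "${tmp_dir}/versions.txt")"

# With -F the pattern is a literal string, so only the first line matches.
fixed_count="$(grep -c -F '1.2.3' "${tmp_dir}/versions.txt")"

echo "regex matches: ${regex_count}, fixed-string matches: ${fixed_count}"
```

With this sample data the regex form counts two lines and the fixed-string form counts one.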


grep context: view lines before and after an error

When debugging logs, a single error line is often not enough. GNU grep supports context options:

grep -A 3 "ERROR" app.log  # 3 lines after the match
grep -B 3 "ERROR" app.log  # 3 lines before the match
grep -C 3 "ERROR" app.log  # 3 lines before and after the match

Example: inspect a deploy error with surrounding context:

grep -n -C 5 "deploy failed" ./logs/deploy.log

When match groups are not adjacent, grep prints a line containing only -- between them by default. This is useful in the terminal, but if you pass the output to another script, remember that these separator lines are part of it.
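
The following sketch shows the separator and one way to filter it out before further processing. The log content is hypothetical. GNU grep also documents a --no-group-separator option; the sketch avoids it so that it does not depend on a GNU-only flag.

```shell
#!/usr/bin/env bash
set -euo pipefail

tmp_dir="$(mktemp -d)"
trap 'rm -rf "${tmp_dir}"' EXIT
# Hypothetical log: two errors far enough apart to form separate context groups.
printf '%s\n' 'start' 'ERROR db timeout' 'retrying' 'ok' 'ok' 'ok' \
  'ERROR disk full' 'aborting' > "${tmp_dir}/app.log"

# Raw context output contains one "--" line between the two groups.
separator_count="$(grep -A 1 'ERROR' "${tmp_dir}/app.log" | grep -c -x -e '--' || true)"

# Drop lines that consist only of "--" before passing the output on.
# Caveat: this also drops a real log line that is exactly "--".
clean_output="$(grep -A 1 'ERROR' "${tmp_dir}/app.log" | grep -v -x -e '--')"

echo "separators: ${separator_count}"
echo "${clean_output}"
```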


grep in a DevOps pipeline

Example: filter an application log, exclude healthchecks, and show only severe errors:

#!/usr/bin/env bash
set -euo pipefail

LOG_FILE="${1:-./logs/app.log}"

if [[ ! -r "${LOG_FILE}" ]]; then
  echo "ERROR: Cannot read log file: ${LOG_FILE}" >&2
  exit 1
fi

grep -E "ERROR|FATAL" "${LOG_FILE}" | grep -v "healthcheck" || true

Because grep returns exit code 1 when there is no match, a pipeline may stop the script if you are using set -e. Where “no errors found” is a normal outcome, adding || true at the end of the pipeline is acceptable, but note that it also hides real failures such as exit code 2 for an unreadable file. For more important pipelines, handle the exit code explicitly with if grep ...; then ... fi.
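
As a sketch of explicit handling, the hypothetical helper find_errors below treats exit status 1 as "nothing found" and still fails on status 2 or higher. The function name and the sample log are assumptions for illustration, not part of the original script.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print ERROR/FATAL lines from a log file.
# Exit status 1 (no match) counts as success; status 2 or higher
# (for example an unreadable file) is reported as a real failure.
find_errors() {
  local log_file="$1"
  local status=0
  grep -E 'ERROR|FATAL' "${log_file}" || status=$?
  if [ "${status}" -gt 1 ]; then
    echo "grep failed with exit status ${status}" >&2
    return "${status}"
  fi
  return 0
}

tmp_dir="$(mktemp -d)"
trap 'rm -rf "${tmp_dir}"' EXIT
printf '%s\n' 'INFO started' 'INFO ready' > "${tmp_dir}/clean.log"

# No match: the function succeeds and the script keeps running.
find_errors "${tmp_dir}/clean.log"
echo "clean log handled without aborting"
```

Compared with || true, this keeps the script strict about genuine grep failures while tolerating an empty result.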


awk: process text by line and column

awk reads input as records; by default, each line is one record. Each line is split into fields: $1, $2, $3, and so on. The full line is $0.

awk '{ print $1 }' access.log
awk '{ print $1, $9 }' access.log

Important built-in variables:

  • NR: number of records read so far.
  • NF: number of fields in the current record.
  • $0: the full current line.
  • $1, $2, …: the first field, second field, and so on.
  • FS: input field separator.
  • OFS: output field separator.

Example: print line number, field count, and the full line:

awk '{ print NR, NF, $0 }' app.log

According to the GNU awk manual, NR increments whenever awk reads a new record, while NF is the number of fields in the current record. These two variables are very useful when you need to validate text data quickly.
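
For instance, a quick validation pass can flag lines with an unexpected number of fields. This is a minimal sketch; the file metrics.txt, its contents, and the expected count of three fields are assumptions for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

tmp_dir="$(mktemp -d)"
trap 'rm -rf "${tmp_dir}"' EXIT
# Hypothetical data: three whitespace-separated fields are expected per line.
printf '%s\n' 'web01 200 12' 'web02 500' 'web03 200 48' > "${tmp_dir}/metrics.txt"

# Report every record whose field count differs from the expected value.
bad_lines="$(awk -v expected=3 '
  NF != expected { printf "line %d: expected %d fields, got %d\n", NR, expected, NF }
' "${tmp_dir}/metrics.txt")"

echo "${bad_lines}"
```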


awk BEGIN, END, and conditions

BEGIN runs before any input is read, and END runs after all input has been read. A rule without a pattern, such as the middle block in the examples below, runs for every input line.

awk 'BEGIN { print "status,count" } { count[$9]++ } END { for (code in count) print code "," count[code] }' access.log

A more readable example: count HTTP status codes from a common Nginx access log format:

awk '
  BEGIN {
    print "status,count"
  }
  {
    status[$9]++
  }
  END {
    for (code in status) {
      print code "," status[code]
    }
  }
' access.log
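
One detail worth knowing: awk does not guarantee any particular order for for (code in status), so the rows may come out in a different order across runs or implementations. A minimal sketch of one way to keep the header first and sort only the data rows, assuming the same $9 status field and a small sample log created just for the demonstration:

```shell
#!/usr/bin/env bash
set -euo pipefail

tmp_dir="$(mktemp -d)"
trap 'rm -rf "${tmp_dir}"' EXIT
# Sample log in the common access log layout (illustrative data).
cat > "${tmp_dir}/access.log" <<'EOF'
203.0.113.10 - - [08/Jun/2026:10:12:01 +0700] "GET /api/health HTTP/1.1" 200 12
198.51.100.23 - - [08/Jun/2026:10:12:02 +0700] "POST /api/login HTTP/1.1" 401 64
203.0.113.10 - - [08/Jun/2026:10:12:03 +0700] "GET /api/users HTTP/1.1" 200 532
EOF

# Print the header from the shell, then sort only the data rows by status code.
report="$(
  echo "status,count"
  awk '{ status[$9]++ } END { for (code in status) print code "," status[code] }' \
    "${tmp_dir}/access.log" | sort -t, -k1,1n
)"

echo "${report}"
```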

In a common access log, $1 is often the client IP, $7 is the path, and $9 is the HTTP status code. However, log formats can differ depending on the Nginx or Apache configuration, so always inspect a few sample lines before hardcoding field positions.
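
A quick way to inspect field positions is to print every field of one sample line together with its index. The sketch below uses a single line in the common layout; with your own logs, feed in a real line instead.

```shell
#!/usr/bin/env bash
set -euo pipefail

# One sample line in the common access log layout.
sample='203.0.113.10 - - [08/Jun/2026:10:12:01 +0700] "GET /api/health HTTP/1.1" 200 12'

# Print every field of the first record with its index, then stop reading.
field_map="$(printf '%s\n' "${sample}" \
  | awk 'NR == 1 { for (i = 1; i <= NF; i++) print i ": " $i; exit }')"

echo "${field_map}"
```

For this line the output confirms that field 7 is the path and field 9 is the status code.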


awk with a custom delimiter

By default, awk splits fields on whitespace. For key-value files or simple CSV-like data, use -F to choose a delimiter.

awk -F= '{ print $1, $2 }' app.env
awk -F: '{ print $1, $7 }' /etc/passwd

Example: read a .env file and skip comments or blank lines:

awk -F= '
  /^[[:space:]]*#/ { next }
  NF < 2 { next }
  {
    print "key=" $1
  }
' .env
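
One limitation of -F= is that a value containing another = is split into several fields. A possible workaround, sketched here under the assumption that each line has the form KEY=VALUE, is to split at the first = with index() and substr(). The file content below is hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

tmp_dir="$(mktemp -d)"
trap 'rm -rf "${tmp_dir}"' EXIT
# Hypothetical .env content; DATABASE_URL contains a second "=" in its value.
cat > "${tmp_dir}/app.env" <<'EOF'
# database settings
DATABASE_URL=postgres://db.internal/app?sslmode=require

LOG_LEVEL=info
EOF

# Split each line at the first "=" only, so the rest of the value stays intact.
pairs="$(awk '
  /^[ \t]*#/ { next }            # skip comments
  {
    pos = index($0, "=")
    if (pos == 0) next           # skip blank lines and lines without "="
    print substr($0, 1, pos - 1) " -> " substr($0, pos + 1)
  }
' "${tmp_dir}/app.env")"

echo "${pairs}"
```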

For complex CSV with quotes, commas inside fields, or escaping, awk -F, is not safe enough. In that case, use a real CSV parser in Python, Go, or a dedicated tool.


sed: replace and edit text line by line

sed is a stream editor: it reads input, applies a script, and writes output. Its most famous command is substitute:

sed 's/old/new/' file.txt      # replace the first match on each line
sed 's/old/new/g' file.txt     # replace all matches on each line

According to the GNU sed manual, the basic syntax is s/regexp/replacement/flags. The g flag replaces all matches on a line instead of only the first match.
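
The effect of the g flag is easiest to see on a single line with several matches. The input line here is made up for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

line='retry=old; fallback=old; mode=old'

# Without flags, only the first match on the line is replaced.
first_only="$(printf '%s\n' "${line}" | sed 's/old/new/')"

# With g, every match on the line is replaced.
all_matches="$(printf '%s\n' "${line}" | sed 's/old/new/g')"

echo "${first_only}"
echo "${all_matches}"
```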

Example: change an endpoint in a config file and write the result to a new file:

sed 's|http://localhost:8080|https://api.example.com|g' app.conf > app.conf.new

Here we use | as the delimiter instead of / to avoid escaping many / characters in the URL.


sed addresses and in-place editing

You can limit sed so it only changes lines matching a pattern or a range.

sed '/^LOG_LEVEL=/s/=.*$/=debug/' app.env
sed '10,20s/enabled=false/enabled=true/' feature.conf

GNU sed supports -i for in-place editing. If you provide a suffix, sed creates a backup before renaming the temporary file back to the original file:

sed -i.bak 's/^LOG_LEVEL=.*/LOG_LEVEL=info/' app.env

Be careful with sed -i:

  • Always test without -i first so you can inspect the output.
  • Use a suffix such as .bak when editing important files.
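  • GNU sed and BSD/macOS sed parse -i differently: BSD sed requires a suffix argument (sed -i '' ... for no backup), while GNU sed takes an optional suffix attached directly to -i. The attached form sed -i.bak works with both.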
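
The checklist above can be sketched as a two-step workflow: preview the change with diff, then apply it with a backup suffix. The file path and values are hypothetical, and the variable name sed_script is just a convenience for reusing the same expression in both steps.

```shell
#!/usr/bin/env bash
set -euo pipefail

tmp_dir="$(mktemp -d)"
trap 'rm -rf "${tmp_dir}"' EXIT
conf="${tmp_dir}/app.env"
printf '%s\n' 'LOG_LEVEL=debug' 'PORT=8080' > "${conf}"

sed_script='s/^LOG_LEVEL=.*/LOG_LEVEL=info/'

# Step 1: dry run. Show what would change without touching the file.
# diff exits with status 1 when the inputs differ, so "|| true" keeps
# set -e from aborting the script here.
preview="$(sed "${sed_script}" "${conf}" | diff "${conf}" - || true)"
echo "${preview}"

# Step 2: apply in place, keeping the original as app.env.bak.
sed -i.bak "${sed_script}" "${conf}"

current="$(grep '^LOG_LEVEL=' "${conf}")"
backup="$(grep '^LOG_LEVEL=' "${conf}.bak")"
echo "now: ${current}, backup: ${backup}"
```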

DevOps practice: parse Nginx logs

Assume the access log uses a common format:

203.0.113.10 - - [08/Jun/2026:10:12:01 +0700] "GET /api/health HTTP/1.1" 200 12
198.51.100.23 - - [08/Jun/2026:10:12:02 +0700] "POST /api/login HTTP/1.1" 401 64
203.0.113.10 - - [08/Jun/2026:10:12:03 +0700] "GET /api/users HTTP/1.1" 200 532

Filter 4xx/5xx errors with awk:

awk '$9 ~ /^[45][0-9][0-9]$/ { print $1, $7, $9 }' access.log

Count the top IP addresses by request count:

awk '{ count[$1]++ } END { for (ip in count) print count[ip], ip }' access.log \
  | sort -nr \
  | head -n 10

Count status codes:

awk '{ status[$9]++ } END { for (code in status) print code, status[code] }' access.log \
  | sort -n

If you want to exclude healthcheck requests from the statistics:

grep -v '"GET /api/health ' access.log \
  | awk '{ status[$9]++ } END { for (code in status) print code, status[code] }' \
  | sort -n
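
The same building blocks can also produce a single summary number. As a sketch, assuming the three sample lines shown earlier and the $9 status field, the following computes the share of 4xx/5xx responses:

```shell
#!/usr/bin/env bash
set -euo pipefail

tmp_dir="$(mktemp -d)"
trap 'rm -rf "${tmp_dir}"' EXIT
cat > "${tmp_dir}/access.log" <<'EOF'
203.0.113.10 - - [08/Jun/2026:10:12:01 +0700] "GET /api/health HTTP/1.1" 200 12
198.51.100.23 - - [08/Jun/2026:10:12:02 +0700] "POST /api/login HTTP/1.1" 401 64
203.0.113.10 - - [08/Jun/2026:10:12:03 +0700] "GET /api/users HTTP/1.1" 200 532
EOF

# Count all requests and the subset with a 4xx/5xx status, then print the ratio.
summary="$(awk '
  { total++ }
  $9 ~ /^[45][0-9][0-9]$/ { errors++ }
  END {
    if (total == 0) { print "no requests"; exit }
    printf "%d of %d requests returned 4xx/5xx (%.1f%%)\n", errors, total, errors * 100 / total
  }
' "${tmp_dir}/access.log")"

echo "${summary}"
```

The guard for total == 0 avoids a division by zero on an empty log.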

DevOps practice: update config before deployment

Example script: update LOG_LEVEL and FEATURE_FLAG in a .env file, with a backup before editing.

#!/usr/bin/env bash
set -euo pipefail

ENV_FILE="${1:-./app.env}"
LOG_LEVEL="${LOG_LEVEL:-info}"
FEATURE_FLAG="${FEATURE_FLAG:-false}"

if [[ ! -f "${ENV_FILE}" ]]; then
  echo "ERROR: env file not found: ${ENV_FILE}" >&2
  exit 1
fi

if [[ ! -w "${ENV_FILE}" ]]; then
  echo "ERROR: env file is not writable: ${ENV_FILE}" >&2
  exit 1
fi

cp -- "${ENV_FILE}" "${ENV_FILE}.bak.$(date +%Y%m%d%H%M%S)"

sed -i.bak \
  -e "s/^LOG_LEVEL=.*/LOG_LEVEL=${LOG_LEVEL}/" \
  -e "s/^FEATURE_FLAG=.*/FEATURE_FLAG=${FEATURE_FLAG}/" \
  "${ENV_FILE}"

if ! grep -q '^LOG_LEVEL=' "${ENV_FILE}"; then
  echo "LOG_LEVEL=${LOG_LEVEL}" >> "${ENV_FILE}"
fi

if ! grep -q '^FEATURE_FLAG=' "${ENV_FILE}"; then
  echo "FEATURE_FLAG=${FEATURE_FLAG}" >> "${ENV_FILE}"
fi

This script demonstrates how the tools work together:

  • cp creates a clear backup before editing.
  • sed replaces values when the keys already exist.
  • grep -q checks whether each key exists.
  • >> appends missing keys.
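
One caveat: the script interpolates ${LOG_LEVEL} and ${FEATURE_FLAG} directly into the sed replacement, which works for simple values such as info or false but breaks when a value contains /, & or a backslash. A possible mitigation is to escape the value first. The helper escape_sed_replacement and the API_URL key below are hypothetical names used only for this sketch, and values containing newlines are not handled.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: escape the characters that are special in the
# replacement part of s/.../.../ when "/" is the delimiter: \ / and &.
escape_sed_replacement() {
  printf '%s' "$1" | sed 's|[\\/&]|\\&|g'
}

tmp_dir="$(mktemp -d)"
trap 'rm -rf "${tmp_dir}"' EXIT
env_file="${tmp_dir}/app.env"
printf '%s\n' 'API_URL=http://localhost:8080' > "${env_file}"

# Hypothetical value containing "/" and "&", which would break a naive substitution.
api_url='https://api.example.com/v1?a=1&b=2'
escaped="$(escape_sed_replacement "${api_url}")"

sed -i.bak -e "s/^API_URL=.*/API_URL=${escaped}/" "${env_file}"

result="$(cat "${env_file}")"
echo "${result}"
```

The helper uses | as its own delimiter so that / can appear inside the bracket expression without ending the pattern early.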

Note: do not put real secrets in examples or commits. For secrets, prefer environment variables, a secret manager, or your CI/CD secret store.


DevOps practice: generate an error report from logs

Example: create a short report with the total number of errors and the top endpoints returning 5xx:

#!/usr/bin/env bash
set -euo pipefail

ACCESS_LOG="${1:-./access.log}"
REPORT_FILE="${2:-./error-report.txt}"

if [[ ! -r "${ACCESS_LOG}" ]]; then
  echo "ERROR: Cannot read access log: ${ACCESS_LOG}" >&2
  exit 1
fi

{
  echo "Error report generated at $(date -Is)"
  echo
  echo "Total 5xx responses:"
  awk '$9 ~ /^5[0-9][0-9]$/ { total++ } END { print total + 0 }' "${ACCESS_LOG}"
  echo
  echo "Top 5 endpoints with 5xx:"
  awk '$9 ~ /^5[0-9][0-9]$/ { count[$7]++ } END { for (path in count) print count[path], path }' "${ACCESS_LOG}" \
    | sort -nr \
    | head -n 5
} > "${REPORT_FILE}"

echo "Wrote ${REPORT_FILE}"

This kind of script is useful to run from cron or a CI job after a load test, as long as you understand the input log format clearly.


Common mistakes

  • Using regex when a literal string is enough: If the pattern contains special characters such as [, ., or *, consider grep -F.
  • Forgetting grep exit codes: No match means exit code 1, which is not necessarily a business-level error.
  • Hardcoding log fields without checking the format: $9 is the status code in a common format, but not every log format is the same.
  • Using awk -F, for complex CSV: CSV with quotes or commas inside fields needs a real parser.
  • Running sed -i directly on important files: Test the output first, use a backup suffix, or copy the file before editing.
  • Not quoting variables in scripts: When passing file paths to grep, awk, or sed, always quote "${FILE}".
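
The quoting point is easy to demonstrate with a file name that contains a space. The path below is hypothetical; the unquoted call is left in deliberately to show the failure.

```shell
#!/usr/bin/env bash
set -euo pipefail

tmp_dir="$(mktemp -d)"
trap 'rm -rf "${tmp_dir}"' EXIT
# Hypothetical file name containing a space.
FILE="${tmp_dir}/app log.txt"
printf '%s\n' 'ERROR disk full' > "${FILE}"

# Quoted: grep receives one argument, the full path.
quoted_count="$(grep -c 'ERROR' "${FILE}")"

# Unquoted (intentionally wrong): the shell splits the path at the space,
# so grep looks for two files that do not exist and fails.
if grep -c 'ERROR' ${FILE} > /dev/null 2>&1; then
  unquoted_result="ok"
else
  unquoted_result="failed"
fi

echo "quoted: ${quoted_count} match, unquoted: ${unquoted_result}"
```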

Implementation notes

  • When applying this to your own project, choose the tool based on the goal:
    • Find/filter lines → grep.
    • Extract columns, calculate totals, group counts → awk.
    • Replace text by line or pattern → sed.
  • Best practices:
    • Use grep -n when debugging so you can see line numbers.
    • Use grep -C when you need context around an error.
    • Use awk with BEGIN/END to create reports with headers or footers.
    • Use a different delimiter in sed when processing URLs or paths, for example s|old|new|g.
    • For config edits, back up first and verify afterward with grep or the service’s config test command.
  • Troubleshooting:
    • Pipeline stops because grep found no match? → Handle the exit code with if grep ... or || true when appropriate.
    • awk prints the wrong column? → Try awk '{ print NR, NF, $0 }' to inspect fields.
    • sed changes nothing? → Check whether the pattern anchor is correct and whether the delimiter needs escaping.

🎯 Conclusion

grep, awk, and sed are the core trio that turns Bash into a fast log and config processing tool. grep helps you find the right lines, awk helps you analyze columns and summarize numbers, and sed helps you edit text in a controlled way. Combined with the file-handling skills from the previous article, you can already build many small but useful DevOps scripts.

In the next article, we will cover functions in Bash: splitting logic into functions, passing arguments, using local, handling return codes, and building a small reusable logging library. 🚀


References

  • GNU Grep Manual — Official documentation for grep, regex, -E, -F, -n, -A, -B, and -C.
  • GNU Awk Manual — Official documentation for fields, NR, NF, BEGIN, END, and awk programs.
  • GNU Sed Manual — Official documentation for s/regexp/replacement/flags, addresses, and -i.
  • GNU Coreutils Manual — sort — Additional reference for statistical pipelines with sort.