Text Processing in Bash for DevOps
In the previous article we handled files: reading, writing, redirection, tee, find, and xargs. But text files become truly useful when you know how to filter, extract, and transform data inside them.
In DevOps, grep, awk, and sed show up everywhere: filtering error logs, extracting status codes from access logs, counting the most active IP addresses, changing config before deployment, or creating quick reports from command output. This article walks through the three tools in a practical way, with examples close to daily operations work.
grep: find lines that match a pattern
grep reads input and prints lines that match a pattern. It is often the first tool you reach for when you need to find information quickly in logs or config files.
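A minimal sketch of basic usage. The temporary log file and its contents are invented for illustration:

```shell
# Create a small sample log (hypothetical content) so the example is self-contained.
log="$(mktemp)"
cat > "$log" <<'EOF'
2024-05-01 10:00:01 INFO service started
2024-05-01 10:00:05 ERROR db connection refused
2024-05-01 10:00:09 WARN retrying connection
2024-05-01 10:00:12 ERROR timeout on /api/orders
EOF

# Print every line that contains the string ERROR.
grep 'ERROR' "$log"

# grep also reads standard input, so it can sit at the end of a pipe.
tail -n 3 "$log" | grep 'ERROR'
```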
Common options:
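The sketch below runs a handful of frequently used options against an invented sample file. The selection of options is an assumption about what is most useful day to day, not a complete list:

```shell
log="$(mktemp)"
cat > "$log" <<'EOF'
INFO service started
ERROR db connection refused
error timeout on /api/orders
WARN retrying [attempt 2]
EOF

grep -i 'error' "$log"          # -i: ignore case, matches ERROR and error
grep -n 'ERROR' "$log"          # -n: prefix each match with its line number
grep -v 'INFO' "$log"           # -v: print lines that do NOT match
grep -c 'WARN' "$log"           # -c: print only the count of matching lines
grep -E 'ERROR|WARN' "$log"     # -E: extended regex, here an alternation
grep -F '[attempt 2]' "$log"    # -F: literal string, brackets are not a regex class
```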
According to the GNU grep manual, -E interprets the pattern as an extended regular expression, while -F interprets it as a fixed string. When the pattern is a simple literal string, grep -F avoids surprises from characters like ., [, or * being interpreted as regex.
grep context: view lines before and after an error
When debugging logs, a single error line is often not enough. GNU grep supports context options:
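A short sketch of the three context options, using an invented five-line file:

```shell
log="$(mktemp)"
printf '%s\n' 'line 1' 'line 2' 'ERROR boom' 'line 4' 'line 5' > "$log"

grep -A 1 'ERROR' "$log"   # -A N: N lines after each match
grep -B 1 'ERROR' "$log"   # -B N: N lines before each match
grep -C 1 'ERROR' "$log"   # -C N: N lines of context on both sides
```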
Example: inspect a deploy error with surrounding context:
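One way this could look, assuming a hypothetical deploy log with invented step names:

```shell
deploy_log="$(mktemp)"
cat > "$deploy_log" <<'EOF'
step 1: pull image
step 2: run migrations
ERROR: migration 0042 failed
step 3: rollback started
step 4: rollback finished
EOF

# Show line numbers plus two lines of context around each error.
# With -n, matching lines use ":" after the number and context lines use "-".
grep -n -C 2 'ERROR' "$deploy_log"
```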
When multiple match groups are far apart, grep inserts -- by default to separate context groups. This is useful in the terminal, but if the output is passed to another script, remember that this separator line may appear.
grep in a DevOps pipeline
Example: filter an application log, exclude healthchecks, and show only severe errors:
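A possible pipeline, assuming a hypothetical log where healthchecks hit /healthz and severe lines are tagged ERROR or FATAL:

```shell
app_log="$(mktemp)"
cat > "$app_log" <<'EOF'
INFO GET /healthz 200
ERROR GET /healthz 503
ERROR POST /api/orders 500
FATAL worker crashed
WARN slow query
EOF

# Drop healthcheck lines first, then keep only ERROR or FATAL lines.
grep -v '/healthz' "$app_log" | grep -E 'ERROR|FATAL'
```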
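grep returns exit code 1 when there is no match. Under set -e, this stops the script when grep is the last command in the pipeline, or at any position in the pipeline if set -o pipefail is also enabled. Where “no errors found” is a normal outcome, adding || true at the end of the pipeline is acceptable, although it also hides real failures such as exit code 2 for an unreadable file. For more important pipelines, handle the exit code explicitly with if grep ...; then ... fi.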
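Both approaches can be sketched as follows, with an invented log that contains no errors:

```shell
app_log="$(mktemp)"
printf '%s\n' 'INFO started' 'INFO finished' > "$app_log"

# Explicit handling: grep -q prints nothing and only sets the exit code.
if grep -q 'ERROR' "$app_log"; then
  status="errors found"
else
  status="no errors"
fi
echo "$status"

# Tolerant form: exit code 1 (no match) no longer aborts a script under set -e.
# grep -c still prints 0 before exiting with status 1.
error_count="$(grep -c 'ERROR' "$app_log" || true)"
echo "error count: $error_count"
```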
awk: process text by line and column
awk reads input as records; by default, each line is one record. Each line is split into fields: $1, $2, $3, and so on. The full line is $0.
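A minimal sketch of field access, using an invented three-column file (name, status, latency):

```shell
data="$(mktemp)"
printf '%s\n' 'alice 200 12ms' 'bob 500 87ms' > "$data"

awk '{ print $1, $2 }' "$data"          # first two fields of every line
awk '{ print $0 }' "$data"              # the whole line, unchanged
awk '$2 == 500 { print $1 }' "$data"    # only lines whose second field is 500
```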
Important built-in variables:
- NR: number of records read so far.
- NF: number of fields in the current record.
- $0: the full current line.
- $1, $2, …: the first field, second field, and so on.
- FS: input field separator.
- OFS: output field separator.
Example: print line number, field count, and the full line:
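For instance, with two invented lines of differing length:

```shell
data="$(mktemp)"
printf '%s\n' 'web-1 up' 'web-2 down since 10:42' > "$data"

# NR is the record number, NF the field count of that record, $0 the record itself.
awk '{ print NR, NF, $0 }' "$data"
```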
According to the GNU awk manual, NR increments whenever awk reads a new record, while NF is the number of fields in the current record. These two variables are very useful when you need to validate text data quickly.
awk BEGIN, END, and conditions
BEGIN runs before input is read. END runs after all input has been read. Any rule between them is applied to each input line, optionally guarded by a condition.
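A small sketch that combines the three parts with a condition. The service names and latency values are invented:

```shell
data="$(mktemp)"
printf '%s\n' 'api 120' 'db 340' 'cache 15' > "$data"

awk '
  BEGIN { print "service latency_ms" }          # runs once, before any input
  { total += $2; print $1, $2 }                 # runs for every line
  $2 > 100 { slow++ }                           # runs only when field 2 exceeds 100
  END { print "total:", total, "slow:", slow }  # runs once, after all input
' "$data"
```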
A more readable example: count HTTP status codes from a common Nginx access log format:
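A sketch of that count, assuming the combined log format where the status code is the ninth whitespace-separated field. The log lines are invented and use documentation IP ranges:

```shell
access_log="$(mktemp)"
cat > "$access_log" <<'EOF'
203.0.113.5 - - [01/May/2024:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "curl/8.0"
203.0.113.5 - - [01/May/2024:10:00:02 +0000] "GET /api/orders HTTP/1.1" 500 87 "-" "curl/8.0"
198.51.100.7 - - [01/May/2024:10:00:03 +0000] "GET /healthz HTTP/1.1" 200 2 "-" "kube-probe/1.29"
EOF

# Count occurrences per status code in an associative array, print them at END.
# for-in order is unspecified in awk, so sort the result for stable output.
awk '{ count[$9]++ } END { for (code in count) print code, count[code] }' "$access_log" | sort
```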
In a common access log, $1 is often the client IP, $7 is the path, and $9 is the HTTP status code. However, log formats can differ depending on the Nginx or Apache configuration, so always inspect a few sample lines before hardcoding field positions.
awk with a custom delimiter
By default, awk splits fields on whitespace. For key-value files or simple CSV-like data, use -F to choose a delimiter.
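As an illustration, the sketch below splits colon-separated lines in the style of /etc/passwd. The two entries are invented:

```shell
passwd_sample="$(mktemp)"
printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' \
  'deploy:x:1001:1001::/home/deploy:/bin/sh' > "$passwd_sample"

# -F: splits on colons; print the user name and login shell.
awk -F: '{ print $1, $7 }' "$passwd_sample"

# Setting OFS in BEGIN changes the separator that print puts between fields.
awk -F: 'BEGIN { OFS = "," } { print $1, $3 }' "$passwd_sample"
```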
Example: read a .env file and skip comments or blank lines:
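A minimal sketch, assuming simple KEY=VALUE lines. The keys and values are placeholders, not real settings:

```shell
env_file="$(mktemp)"
cat > "$env_file" <<'EOF'
# application settings
APP_ENV=staging

LOG_LEVEL=info
EOF

# Skip lines that are blank or start with # (after optional whitespace),
# then split the remaining lines on "=".
awk -F= '!/^[[:space:]]*(#|$)/ { print "key=" $1, "value=" $2 }' "$env_file"
```

Note that a value which itself contains = would be cut off at $2 in this sketch.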
For complex CSV with quotes, commas inside fields, or escaping, awk -F, is not safe enough. In that case, use a real CSV parser in Python, Go, or a dedicated tool.
sed: replace and edit text line by line
sed is a stream editor: it reads input, applies a script, and writes output. Its most famous command is substitute:
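For example, on an invented input line:

```shell
echo 'level=debug level=debug' | sed 's/debug/info/'    # replaces the first match only
echo 'level=debug level=debug' | sed 's/debug/info/g'   # replaces every match on the line
```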
According to the GNU sed manual, the basic syntax is s/regexp/replacement/flags. The g flag replaces all matches on a line instead of only the first match.
Example: change an endpoint in a config file and write the result to a new file:
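A sketch under stated assumptions: the file name app.conf, the API_URL key, and both example.internal URLs are hypothetical.

```shell
workdir="$(mktemp -d)"
printf '%s\n' 'API_URL=http://staging.example.internal/v1' 'TIMEOUT=30' > "$workdir/app.conf"

# Read app.conf, substitute the endpoint, and write the result to a new file.
# The original file is left untouched.
sed 's|http://staging.example.internal/v1|https://api.example.internal/v1|' \
  "$workdir/app.conf" > "$workdir/app.conf.new"

cat "$workdir/app.conf.new"
```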
Here we use | as the delimiter instead of / to avoid escaping many / characters in the URL.
sed addresses and in-place editing
You can limit sed so it only changes lines matching a pattern or a range.
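Three address forms, sketched against an invented INI-style file:

```shell
conf="$(mktemp)"
cat > "$conf" <<'EOF'
[database]
host=localhost
[cache]
host=localhost
EOF

# Pattern address: substitute only on lines that start with "host=".
sed '/^host=/ s/localhost/127.0.0.1/' "$conf"

# Line-range address: substitute only on lines 1 through 2.
sed '1,2 s/localhost/db.internal/' "$conf"

# Pattern-to-end range: from the [cache] header to the last line ($).
sed '/^\[cache\]/,$ s/localhost/cache.internal/' "$conf"
```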
GNU sed supports -i for in-place editing. If you provide a suffix, sed creates a backup before renaming the temporary file back to the original file:
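A minimal sketch with a hypothetical app.env file. Writing the suffix directly after -i, with no space, is the form that both GNU sed and BSD sed accept:

```shell
workdir="$(mktemp -d)"
echo 'LOG_LEVEL=debug' > "$workdir/app.env"

# -i.bak edits app.env in place and keeps the original as app.env.bak.
sed -i.bak 's/^LOG_LEVEL=.*/LOG_LEVEL=info/' "$workdir/app.env"

cat "$workdir/app.env"       # edited file
cat "$workdir/app.env.bak"   # untouched backup
```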
Be careful with sed -i:
- Always test without -i first so you can inspect the output.
- Use a suffix such as .bak when editing important files.
- Differences between GNU sed and BSD/macOS sed can make sed -i scripts less portable.
DevOps practice: parse Nginx logs
Assume the access log uses a common format:
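As an illustration, the sketch below writes three invented lines in the combined log format and prints the fields this section relies on. The IP addresses come from documentation ranges; paths and user agents are made up.

```shell
access_log="$(mktemp)"
cat > "$access_log" <<'EOF'
203.0.113.5 - - [01/May/2024:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 512 "-" "curl/8.0"
198.51.100.7 - - [01/May/2024:10:00:02 +0000] "POST /api/orders HTTP/1.1" 500 87 "-" "curl/8.0"
203.0.113.5 - - [01/May/2024:10:00:03 +0000] "GET /healthz HTTP/1.1" 200 2 "-" "kube-probe/1.29"
EOF

# Confirm the field positions before relying on them: client IP, path, status.
awk '{ print $1, $7, $9 }' "$access_log"
```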
Filter 4xx/5xx errors with awk:
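One possible filter, assuming the status code is field 9. It matches the status with a regex rather than a numeric comparison so that a non-numeric field cannot slip through. The sample lines are invented:

```shell
access_log="$(mktemp)"
cat > "$access_log" <<'EOF'
203.0.113.5 - - [01/May/2024:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 512
198.51.100.7 - - [01/May/2024:10:00:02 +0000] "GET /missing HTTP/1.1" 404 153
203.0.113.5 - - [01/May/2024:10:00:03 +0000] "POST /api/orders HTTP/1.1" 500 87
EOF

# Keep only requests whose status is 4xx or 5xx; print IP, path, and status.
awk '$9 ~ /^[45][0-9][0-9]$/ { print $1, $7, $9 }' "$access_log"
```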
Count the top IP addresses by request count:
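A common pipeline for this, sketched with invented log lines. The limit of ten is an arbitrary choice:

```shell
access_log="$(mktemp)"
cat > "$access_log" <<'EOF'
203.0.113.5 - - [01/May/2024:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 512
198.51.100.7 - - [01/May/2024:10:00:02 +0000] "GET /about HTTP/1.1" 200 301
203.0.113.5 - - [01/May/2024:10:00:03 +0000] "GET /api/orders HTTP/1.1" 200 87
EOF

# Extract the IP, group identical values (sort), count them (uniq -c),
# order by count descending (sort -rn), and keep the ten busiest clients.
awk '{ print $1 }' "$access_log" | sort | uniq -c | sort -rn | head -n 10
```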
Count status codes:
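The same sort and uniq pattern works for status codes. A sketch with invented log lines:

```shell
access_log="$(mktemp)"
cat > "$access_log" <<'EOF'
203.0.113.5 - - [01/May/2024:10:00:01 +0000] "GET /index.html HTTP/1.1" 200 512
198.51.100.7 - - [01/May/2024:10:00:02 +0000] "GET /about HTTP/1.1" 200 301
203.0.113.5 - - [01/May/2024:10:00:03 +0000] "POST /api/orders HTTP/1.1" 500 87
EOF

# One line per status code, most frequent first.
awk '{ print $9 }' "$access_log" | sort | uniq -c | sort -rn
```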
If you want to exclude healthcheck requests from the statistics:
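A sketch that assumes the healthcheck path is exactly /healthz. That path is a placeholder, and a path with a query string would need a regex match instead:

```shell
access_log="$(mktemp)"
cat > "$access_log" <<'EOF'
198.51.100.7 - - [01/May/2024:10:00:01 +0000] "GET /healthz HTTP/1.1" 200 2
203.0.113.5 - - [01/May/2024:10:00:02 +0000] "GET /index.html HTTP/1.1" 200 512
203.0.113.5 - - [01/May/2024:10:00:03 +0000] "GET /about HTTP/1.1" 200 301
203.0.113.5 - - [01/May/2024:10:00:04 +0000] "POST /api/orders HTTP/1.1" 500 87
198.51.100.7 - - [01/May/2024:10:00:05 +0000] "GET /healthz HTTP/1.1" 200 2
EOF

# Skip requests to /healthz, then count the remaining status codes.
awk '$7 != "/healthz" { print $9 }' "$access_log" | sort | uniq -c | sort -rn
```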
DevOps practice: update config before deployment
Example script: update LOG_LEVEL and FEATURE_FLAG in a .env file, with a backup before editing.
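A sketch of such a script, not a definitive implementation. The helper set_env_key, the temporary .env file, and the values info and on are all hypothetical:

```shell
# Hypothetical helper: set KEY=VALUE in an env file, replacing or appending.
set_env_key() {
  local file="$1" key="$2" value="$3"
  if grep -q "^${key}=" "$file"; then
    # Key exists: rewrite the whole line. | is the delimiter, so values may contain /.
    sed -i.tmp "s|^${key}=.*|${key}=${value}|" "$file" && rm -f "${file}.tmp"
  else
    # Key is missing: append it.
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

workdir="$(mktemp -d)"
env_file="$workdir/.env"
printf '%s\n' 'APP_ENV=staging' 'LOG_LEVEL=debug' > "$env_file"

cp "$env_file" "$env_file.bak"            # explicit backup before any edit
set_env_key "$env_file" LOG_LEVEL info    # existing key: replaced
set_env_key "$env_file" FEATURE_FLAG on   # missing key: appended

cat "$env_file"
```

Values containing | or & would need escaping before being passed to sed in this sketch.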
This script demonstrates how the tools work together:
- cp creates a clear backup before editing.
- sed replaces values when the keys already exist.
- grep -q checks whether each key exists.
- >> appends missing keys.
Note: do not put real secrets in examples or commits. For secrets, prefer environment variables, a secret manager, or your CI/CD secret store.
DevOps practice: generate an error report from logs
Example: create a short report with the total number of errors and the top endpoints returning 5xx:
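One way to build such a report, assuming the status code is field 9 and the path is field 7. The log lines, the report layout, and the limit of five endpoints are invented for illustration:

```shell
access_log="$(mktemp)"
cat > "$access_log" <<'EOF'
203.0.113.5 - - [01/May/2024:10:00:01 +0000] "POST /api/orders HTTP/1.1" 500 87
203.0.113.5 - - [01/May/2024:10:00:02 +0000] "POST /api/orders HTTP/1.1" 502 0
198.51.100.7 - - [01/May/2024:10:00:03 +0000] "GET /api/users HTTP/1.1" 500 91
198.51.100.7 - - [01/May/2024:10:00:04 +0000] "GET /index.html HTTP/1.1" 200 512
EOF

report="$(mktemp)"
{
  echo "5xx error report"
  # n + 0 forces a numeric 0 when no line matched.
  echo "total: $(awk '$9 ~ /^5[0-9][0-9]$/ { n++ } END { print n + 0 }' "$access_log")"
  echo "top endpoints:"
  awk '$9 ~ /^5[0-9][0-9]$/ { print $7 }' "$access_log" | sort | uniq -c | sort -rn | head -n 5
} > "$report"

cat "$report"
```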
This kind of script is useful to run from cron or a CI job after a load test, as long as you understand the input log format clearly.
Common mistakes
- Using regex when a literal string is enough: If the pattern contains special characters such as [, ., or *, consider grep -F.
- Forgetting grep exit codes: No match means exit code 1, which is not necessarily a business-level error.
- Hardcoding log fields without checking the format: $9 is the status code in a common format, but not every log format is the same.
- Using awk -F, for complex CSV: CSV with quotes or commas inside fields needs a real parser.
- Running sed -i directly on important files: Test the output first, use a backup suffix, or copy the file before editing.
- Not quoting variables in scripts: When passing file paths to grep, awk, or sed, always quote "${FILE}".
Implementation notes
- When applying this to your own project, choose the tool based on the goal:
  - Find/filter lines → grep.
  - Extract columns, calculate totals, group counts → awk.
  - Replace text by line or pattern → sed.
- Best practices:
  - Use grep -n when debugging so you can see line numbers.
  - Use grep -C when you need context around an error.
  - Use awk with BEGIN/END to create reports with headers or footers.
  - Use a different delimiter in sed when processing URLs or paths, for example s|old|new|g.
  - For config edits, back up first and verify afterward with grep or the service’s config test command.
- Troubleshooting:
  - Pipeline stops because grep found no match? → Handle the exit code with if grep ... or || true when appropriate.
  - awk prints the wrong column? → Try awk '{ print NR, NF, $0 }' to inspect fields.
  - sed changes nothing? → Check whether the pattern anchor is correct and whether the delimiter needs escaping.
🎯 Conclusion
grep, awk, and sed are the core trio that turns Bash into a fast log and config processing tool. grep helps you find the right lines, awk helps you analyze columns and summarize numbers, and sed helps you edit text in a controlled way. Combined with the file-handling skills from the previous article, you can already build many small but useful DevOps scripts.
In the next article, we will cover functions in Bash: splitting logic into functions, passing arguments, using local, handling return codes, and building a small reusable logging library. 🚀
References
- GNU Grep Manual: official documentation for grep, regex, -E, -F, -n, -A, -B, and -C.
- GNU Awk Manual: official documentation for fields, NR, NF, BEGIN, END, and awk programs.
- GNU Sed Manual: official documentation for s/regexp/replacement/flags, addresses, and -i.
- GNU Coreutils Manual, sort: additional reference for statistical pipelines with sort.
