Replicating Bash Argument Splitting
Sometimes it really is just a Small Matter of Programming
December 28, 2021
A few weeks ago I was writing a macro loader for my JSON Processor, and I ran into an odd case where code I entered in the terminal was splitting differently to code I loaded from a file. It turns out that Bash’s word splitting behaves differently for arguments than it does for variables.
When processing args Bash will not split quoted words that contain a delimiter:
printf "%s\n" foo 'bar baz'
foo
bar baz
But if those args are stored in a string, quoting the expansion passes the whole string as a single word:
input='foo "bar baz"'
printf "%s\n" "$input"
foo "bar baz"
Left unquoted, the string is split on whitespace and the quote characters are treated as literal text:
printf "%s\n" $input
foo
"bar
baz"
What I really want is to split the variable into an array of words. Then printf works as I want it to:
input=(foo "bar baz")
printf "%s\n" "${input[@]}"
foo
bar baz
I can’t use read -a because that will split the input without regard for quotes. And launching a subshell to run another program (awk, xargs) is too slow for my use case.
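To make the read -a limitation concrete, here is a small sketch (the variable name words is just for illustration). read only splits on the characters in IFS, so the double quotes are carried along as ordinary characters:

```shell
input='foo "bar baz"'

# read -a splits on IFS (space, tab, newline by default) and knows
# nothing about quoting, so the quotes end up inside the words.
read -r -a words <<< "$input"

printf '%s\n' "${#words[@]}"    # number of words found
printf '<%s>\n' "${words[@]}"   # one word per line, bracketed for clarity
```

This reports three words (foo, "bar and baz") rather than the two I want.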
Bash’s argument processing is hardcoded in its parser: modifying IFS has no effect. This is also a hint for the first solution, which is to hand the string back to the parser with eval:
input='foo "bar baz"'
eval "printf \"%s\n\" $input"
foo
bar baz
But using eval invites a whole set of complications I’d rather avoid.
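One of those complications, sketched with a made-up input: eval runs the string through the full parser, so it performs every expansion, not just quote handling. The command substitution below is a harmless stand-in for anything the input might contain.

```shell
# Hypothetical input; imagine it was loaded from a file someone else wrote.
input='foo "bar baz" $(echo injected)'

# eval re-parses the text, so the quotes are honoured, but the
# command substitution is executed as well.
eval "words=($input)"

printf '<%s>\n' "${words[@]}"
```

The quoted word survives intact, but the third element is the output of the embedded command rather than the literal text. Globs and variable references in the input would be expanded in the same way.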
Bash Quoting
Bash splits arguments on unescaped spaces, horizontal tabs and newlines. So to determine when to split a word we need to understand Bash escape rules. There are four(ish) kinds:
- Unquoted - tab/space/newline/backslash can be escaped with backslash
- Single quoted - literal string, no escapes recognized
- Double quoted - five characters ($, `, ", \ and newline) can be escaped with backslash
- ANSI-C - single quotes can be escaped and many other backslash sequences are recognized
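The four kinds side by side, as a rough illustration (the variable names and sample strings are invented for this sketch):

```shell
name=world

unquoted=foo\ bar               # backslash-space keeps this as one word
single='a\b $name'              # fully literal: no escapes, no expansion
double="hi \$name, $name \d"    # \$ is an escape; \d is not, so the backslash stays
ansi=$'it\'s\tok'               # ANSI-C: \' and \t are both decoded

printf '<%s>\n' "$unquoted" "$single" "$double" "$ansi"
```

Note how the double-quoted string keeps the backslash in \d, because d is not one of the five escapable characters.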
Splitting words like arguments
Decoding and interpreting ANSI-C escape sequences seems difficult, and I don’t need to support it for my use case. To handle the other three kinds, I need code that scans a string, applies each set of escape rules, and keeps track of which kind of quoting is currently in effect.
Here’s what I came up with:
wordsplit () {
  WORDS=()
  WORDC=0
  WORDERR=
  OPTIND=1
  local quo= word= esc=
  while getopts ":" opt "-$1";do
    if [ -z $quo ];then
      if [ "$OPTARG" = '\' ]&&[ -z "$esc" ];then
        esc=1
        continue
      elif ([[ "$OPTARG" == [$' \t\n'] ]]&&[ -z "$esc" ]);then
        if [ -n "$word" ];then
          WORDS+=("$word")
          word=
          (( WORDC++ ))
        fi
        continue
      elif ([ "$OPTARG" = "'" ]||[ "$OPTARG" = '"' ])&&[ -z "$esc" ];then
        quo="$OPTARG"
        continue
      fi
    elif [ "$quo" = '"' ];then
      if [ -n "$esc" ];then
        ! [[ "$OPTARG" == [$'$\\`"\n'] ]] && word+='\'
      elif [ "$OPTARG" = '\' ];then
        esc=1
        continue
      elif [ "$OPTARG" = '"' ];then
        quo=
        continue
      fi
    elif [ "$OPTARG" = "$quo" ];then # single quote term
      quo=
      continue
    fi
    word+="$OPTARG"
    esc=
  done
  if [ -n "$quo" ];then
    WORDERR="found unterminated string"
    return 1
  elif [ -n "$word" ];then
    WORDS+=("$word")
    (( WORDC++ ))
  fi
  return 0
}
This function accepts a single string argument which it splits into the WORDS array, with the word count in WORDC and any error message in WORDERR (Bash functions can only return an exit status, so the results go in globals). It inspects the string byte-by-byte using the getopts string split trick. I like this trick because it’s fast and avoids maintaining an index and using substrings.
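For anyone who hasn’t seen the getopts trick, here it is in isolation, as a minimal sketch (the names str and chars are made up for the example). Prefixing the string with a dash makes it look like a bundle of single-letter flags, and an optstring of ":" declares no valid options while suppressing error messages, so each character is reported as an unknown option in OPTARG:

```shell
str='a b"c'
chars=()

OPTIND=1    # reset getopts state in case it was used earlier

# Every character of "-$str" is an "invalid option", which in silent
# mode (leading colon) is delivered through OPTARG one at a time.
while getopts ":" opt "-$str"; do
  chars+=("$OPTARG")
done

printf '<%s>\n' "${chars[@]}"
```

The loop sees five characters here, including the space and the double quote. I would still test edge cases separately, such as an empty string or input beginning with a dash, before relying on this for arbitrary input.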
The first conditional block [ -z $quo ] handles unquoted strings; all word splitting happens outside of quotes, so this is the longest block.
The second top-level block [ "$quo" = '"' ] handles escapes inside double-quoted strings. One complication here is that only five characters can be escaped, so if we’re in escape mode and the current character is not one of those, the code restores the backslash that was skipped on the previous iteration by appending it to the current word.
The last top-level block [ "$OPTARG" = "$quo" ] just catches the case of a single-quoted string terminator.
Once the loop ends, wordsplit checks for the unterminated string error condition, and appends any dangling final word.
Here it is in action:
wordsplit ' foo bar\ baz "a b\"c" \d';printf "%s\n" "${WORDS[@]}"
foo
bar baz
a b"c
d
I’ve uploaded the code to GitHub. It has a test suite but as shell quoting is a treacherous business, if you find any bugs please let me know!
Notes
- Greg’s wiki page on Bash quoting has advice and examples (the whole wiki is pretty great).