
Bash Function Names Can Be Almost Anything

Freedom > portability?

June 20, 2021

A common misconception about Bash is that function names must follow the same rules as variable names. The Bash manual’s definition of “name” even suggests this:

A word consisting solely of letters, numbers, and underscores, and beginning with a letter or underscore. Names are used as shell variable and function names. Also referred to as an identifier.

In fact, Bash function names can contain almost any printable character. For instance, I can define my own unary ++ function that increments a variable given its name:

function ++ { (( $1++ )); }
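Here is a short usage sketch. The function receives the variable’s name rather than its value, so inside (( )) the expansion $1++ becomes i++ and increments the caller’s variable. One caveat: like any arithmetic command, it returns a non-zero status when the expression evaluates to 0, so incrementing a variable that holds 0 “fails”, which matters under set -e.

```bash
#!/bin/bash
# ++ takes a variable *name*; "(( $1++ ))" expands to e.g. "(( i++ ))".
function ++ { (( $1++ )); }

i=5
++ i        # i is now 6
++ i        # i is now 7
echo "$i"   # prints 7
```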

The precise rules for function names are murky; mostly it seems like whatever Bash can parse unambiguously is allowed. This script prints all allowable single-character function names:

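#!/bin/bash
# Try every printable ASCII code except space (33-126).
for n in {33..126}; do
  printf -v oct "%03o" "$n"   # decimal -> zero-padded octal, e.g. 43 -> 053
  printf -v chr \\"$oct"      # the octal escape becomes the character, e.g. +
  # Define a function named after the character and call it;
  # errors from names Bash rejects are discarded.
  eval "function $chr { echo -n '$chr'; }; $chr" 2>/dev/null
done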

Its output:

+,./:=?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_abcdefghijklmnopqrstuvwxyz

The script uses brace expansion to generate the decimal codes of the printable ASCII¹ characters other than space (33 to 126), and printf to convert each number first to octal and then to its character. Then it uses eval to try to declare a new function named after the character and call it, discarding any errors.
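To make the conversion step concrete, here is one iteration of the loop traced by hand for decimal 43, which is the + sign:

```bash
#!/bin/bash
printf -v oct "%03o" 43   # decimal 43 -> octal, zero-padded: oct is "053"
printf -v chr "\\$oct"    # format string "\053" is an octal escape: chr is "+"
echo "$oct -> $chr"       # prints "053 -> +"
```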

The set of permissible characters changes once the name begins with a letter. Here’s the list of permissible second characters when the function name begins with “a”:

!#%+,-./0123456789:?@ABCDEFGHIJKLMNOPQRSTUVWXYZ]^_abcdefghijklmnopqrstuvwxyz{}~

The additional characters all have some significance to the Bash parser, which the leading “a” disambiguates. The same goes for the characters missing from the second list: = is fine on its own, but presumably a= looks like the start of a variable assignment, and a[ like an array subscript.
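The same experiment can be generalised to any prefix. Below is a sketch of a helper, allowed_after (a hypothetical name, not from the original script), that runs each trial inside a command substitution so the test functions don’t pile up in the calling shell, and only counts a character if calling the function actually printed ok. Since * and ? are glob characters, the result can depend on which files exist in the current directory, and it may differ between Bash versions, so no expected output is given here.

```bash
#!/bin/bash
# Hypothetical helper: print each printable ASCII character c (codes 33-126)
# for which a function named "$1$c" can be both defined and called.
function allowed_after {
  local prefix=$1 n oct chr out result=
  for n in {33..126}; do
    printf -v oct "%03o" "$n"
    printf -v chr "\\$oct"
    # The command substitution runs in a subshell, so trial functions don't
    # leak into this shell; errors from names Bash rejects are discarded.
    out=$(eval "function $prefix$chr { echo ok; }; $prefix$chr" 2>/dev/null) || true
    if [[ $out == ok ]]; then
      result+=$chr
    fi
  done
  printf '%s\n' "$result"
}

allowed_after a   # characters that may follow a leading "a"
```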

And function names aren’t limited to ASCII either. Here’s my smiley function:

function 😃 { echo "haha!"; }

Redefining Builtin Commands

Beyond the names themselves, Bash will happily let you define functions with any name, including the names of builtin commands. Watch me redefine echo:

function echo { echo "$@"; }

Oops, don’t do that: the function calls itself and will recurse until the process is killed. But if you do successfully redefine a builtin command, you can call the original with command, which bypasses function lookup. So this is fine:

function echo { command echo "$@"; }

Unless you also redefine command itself, in which case you’re really screwed. For what it’s worth, function does not seem to be redefinable, which makes sense as it is a reserved word rather than a builtin. And I was disappointed (er, relieved) to find that redefining exit does not stop shell processes from quitting; the function only runs when exit is called explicitly.
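A sketch of both points follows. Besides command, Bash’s builtin command also runs a builtin while skipping functions (though it, too, could be shadowed), and unset -f removes an override. The second half shows that a shadowed exit only intercepts explicit calls, while builtin exit still reaches the real one. The [log] tag is just an illustrative choice.

```bash
#!/bin/bash
# Shadow the echo builtin with a function that tags its output.
# "command echo" skips function lookup, so the function doesn't recurse.
function echo { command echo "[log]" "$@"; }

msg=$(echo hello)     # the function runs: msg is "[log] hello"
builtin echo "$msg"   # builtin also skips functions: prints "[log] hello"
type -t echo          # prints "function"
unset -f echo         # drop the override; echo is the builtin again

# Shadowing exit only intercepts explicit calls to exit.
function exit { echo "exit $* intercepted"; }
reply=$(exit 3)       # runs the function: reply is "exit 3 intercepted"
status=0
( builtin exit 3 ) || status=$?   # the real exit is still reachable: status is 3
unset -f exit         # restore the builtin
```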

What is this useful for?

My guess is that Bash behaves this way for backwards compatibility, so one benefit is that it will still run old shell code with weird function names. And obviously it makes code obfuscation a lot easier, which is fun until you have to debug it!

As Bash only has global and function scopes, sourced code can redefine previously declared symbols, such as function names. So if you’re writing a library or modulino, you can prefix every function name with a namespace and a dot ($namespace.), so that anything which sources it won’t have its own symbols clobbered by the import, assuming it follows the same convention.

For example, here’s an error-reporting function from jp:

function jp.error {
  echo "Error: $1 at line $JP_LINE, column $JP_COL" >&2
  return 1
}
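To show how the return 1 is meant to be used, here is a sketch with a hypothetical caller, jp.expect_colon, which is not part of jp; JP_LINE and JP_COL are simply set by hand to stand in for the parser’s position. Because jp.error returns failure, a check can report the error and propagate the failing status in one line.

```bash
#!/bin/bash
# jp.error, as shown above.
function jp.error {
  echo "Error: $1 at line $JP_LINE, column $JP_COL" >&2
  return 1
}

# Hypothetical caller (not from jp): report an error and fail in one step.
function jp.expect_colon {
  [[ $1 == : ]] || jp.error "expected ':' but got '$1'"
}

JP_LINE=3 JP_COL=14   # stand-ins for the parser's position
jp.expect_colon ',' || echo "status: $?"
# stderr: Error: expected ':' but got ',' at line 3, column 14
# stdout: status: 1
```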

Because variable naming rules are stricter, I prefix jp’s global variables with JP_ instead, but the idea is the same².
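Here’s a sketch of the clobbering problem and the dotted-prefix fix, using two hypothetical libraries (liba and libb are made-up names) defined inline rather than sourced from files. The last lines show why the trick doesn’t carry over to variables: a dot isn’t valid in a variable name, so declare rejects it (and a bare jp.line=3 isn’t even treated as an assignment, so Bash looks for a command by that name).

```bash
#!/bin/bash
# Two hypothetical libraries that both define an unprefixed "error".
# Function names share one global namespace, so the later definition wins.
function error { echo "liba: $1"; }
function error { echo "libb: $1"; }   # silently replaces liba's version
error oops                 # prints "libb: oops"

# With a "namespace." prefix, both definitions coexist.
function liba.error { echo "liba: $1"; }
function libb.error { echo "libb: $1"; }
liba.error oops            # prints "liba: oops"
libb.error oops            # prints "libb: oops"

# Variables can't borrow the trick: "jp.line" is not a valid identifier,
# so declare rejects it, while the JP_ prefix works.
declare jp.line=3 2>/dev/null || echo "jp.line rejected"
declare JP_LINE=3
```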

Is this a good idea? Well, if your code is going to be Bash-only, then it’s fine. But it’s not portable³. In particular, ash will not run this code, which is bad news if you want to run it in BusyBox-based Docker containers.
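If portability matters, one option, sketched here as an assumption rather than anything jp actually does, is to fall back to POSIX syntax. POSIX only specifies the fname() compound-command form of function definition and requires fname to be a valid name (letters, digits and underscores), so the dot separator becomes an underscore. Since it uses only POSIX features, this version should also run under ash and dash.

```bash
#!/bin/sh
# Portable variant (an assumption, not jp's code): POSIX "name()" syntax,
# with "_" instead of "." as the namespace separator.
jp_error() {
  printf 'Error: %s at line %s, column %s\n' "$1" "$JP_LINE" "$JP_COL" >&2
  return 1
}

JP_LINE=3
JP_COL=14
jp_error "unexpected token" || echo "failed with status $?"
# stderr: Error: unexpected token at line 3, column 14
```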

References

  1. See man 7 ascii for a handy reference table
  2. I copied this convention from JSON.bash
  3. See POSIX

Tags: bash programming-languages shell ash posix busybox