Code

Is Golang's Grammar Context Free?

Perhaps it comes down to semantics

February 27, 2022

Most programming language grammars are not context-free. That means the parser has to maintain state in order to process a snippet of the language.

Often when discussing context somebody will point out the parsing strings, parentheses and so on also require a kind of state in order to match their terminators. That is not the same thing. The question here is given valid input, can a parser using the language grammar alone, assemble it into an unambiguous parse tree or not.

Perl is notorious for this, take this snippet:¹

foo  / 25 ; # / ; die;

If foo has been forward declared with a function signature that takes an argument, / 25 ; # / is matched against $_ and the result sent to foo. Then the code dies. Else the result of foo() is divided in a void context and the rest of the line is interpreted as a comment.

Chances are your favorite language suffers from the same issue; even a grammar as simple as C’s requires context to resolve ambiguities.²

Parsing Go

Go however, is different. Parsing it requires very little context. The Go FAQ claims:

the language has been designed to be easy to analyze and can be parsed without a symbol table

But Go’s spec uses the same syntax for function calls and type conversions:

y := Foo(x)

The only way to know is by checking the symbol table to see if Foo is a type or a function. So is the FAQ wrong?

To check, I fed the go parser a test program:

package main

import (
	"go/parser"
	"go/token"
)

func main() {
	src := `
package main

type Bar int

func main () {
	Bar(5)
}
`
	parser.ParseFile(token.NewFileSet(), "", src, parser.Trace)
}

This program parses a string of go code containing the type conversion Bar(5). The parser’s trace mode makes it print its parse tree. If it classifies Bar(5) as a type conversion, it must have used the symbol table to resolve the ambiguity!

Here’s the relevant output:

7:  2: . . . . . . . . . . UnaryExpr (
7:  2: . . . . . . . . . . . PrimaryExpr (
7:  2: . . . . . . . . . . . . Operand (
7:  2: . . . . . . . . . . . . . IDENT Bar
7:  5: . . . . . . . . . . . . )
7:  5: . . . . . . . . . . . . CallOrConversion (
7:  5: . . . . . . . . . . . . . "("
7:  6: . . . . . . . . . . . . . Expression (
7:  6: . . . . . . . . . . . . . . BinaryExpr (
7:  6: . . . . . . . . . . . . . . . UnaryExpr (
7:  6: . . . . . . . . . . . . . . . . PrimaryExpr (
7:  6: . . . . . . . . . . . . . . . . . Operand (
7:  6: . . . . . . . . . . . . . . . . . . INT 5

Note the CallOrConversion node it found on line 7! So the Go FAQ is correct; the parser did not use context and resolve whether it was a function call or type conversion. As they have the same precedence, it didn’t need to. By deferring the resolution until later, Go can be parsed faster. On the other hand it is not quite context-free.

Notes

  1. I’ve shortened this code from an example by Randal Schwartz. This PerlMonks post discusses it and its implications.
  2. Eli Bendersky has a lucid blog post about C’s ambiguities which clarified my understanding.

Tags: context-free-grammar parser go perl c