Keeping documentation in sync with source code

Published by Oğuzhan Durgun on February 01, 2022

In this article, Cerbos engineer Oğuzhan Durgun (@oguzhand95) describes how Cerbos configuration documentation is automatically kept in sync with the Go source code.

Cerbos is a highly customizable application with support for many different storage engines that require specific configuration options, features that can be toggled on/off, and behaviour that can be fine-tuned (e.g. stricter/lenient schema validation, enabling/disabling JWT verification). At the time of writing, Cerbos has over 60 individual configuration parameters that can be set using the configuration file or command-line arguments. However, they all have reasonable defaults and Cerbos only requires a storage driver to be configured in order to start.

The Cerbos team is obsessed with providing good documentation. Features are not considered to be complete until documentation is written and it’s a requirement for merging any PRs that have user-facing changes. For a while, we kept the full configuration section of the documentation up-to-date manually whenever there was a code change. However, that became tedious and error-prone as the number of features and pull requests increased. Human reviewers were not always able to catch subtle changes that required updates to the documentation. Another difficult problem was checking for correct indentation visually (because the configuration is YAML-based and indentation matters). As Cerbos is an open-source project that anyone can contribute to, new contributors would not be aware of the requirement to update the documentation either. Therefore it was obvious that automating this process would significantly improve the development experience for both seasoned and new contributors while ensuring that Cerbos documentation is reliable and informative for our users.

Configuration is decentralized in the Cerbos code base to keep features compartmentalized and easily pluggable. Configuration sections are declared as structs implementing a special interface and are populated lazily at runtime. Since there is no single place where all configuration sections are aggregated, discovering all available options is a tricky problem. Without resorting to a central configuration registry or manual enumeration (with all the associated issues described above), the only reliable solution is to programmatically discover configuration sections by analyzing the Go source code of Cerbos.

Each configuration struct in the Cerbos code base implements the following minimal interface.

// internal/config/config.go
type Section interface {
  Key() string
}

Essentially, the Key method returns the configuration key for the struct. The structs have fields that correspond to the structure of the YAML configuration file. Structs can have other nested structs as fields. In some cases where there are pluggable implementations (e.g. storage) those nested structs are not even declared in the parent as fields. The value returned by the Key method can be a dot-separated value which determines where a section is nested. For example, the git configuration section’s Key method would return storage.git which indicates that it should be nested under storage in the config file.
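To make the nesting rule concrete, here is a minimal sketch of how dot-separated keys could be folded into a tree that mirrors the layout of the config file. The function name nestKeys and the "server" key are assumptions made for illustration; this is not the code Cerbos uses.

```go
package main

import (
	"fmt"
	"strings"
)

// nestKeys builds a nested map from dot-separated section keys, mirroring
// how a key such as "storage.git" ends up under "storage" in the YAML file.
// Hypothetical helper, for illustration only.
func nestKeys(keys []string) map[string]any {
	root := map[string]any{}
	for _, key := range keys {
		node := root
		for _, part := range strings.Split(key, ".") {
			// Reuse the child map if this path segment was seen before.
			child, ok := node[part].(map[string]any)
			if !ok {
				child = map[string]any{}
				node[part] = child
			}
			node = child
		}
	}
	return root
}

func main() {
	// "server" is an assumed extra key to show a top-level section.
	tree := nestKeys([]string{"storage.disk", "storage.git", "server"})
	storage := tree["storage"].(map[string]any)
	fmt.Println("top-level sections:", len(tree))
	fmt.Println("sections under storage:", len(storage))
}
```

Both storage.disk and storage.git land under the same storage node, which is what allows pluggable drivers to contribute sections without the parent struct knowing about them.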

We introduced a custom struct tag called conf to annotate the config fields within the struct with useful information such as optionality and examples of usage. In addition, the +desc marker was introduced to provide the description of the root configuration field. These hints, combined with source code comments attached to each field, can then be used in the generated documentation to explain what each field does and how it should be used.

A typical, annotated config struct looks like the following:

// Conf is required (if driver is set to 'disk') configuration for disk storage driver.
//+desc=This section is required only if storage.driver is disk.
type Conf struct {
  // Directory is the path on disk where policies are stored.
  Directory string `yaml:"directory" conf:"required,example=pkg/test/testdata/store"`
  // [DEPRECATED] ScratchDir is the directory to use for holding temporary data.
  ScratchDir string `yaml:"scratchDir" conf:",ignore"`
  // WatchForChanges enables watching the directory for changes.
  WatchForChanges bool `yaml:"watchForChanges" conf:"required,example=false"`
}

func (conf *Conf) Key() string {
  return "storage.disk"
}
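As a rough illustration of how the hints in the conf tag might be read, the sketch below splits the tag value into its options using the standard reflect.StructTag helper. The fieldHints type and parseConfTag function are hypothetical names, and the sketch assumes example values never contain commas.

```go
package main

import (
	"fmt"
	"reflect"
	"strings"
)

// fieldHints holds the hints carried by a conf struct tag.
// The type and its field names are hypothetical.
type fieldHints struct {
	Required bool
	Ignore   bool
	Example  string
}

// parseConfTag reads the conf key of a raw struct tag and splits its
// comma-separated options. Assumes example values contain no commas.
func parseConfTag(rawTag string) fieldHints {
	var hints fieldHints
	value, ok := reflect.StructTag(rawTag).Lookup("conf")
	if !ok {
		return hints
	}
	for _, opt := range strings.Split(value, ",") {
		switch {
		case opt == "required":
			hints.Required = true
		case opt == "ignore":
			hints.Ignore = true
		case strings.HasPrefix(opt, "example="):
			hints.Example = strings.TrimPrefix(opt, "example=")
		}
	}
	return hints
}

func main() {
	fmt.Printf("%+v\n", parseConfTag(`yaml:"directory" conf:"required,example=pkg/test/testdata/store"`))
	fmt.Printf("%+v\n", parseConfTag(`yaml:"scratchDir" conf:",ignore"`))
}
```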

Go has several excellent packages for parsing source code and traversing the Abstract Syntax Tree (AST). It also has the powerful go:generate directive that can be used to invoke code generation tools on Go source files. Our initial plan was to use these tools in the following way:

  • Annotate each Go file containing a config struct with a go:generate directive that invokes our documentation generator, which we named confdocs.

  • When invoked by go:generate, confdocs searches the given source file for any structs that implement the Section interface described above.

  • Extract the set of fields and struct tags from each discovered struct and generate the documentation for those.

  • Write the generated documentation to a file in our documentation directory, mirroring the package structure of the struct.

  • Finally, merge all the generated documentation files together using AsciiDoc include directives to produce the full configuration documentation page.
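The extraction step in the plan above can be sketched with the standard go/parser and go/ast packages. This is a minimal sketch rather than the confdocs implementation: the fieldInfo type and extractFields function are hypothetical, and the sample source is a trimmed copy of the annotated struct shown earlier.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"strconv"
	"strings"
)

// src is a trimmed copy of the annotated struct shown earlier. The tags are
// written as interpreted string literals because a raw string cannot contain
// backquotes.
const src = `package disk

type Conf struct {
	// Directory is the path on disk where policies are stored.
	Directory string "yaml:\"directory\" conf:\"required,example=pkg/test/testdata/store\""
	// WatchForChanges enables watching the directory for changes.
	WatchForChanges bool "yaml:\"watchForChanges\" conf:\"required,example=false\""
}
`

// fieldInfo is a hypothetical record of what a generator might keep per field.
type fieldInfo struct {
	Name, Doc, Tag string
}

// extractFields parses Go source and returns the fields of the named struct
// together with their doc comments and unquoted struct tags.
func extractFields(source, structName string) ([]fieldInfo, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "conf.go", source, parser.ParseComments)
	if err != nil {
		return nil, err
	}
	var fields []fieldInfo
	ast.Inspect(file, func(n ast.Node) bool {
		spec, ok := n.(*ast.TypeSpec)
		if !ok || spec.Name.Name != structName {
			return true
		}
		st, ok := spec.Type.(*ast.StructType)
		if !ok {
			return false
		}
		for _, f := range st.Fields.List {
			info := fieldInfo{Doc: strings.TrimSpace(f.Doc.Text())}
			if f.Tag != nil {
				// The tag literal still carries its quotes, so unquote it first.
				info.Tag, _ = strconv.Unquote(f.Tag.Value)
			}
			for _, name := range f.Names {
				info.Name = name.Name
				fields = append(fields, info)
			}
		}
		return false
	})
	return fields, nil
}

func main() {
	fields, err := extractFields(src, "Conf")
	if err != nil {
		panic(err)
	}
	for _, f := range fields {
		fmt.Printf("%s: %s [%s]\n", f.Name, f.Doc, f.Tag)
	}
}
```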

Using the x/tools/go/packages package, it is fairly easy to find the interface by looking it up in the scope of the package that declares it.

for _, pkg := range pkgs {
  if obj := pkg.Types.Scope().Lookup("Section"); obj != nil {
    return obj.Type().Underlying().(*types.Interface), nil
  }
}

After finding the reference for the Section interface, we just had to iterate through all the structs in the source file and find the ones implementing that interface. The AST of each matching struct was traversed to extract struct tags, fields, and comments. The generated documentation was then written out to a file matching the package structure of the Go file we were analyzing. A shell script then combined all these files, using the hierarchy information to produce a set of include directives in the correct order.
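One way to perform the "find the structs implementing that interface" check is types.Implements from the standard go/types package. The sketch below type-checks a small in-memory source file instead of loading real packages, so the DiskConf and Unrelated types and the findSections function are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
	"go/types"
)

// src is a hypothetical, import-free package used only for this sketch.
const src = `package config

type Section interface {
	Key() string
}

type DiskConf struct{}

func (c *DiskConf) Key() string { return "storage.disk" }

type Unrelated struct{}
`

// findSections type-checks the source and returns the names of all structs
// that implement the Section interface declared in the same package.
func findSections(source string) ([]string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "config.go", source, 0)
	if err != nil {
		return nil, err
	}
	// The sample has no imports, so the zero-value Config needs no Importer.
	pkg, err := (&types.Config{}).Check("config", fset, []*ast.File{file}, nil)
	if err != nil {
		return nil, err
	}
	obj := pkg.Scope().Lookup("Section")
	if obj == nil {
		return nil, fmt.Errorf("Section not found")
	}
	iface, ok := obj.Type().Underlying().(*types.Interface)
	if !ok {
		return nil, fmt.Errorf("Section is not an interface")
	}
	var found []string
	for _, name := range pkg.Scope().Names() {
		tn, ok := pkg.Scope().Lookup(name).(*types.TypeName)
		if !ok {
			continue
		}
		if _, isStruct := tn.Type().Underlying().(*types.Struct); !isStruct {
			continue
		}
		// Key has a pointer receiver, so the pointer type must be checked too.
		if types.Implements(tn.Type(), iface) || types.Implements(types.NewPointer(tn.Type()), iface) {
			found = append(found, name)
		}
	}
	return found, nil
}

func main() {
	found, err := findSections(src)
	if err != nil {
		panic(err)
	}
	fmt.Println(found)
}
```

Checking the pointer type matters here: a method declared on *DiskConf is not in the method set of DiskConf itself, so a check against the value type alone would miss it.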

This approach worked and we were able to produce the documentation in the format we wanted. However, there were several issues:

  • In some cases, the directory structure of the packages containing the config structs did not reflect how those sections are actually organized in the config file. We had to add manual overrides to handle those cases, but it was obvious that this was a brittle solution that would be hard to maintain.

  • The section name for each struct was “guessed” from the name of the package that contained it. While this was the convention, there was no way to enforce it, which could cause problems in the future.

  • Developers had to add go:generate directives to each file that contained configuration structs. It is easy to forget to add a directive to a new file, and the omission could go unnoticed during code review.

  • The implementation was slow because the tool ran once per go:generate directive, and on every run it had to initialize the parser, find the Section interface, index the structs, and create the docs for the related configuration struct.

Even though this is a helper utility that is not part of the product, performance, reliability, and maintainability are key engineering principles observed by the Cerbos team. We decided to apply the same diligence we apply to the core product to make this tool better.

Performance could be improved drastically by only running the confdocs tool once over the whole codebase rather than on individual packages. This way, the tool has to initialize the parser, find the Section interface and do the indexing only once. It would also address the concern we had with go:generate directives. Developers would no longer need to remember to add a directive to the source code whenever they introduce a new config section – which would reduce the room for mistakes.

Instead of relying on implicit conventions like the package structure and package names to determine the layout of the config file, we decided to use the canonical source of truth – which is the value of the Key method implemented by each config struct. However, this posed a problem because we had to call the method to obtain its value. Currently, it is not possible in Go to combine the AST with reflection to create an object from an AST representation and call any methods on it. While we could rely on another implicit convention and traverse the AST to find the constant value that was returned by each of these Key methods, that seemed a brittle assumption to make. It was also further complicated by the fact that sometimes the constants were created by combining several constants together.

To get around the problem of not being able to call the Key method from the AST, we employed a clever hack. As we traverse the AST, we generate Go source code that imports the correct package, creates a new instance of the struct, and calls the Key method on it to obtain the right value. The generated code also includes logic to correctly group and render the documentation produced by the AST traversal. Documentation generation is now a two-step process:

  1. Parse the Cerbos code base, discover the config structs and generate a Go file that contains the rendering logic

  2. Run the generated Go file to produce the final documentation file

Now that the improved confdocs tool is integrated into our CI workflow, any time a developer modifies a config structure or introduces a new one, the documentation is automatically updated to reflect those changes without requiring any extra work by the developer. Because documentation is derived from inline comments and struct tags, reviewers can easily catch any inconsistencies and flag them during code reviews as well. When the pull request is merged, the generated documentation is automatically published to the Cerbos documentation site and available to users at https://docs.cerbos.dev/cerbos/latest/configuration/index.html#_full_configuration.

The source code for the tool is available in the Cerbos GitHub repo. This was a fun little project with many unexpected problems that called for creative solutions. It has increased our developer productivity and helps us keep the all-important documentation up-to-date for our users.
