Static analysis tools

for Go code comprehension and refactoring

golang nyc meetup

13 Nov 2014

Alan Donovan

Google, New York

Video

This talk was presented at the GothamGo Kickoff Meetup in New York City, November 2014.

Introduction

Programmers say "writing code", but most of that time is spent reading

Actually: reading code, and making logical deductions

Machines are good at reading and logic

Machines should be helping us read code.

And write it! Refactoring can be tedious and error-prone.

Outline

Demo: the Go oracle

Supports interactive, editor-integrated queries:
- where is this thing defined?
- what are the methods of this type?
- what are the free variables of this selection?
- what might this expression point to?
- what dynamic types might this interface or reflect.Value contain?

Demo: godoc analysis features

Package view
- method set and implements relation for every type
- internal call graph of every package

Source code view
- kind, type, definition of every identifier
- method set and implements relation for every type
- peers of every channel send/receive
- callers of every function
- callees of every call site (even dynamic)

How it works

Libraries and tools

Mostly in golang.org/x/tools repo

go/types: the Go type checker

Computes mappings from:
- identifiers to definitions
- constant expressions to values
- expressions to types

Many subtle corners:
- method set computation
- recursive interfaces
- shifts

Making it correct, fast, and clean was a substantial project

Author: Robert Griesemer

go/ssa: intermediate representation (IR)

Typed abstract syntax trees (ASTs) are too varied and subtle to analyze directly

Programs are lowered into Static Single-Assignment form (SSA):
- simplifies dataflow analyses since reaching definitions are implicit
- invented 1991, now mainstream (gcc, llvm)

All Go programs can be expressed using only ~30 basic instructions

Simple, explicit, high-level, high source fidelity

The llgo project is using go/ssa as a front-end for LLVM

Demo: ssadump

go/pointer: pointer analysis

Pointers complicate reasoning about program behaviour
- functions may be called dynamically
- a variable may be updated and accessed via different names ("aliases")
- runtime types are values too: interface, reflect.Type

We use pointer analysis to answer the question:
which variables might this pointer point to?

Pointer analysis

Let pts(p) be the points-to set of pointer p
Its elements are abstract variables: globals, locals, unnamed (new(T))

Overview:

Statement      Constraint
 y = &x         pts(y) ⊇ {x}
 y = x          pts(y) ⊇ pts(x)                     "inclusion-based"
*y = x          ∀o ∈ pts(y). pts(o) ⊇ pts(x)
 y = *x         ∀o ∈ pts(x). pts(y) ⊇ pts(o)
 y = &x->f      \
 y = x.(T)       } not shown
 reflection     /

Pointer analysis: constraint generation

All Go operations can be expressed in these constraints
Function, map, slice and channel ops all simplify to struct ops

Treatment of unsafe.Pointer conversion is unsound
But we don't care! This isn't a compiler

The constraint system is:
- field-sensitive: struct fields x.f and x.g have distinct solutions
- flow-insensitive: the order of statements is ignored
- context-insensitive: each function is analyzed only once
- whole-program: Go source is required for all dependencies
- inclusion-based: i.e. Andersen's analysis

Optimization: don't make constraints for "uninteresting" types
such as types not related to a one-shot Oracle query

Pointer analysis: pre-solver optimizations

Constraint system for 124KLoC program (oracle) has:

254,000 variables
184,000 equations

Solving phase is in O(|v|³), so pre-solver optimisation is crucial

We can bring this down to:

88,000 variables
65,000 equations

Pointer Equivalence by Hash-Value Numbering (Hardekopf & Lin '07)

p = &x                  a = &x                         if ... {
q = p                   b = a                            p, q = &x, &y
r = q                   c = b                          } else {
s = r                   if ... { a = c }                 p, q = &y, &x
                                                       }

Hash-Value Numbering

Pointer analysis: solving

It's transitive closure of a graph, essentially

Lots of optimizations, for example:

sparse bit vectors, a very compact representation for points-to sets

Solver log is >1GB. Debugging is fun.

Call graph

The call graph can be read directly from the solution

The golang.org/x/tools/cmd/callgraph tool prints raw call graphs

Demo (time permitting)

Refactoring tools

gorename: precise, type-safe identifier renaming

A refactoring tool, usable from...

% gorename -from bytes.Buffer.Len -to Size
Renamed 105 occurrences in 42 files in 33 packages.

All renamings are reversible

Sound: either a conflict is reported, or the refactoring preserves behaviour*

*except reflection

Demo: gorename

Example-based refactoring with eg

A Go port of the Java Refaster tool

A template specifies the pattern and replacement as Go expressions:

package P

import ( "errors"; "fmt" )

func before(s string) error { return fmt.Errorf("%s", s) }
func after(s string)  error { return errors.New(s) }

Parameters are wildcards

% eg -t template.go <package> ...

Demo: eg

Conclusion

From the outset, Go has been renowned for its tools: go, gofmt

We have many building blocks for sophisticated source analysis tools

What should we build next?

演讲者

golang nyc meetup

13 Nov 2014

Alan Donovan

Google, New York