How I investigated memory leaks in Go using pprof on a large codebase


I have been working with Go for the better part of the year, implementing a scalable blockchain infrastructure at Orbs, and it’s been an exciting year. Over the course of 2018, we researched on which language to choose for our blockchain implementation. This led us to choose Go because of our understanding that it has a good community and an amazing tool-set.

In recent weeks we are entering the final stages of integration of our system. As in any large system, the later stage problems which include performance issues, in specific memory leaks, may occur. As we were integrating the system, we realized we found one. In this article I will touch the specifics of how to investigate a memory leak in Go, detailing the steps taken to find, understand and resolve it.

The tool-set offered by Golang is exceptional but has its limitations. Touching these first, the biggest one is the limited ability to investigate full core dumps. A full core dump would be the image of the memory (or user-memory) taken by the process running the program.

We can imagine the memory mapping as a tree, and traversing that tree would take us through the different allocations of objects and the relations. This means that whatever is at the root is the reason for ‘holding’ the memory and not GCing it (Garbage Collecting). Since in Go there is no simple way to analyze the full core dump, getting to the roots of an object that does not get GC-ed is difficult.

At the time of writing, we were unable to find any tool online that can assist us with that. Since there exists a core dump format and a simple enough way to export it from the debug package, it could be that there is one used at Google. Searching online it looks like it is in the Golang pipeline, creating a core dump viewer of such, but doesn’t look like anyone is working on it. Having said that, even without access to such a solution, with the existing tools we can usually get to the root cause.

Memory Leaks

Memory leaks, or memory pressure, can come in many forms throughout the system. Usually we address them as bugs, but sometimes their root cause may be in design decisions.

As we build our system under emerging design principles, such considerations are not believed to be of importance and that is okay. It is more important to build the system in a way that avoids premature optimizations and enables you to perform them later on as the code matures, rather than over engineer it from the get-go. Still, some common examples of seeing memory pressure issues materialize are:

  • Too many allocations, incorrect data representation
  • Heavy usage of reflection or strings
  • Using globals
  • Orphaned, never-ending goroutines

In Go, the simplest way to create a memory leak is defining a global variable, array, and appending data to that array. This great blog post describes that case in a good way.

So why am I writing this post? When I was researching into this case I found many resources about memory leaks. Yet, in reality systems have more than 50 lines of code and a single struct. In such cases, finding the source of a memory issue is much more complex than what that example describes.

Golang gives us an amazing tool called pprof. This tool, when mastered, can assist in investigating and most likely finding any memory issue. Another purpose it has is for investigating CPU issues, but I will not go into anything related to CPU in this post.

go tool pprof

Covering everything that this tool does will require more than one blog post. One thing that took a while is finding out how to use this tool to get something actionable. I will concentrate this post on the memory related feature of it.

The pprof package creates a heap sampled dump file, which you can later analyze / visualize to give you a map of both:

  • Current memory allocations
  • Total (cumulative) memory allocations

The tool has the ability to compare snapshots. This can enable you to compare a time diff display of what happened right now and 30 seconds ago, for example. For stress scenarios this can be useful to assist in locating problematic areas of your code.

pprof profiles

The way pprof works is using profiles.

A Profile is a collection of stack traces showing the call sequences that led to instances of a particular event, such as allocation.

The file runtime/pprof/pprof.go contains the detailed information and implementation of the profiles.

Go has several built in profiles for us to use in common cases:

  • goroutine — stack traces of all current goroutines
  • heap — a sampling of memory allocations of live objects
  • allocs — a sampling of all past memory allocations
  • threadcreate — stack traces that led to the creation of new OS threads
  • block — stack traces that led to blocking on synchronization primitives
  • mutex — stack traces of holders of contended mutexes

When looking at memory issues, we will concentrate on the heap profile. The allocs profile is identical in regards of the data collection it does. The difference between the two is the way the pprof tool reads there at start time. Allocs profile will start pprof in a mode which displays the total number of bytes allocated since the program began (including garbage-collected bytes). We will usually use that mode when trying to make our code more efficient.

The heap

In abstract, this is where the OS (Operating System) stores the memory of objects our code uses. This is the memory which later gets ‘garbage collected’, or freed manually in non-garbage collected languages.

The heap is not the only place where memory allocations happen, some memory is also allocated in the Stack. The Stack purpose is short term. In Go the stack is usually used for assignments which happen inside the closure of a function. Another place where Go uses the stack is when the compiler ‘knows’ how much memory needs to be reserved before run-time (e.g. fixed size arrays). There is a way to run the Go compiler so it will output an analysis of where allocations ‘escape’ the stack to the heap, but I will not touch that in this post.

While heap data needs to be ‘freed’ and gc-ed, stack data does not. This means it is much more efficient to use the stack where possible.

This is an abstract of the different locations where memory allocation happens. There is a lot more to it but this will be outside the scope for this post.

Obtaining heap data with pprof

There are two main ways of obtaining the data for this tool. The first will usually be part of a test or a branch and includes importing runtime/pprof and then calling pprof.WriteHeapProfile(some_file) to write the heap information.

Note that WriteHeapProfile is syntactic sugar for running:

// lookup takes a profile namepprof.Lookup("heap").WriteTo(some_file, 0)

According to the docs, WriteHeapProfile exists for backwards compatibility. The rest of the profiles do not have such shortcuts and you must use the Lookup() function to get their profile data.

The second, which is the more interesting one, is to enable it over HTTP (web based endpoints). This allows you to extract the data adhoc, from a running container in your e2e / test environment or even from ‘production’. This is one more place where the Go runtime and tool-set excels. The entire package documentation is found here, but the TL;DR is you will need to add it to your code as such:

import (  "net/http"  _ "net/http/pprof")
func main() {  ...  http.ListenAndServe("localhost:8080", nil)}

The ‘side effect’ of importing net/http/pprof is the registration the pprof endpoints under the web server root at /debug/pprof. Now using curl we can get the heap information files to investigate:

curl -sK -v http://localhost:8080/debug/pprof/heap > heap.out

Adding the http.ListenAndServe() above is only required i