Day 6: More Profiling
Mood: Excited for new library books
Soundtrack: Sour. On repeat. All day.
Profiling
I skipped reading the new GitHub issues (in the morning at least) and jumped straight back into the profiling I was doing yesterday with the Longest Collatz Sequence problem. (After doing the mandatory 30 minutes of fighting with expense reports, of course.)
The first thing I realized is that the problem was asking for the longest Collatz sequence under 1 million (1_000_000) and I was running (unsuccessful) tests against 1 billion (1_000_000_000). Oops! Counting zeros is hard! No wonder I was having problems optimizing the algorithm enough.
After I dropped the target back down to 1 million, my program started running much more successfully.
I ended up writing three algorithms to compare:
- No Parallel - (Slightly misleading name, because there is a little parallelization in order to keep the code nice and DRY with the other algorithms.) I started one goroutine that runs one big for loop and calculates the Collatz sequence for each number sequentially. It sends the results to another goroutine that keeps track of the longest.
- Max Parallel - I started one goroutine per number. So for checking the length of all Collatz sequences under 1 million, I started 1 million goroutines. These goroutines send results to another goroutine that keeps track of the longest.
- Batch - This breaks the numbers up into batches of a specified size, then starts a goroutine for each batch. These goroutines send results to another goroutine that keeps track of the longest. (There is a sketch of this one just after the list.)
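Here is roughly what the batched version looked like. This is a simplified, from-memory sketch rather than the real code; `collatzLen`, `longestBatched`, and the `result` struct are names I am making up for this post:

```go
package main

import "fmt"

// collatzLen returns the number of terms in the Collatz sequence
// starting at n (so collatzLen(1) == 1).
func collatzLen(n int) int {
	length := 1
	for n != 1 {
		if n%2 == 0 {
			n /= 2
		} else {
			n = 3*n + 1
		}
		length++
	}
	return length
}

type result struct{ start, length int }

// longestBatched finds the longest Collatz sequence among starting
// numbers in [1, max), using one goroutine per batch of batchSize numbers.
func longestBatched(max, batchSize int) result {
	results := make(chan result)
	done := make(chan result)

	// Collector goroutine: keeps track of the longest sequence seen.
	go func() {
		var best result
		for i := 0; i < max-1; i++ { // one result per starting number
			if r := <-results; r.length > best.length {
				best = r
			}
		}
		done <- best
	}()

	// One goroutine per batch of starting numbers.
	for lo := 1; lo < max; lo += batchSize {
		hi := lo + batchSize
		if hi > max {
			hi = max
		}
		go func(lo, hi int) {
			for n := lo; n < hi; n++ {
				results <- result{start: n, length: collatzLen(n)}
			}
		}(lo, hi)
	}

	return <-done
}

func main() {
	fmt.Println(longestBatched(1_000_000, 1000))
}
```

A batch size of 1 degenerates into the max-parallel version, and one giant batch degenerates into the (mostly) no-parallel version, which is what made it easy to keep all three DRY.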
I did no optimization around storing answers because I wanted to focus only on the parallelism in the tracing.
Running with max == 1_000_000
| Algorithm | Real Time Without Profiling |
| --- | --- |
| No Parallel | 0m0.730s |
| Max Parallel | 0m0.960s |
| Batched (batch size 1000) | 0m0.498s |
Running with max == 10_000_000
| Algorithm | Real Time Without Profiling | Real Time With Profiling |
| --- | --- | --- |
| No Parallel | 0m9.796s | 0m34.616s |
| Max Parallel | 0m17.483s | 1m9.528s |
| Batched (batch size 1000) | 0m7.286s | 0m35.698s |
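(For the record, the times in these tables are the `real` line from the shell's `time` builtin; `collatz` is a made-up name for my compiled test binary:)

```
$ time ./collatz
```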
I tried to open the trace for these, but it was taking WAAAAY too long to load. So I ended up running with max == 100_000 so the trace was nice and small and quick to load.
What I observed:
- max parallel was always the worst algorithm
- no parallel was actually a fine algorithm; the parallel optimizations did not make much difference at this level of computation
- in the trace I saw that…
  - the no parallel run had chunks of empty ("wasted") time on each thread
  - the max parallel run had the longest-lived individual goroutines. I think this is because they were CPU starved.
  - the batched run had the most uniform goroutine lifespans. I think this is because they were not CPU starved.
My laptop will be happy to know that I am done trying to figure out the longest Collatz sequence that starts with a number under 1 billion.
Applying Profiling to Gorouter
So this all started because I read this issue yesterday about a common bug where someone might leak memory by using `time.Tick` in a `select` statement. (Each call to `time.Tick` creates a new ticker, and running tickers are never recovered by the garbage collector, so calling it inside a loop leaks one ticker per iteration.) Yesterday I found a place in gorouter (one of the things I maintain) where we do use `time.Tick` in a `select` statement.
Gorouter Code

```go
timer := time.Tick(RefreshInterval)
for {
	select {
	case <-timer:
		p.Update()
	case <-p.stopSignal:
		return
	}
}
```
From looking at this I can tell that we create one ticker channel and reuse it over and over, so I can safely assume that we are not leaking extra tickers all over the place. But I wanted to prove it with profiling! And I did!
I wrote the following program and got a trace for the good and bad functions:
```go
package main

import (
	"time"

	"github.com/pkg/profile"
)

func main() {
	defer profile.Start(profile.TraceProfile, profile.ProfilePath(".")).Stop()
	good() // swap in bad() to capture the leaky trace
}

func bad() {
	fooCh := make(chan bool)
	count := 0
	max := 10000
	for {
		select {
		case <-fooCh:
			// does not matter
		case <-time.Tick(time.Millisecond): // leaking tickers: a new one every iteration
			count++
			if count == max {
				return
			}
		}
	}
}

func good() {
	fooCh := make(chan bool)
	count := 0
	max := 10000
	timer := time.Tick(time.Millisecond) // one ticker, reused
	for {
		select {
		case <-fooCh:
			// does not matter
		case <-timer:
			count++
			if count == max {
				return
			}
		}
	}
}
```
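If I remember right, `profile.ProfilePath(".")` drops the trace in `./trace.out`, so collecting and viewing a trace was just:

```
$ go run main.go
$ go tool trace trace.out
```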
I loaded the traces and saw graphs that looked something like this:
```
 │ Good Graph
 │
H│
e│
a│
p│
 │
 │
 │--------------------------------
 └────────────────────────────────
                Time
```

```
 │ Bad Graph                   ---
 │                          ---
H│                       ---
e│                    ---
a│                 ---
p│              ---
 │           ---
 │        ---
 │---
 └────────────────────────────────
                Time
```
Yay! I was able to prove with profiling that the bad code was leaking memory and that the good code was not! The gorouter code is safe.
I searched my whole computer for `case <-time.Tick` and I did find it in one place. Golang! In an older version, this test has a small memory leak! I was excited to fix it, but the code has since been refactored to no longer use `time.Tick`. Darn.
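(The search itself was nothing fancy, just a recursive grep; `~/workspace` here stands in for wherever the code lives:)

```
$ grep -rn 'case <-time.Tick' ~/workspace
```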
Reading GitHub Issues
I ended my day by reading the 26 new GitHub issues since I last checked yesterday morning.
Trends
- Lots of fuzzing issues, including one about improving docs: 1, 2. It really seems like there is a lot of development and change happening around this, and I think I am going to pick up my fuzzing explorations again tomorrow. Plus! Improving docs! I'm great at that!
- There continue to be issues around generics, another area with a lot of development: 1.
- Go workspace mode! Another proposal I hadn't heard of: 1, 2, 3.