Day 6: More Profiling
Mood: Excited for new library books
Soundtrack: Sour. On repeat. All day.
Profiling
I skipped reading the new GitHub issues (in the morning at least) and jumped straight back into the profiling I was doing yesterday with the Longest Collatz Sequence problem. (After doing the mandatory 30 minutes of fighting with expense reports, of course.)
The first thing I realized is that the problem was asking for the longest Collatz sequence under 1 million (1_000_000) and I was running (unsuccessful) tests against 1 billion (1_000_000_000). Oops! Counting zeros is hard! No wonder I was having problems optimizing the algorithm enough.
After I dropped the target back down to 1 million, my program started running much more successfully.
I ended up writing three algorithms to compare:
- No Parallel - (Slightly misleading name, because there is a little parallelization in order to keep the code nice and DRY with the other algorithms.) I started one goroutine that runs one big for loop and calculates the Collatz sequence for each number sequentially. It sends the results to another goroutine that keeps track of the longest.
- Max Parallel - I started one goroutine per number. So for checking the length of all Collatz sequences under 1 million, I started 1 million goroutines. These goroutines send results to another goroutine that keeps track of the longest.
- Batch - This breaks the numbers up into batches of a specified size, then starts a goroutine for each batch. These goroutines send results to another goroutine that keeps track of the longest. (There is a sketch of this one just after the list.)
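Here is roughly what the batched version looked like. This is a simplified, from-memory sketch rather than the real code; `collatzLen`, `longestBatched`, and the `result` struct are names I am making up for this post:

```go
package main

import "fmt"

// collatzLen returns the number of terms in the Collatz sequence
// starting at n (so collatzLen(1) == 1).
func collatzLen(n int) int {
	length := 1
	for n != 1 {
		if n%2 == 0 {
			n /= 2
		} else {
			n = 3*n + 1
		}
		length++
	}
	return length
}

type result struct{ start, length int }

// longestBatched finds the longest Collatz sequence among starting
// numbers in [1, max), using one goroutine per batch of batchSize numbers.
func longestBatched(max, batchSize int) result {
	results := make(chan result)
	done := make(chan result)

	// Collector goroutine: keeps track of the longest sequence seen.
	go func() {
		var best result
		for i := 0; i < max-1; i++ { // one result per starting number
			if r := <-results; r.length > best.length {
				best = r
			}
		}
		done <- best
	}()

	// One goroutine per batch of starting numbers.
	for lo := 1; lo < max; lo += batchSize {
		hi := lo + batchSize
		if hi > max {
			hi = max
		}
		go func(lo, hi int) {
			for n := lo; n < hi; n++ {
				results <- result{start: n, length: collatzLen(n)}
			}
		}(lo, hi)
	}

	return <-done
}

func main() {
	fmt.Println(longestBatched(1_000_000, 1000))
}
```

A batch size of 1 degenerates into the max-parallel version, and one giant batch degenerates into the (mostly) no-parallel version, which is what made it easy to keep all three DRY.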
I did no optimization around storing answers because I wanted to focus only on the parallelism in the tracing.
Running with max == 1_000_000
| Algorithm | Real Time Without Profiling |
| --- | --- |
| No Parallel | 0m0.730s |
| Max Parallel | 0m0.960s |
| Batched (batch size 1000) | 0m0.498s |
Running with max == 10_000_000
| Algorithm | Real Time Without Profiling | Real Time With Profiling |
| --- | --- | --- |
| No Parallel | 0m9.796s | 0m34.616s |
| Max Parallel | 0m17.483s | 1m9.528s |
| Batched (batch size 1000) | 0m7.286s | 0m35.698s |
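(For the record, the times in these tables are the `real` line from the shell's `time` builtin; `collatz` is a made-up name for my compiled test binary:)

```
$ time ./collatz
```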
I tried to open the trace for these, but it was taking WAAAAY too long to load. So I ended up running with max == 100_000 so the trace was nice and small and quick to load.
What I observed:
- max parallel was always the worst algorithm
- no parallel was actually a fine algorithm; the parallel optimizations did not make much difference at this level of computation
- in the trace I saw that…
  - the no parallel run had chunks of empty ("wasted") time on each thread
  - the max parallel run had the longest-lived individual goroutines. I think this is because they were CPU starved.
  - the batched run had the most uniform goroutine lifespans. I think this is because they were not CPU starved.
My laptop will be happy to know that I am done trying to figure out the longest Collatz sequence that starts with a number under 1 billion.
Applying Profiling to Gorouter
So this all started because I read this issue yesterday about a common bug where someone might leak memory by using `time.Tick` in a `select` statement. (Each call to `time.Tick` creates a new ticker, and running tickers are never recovered by the garbage collector, so calling it inside a loop leaks one ticker per iteration.) Yesterday I found a place in gorouter (one of the things I maintain) where we do use `time.Tick` in a `select` statement.
Gorouter Code

```go
timer := time.Tick(RefreshInterval)
for {
	select {
	case <-timer:
		p.Update()
	case <-p.stopSignal:
		return
	}
}
```
From looking at this I can tell that we create one ticker channel and reuse it over and over, so I can safely assume that we are not leaking extra tickers all over the place. But I wanted to prove it with profiling! And I did!
I wrote the following program and got a trace for the good and bad functions:
```go
package main

import (
	"time"

	"github.com/pkg/profile"
)

func main() {
	defer profile.Start(profile.TraceProfile, profile.ProfilePath(".")).Stop()
	good() // swap in bad() to capture the leaky trace
}

func bad() {
	fooCh := make(chan bool)
	count := 0
	max := 10000
	for {
		select {
		case <-fooCh:
			// does not matter
		case <-time.Tick(time.Millisecond): // leaking tickers: a new one every iteration
			count++
			if count == max {
				return
			}
		}
	}
}

func good() {
	fooCh := make(chan bool)
	count := 0
	max := 10000
	timer := time.Tick(time.Millisecond) // one ticker, reused
	for {
		select {
		case <-fooCh:
			// does not matter
		case <-timer:
			count++
			if count == max {
				return
			}
		}
	}
}
```
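If I remember right, `profile.ProfilePath(".")` drops the trace in `./trace.out`, so collecting and viewing a trace was just:

```
$ go run main.go
$ go tool trace trace.out
```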
I loaded the traces and saw graphs that looked something like this:
```
 │ Good Graph
 │
H│
e│
a│
p│
 │
 │
 │--------------------------------
 └────────────────────────────────
                Time
```

```
 │ Bad Graph                   ---
 │                          ---
H│                       ---
e│                    ---
a│                 ---
p│              ---
 │           ---
 │        ---
 │---
 └────────────────────────────────
                Time
```
Yay! I was able to prove with profiling that the bad code was leaking memory and that the good code was not! The gorouter code is safe.
I searched my whole computer for `case <-time.Tick` and I did find it in one place. Golang! In an older version, this test has a small memory leak! I was excited to fix it, but the code has since been refactored to no longer use `time.Tick`. Darn.
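(The search itself was nothing fancy, just a recursive grep; `~/workspace` here stands in for wherever the code lives:)

```
$ grep -rn 'case <-time.Tick' ~/workspace
```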
Reading GitHub Issues
I ended my day by reading the 26 new GitHub issues since I last checked yesterday morning.
Trends
- Lots of fuzzing issues, including one about improving docs: 1, 2. It really seems like there is a lot of development and change happening around this, and I think I am going to pick up my fuzzing explorations again tomorrow. Plus! Improving docs! I'm great at that!
- There continue to be issues around generics, another area with a lot of development: 1.
- Go workspace mode! Another proposal I hadn't heard of: 1, 2, 3.