This week at the office, I had a pretty normal day, but during a chat with one of my friends, they asked me some interesting questions about Go: specifically, how Go's locks, mutexes, and channels actually work under the hood. They were curious about how channels guarantee exactly-once delivery and whether Go's mutexes use OS thread locks or something else entirely.
We ended up having a long, deep conversation about these topics. That got me curious too, so I decided to dive into the Go runtime internals, read up on how these things work, and write this blog post to share what I learned.
The GMP Model: Goroutines, M (Threads), and P (Processors)
At the core of Go's runtime is the GMP model:
- G (Goroutine): A lightweight, user-space thread managed by Go's scheduler.
- M (Machine): Represents an OS thread executing Go code.
- P (Processor): A scheduler context. It holds a run queue of goroutines and is required for an M to run Go code.
Visual: GMP Model
+---------------+       +---------------+       +---------------+
| Goroutine (G) |  -->  | P (Processor) |  -->  | OS Thread (M) |
+---------------+       +---------------+       +---------------+
Each P has a local run queue of goroutines. When an M is bound to a P, it executes those goroutines. If a P runs out of work, it may steal work from other Ps or pull from the global run queue.
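To make the P count concrete, here is a minimal sketch that uses only the standard runtime package; it prints how many Ps the scheduler will use on your machine (the numbers are obviously machine-dependent):

package main

import (
	"fmt"
	"runtime"
)

func main() {
	// GOMAXPROCS(0) reports the current number of Ps without changing it;
	// this is how many goroutines can execute Go code at the same time.
	fmt.Println("Ps (GOMAXPROCS):", runtime.GOMAXPROCS(0))
	// Since Go 1.5, GOMAXPROCS defaults to the number of CPUs.
	fmt.Println("CPUs:", runtime.NumCPU())
	// NumGoroutine counts the Gs that currently exist.
	fmt.Println("Gs:", runtime.NumGoroutine())
}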
Creating and Scheduling Goroutines
When you do:
go doWork()
- A new G structure is created.
- It is placed in the local run queue of the P that the current goroutine is running on.
- If the local run queue is full, it may go to the global queue.
- The M bound to the P will eventually pick it up and run it.
The Go scheduler is not fully preemptive in the OS sense: it relies largely on goroutines yielding at safe points (cooperative preemption), with asynchronous preemption added in Go 1.14 for goroutines that never reach one.
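Here is a tiny sketch of that queuing behavior. The exact interleaving depends on the scheduler, so treat the output as illustrative rather than guaranteed:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	done := make(chan struct{})

	go func() { // the new G is placed on the current P's local run queue
		fmt.Println("goroutine running")
		close(done)
	}()

	fmt.Println("main still running; the new G is queued, not necessarily started yet")
	runtime.Gosched() // yield so the scheduler can pick up the queued G
	<-done            // wait until the goroutine has actually run
}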
Channels and Sudog: Behind the Curtain
Code Example
ch := make(chan int)
go func() {
	ch <- 42 // sender
}()
val := <-ch // receiver
How it Works Internally
When you create a channel in Go, you're creating a value backed by an internal hchan struct. This struct is the runtime's implementation of a channel.
Internal Representation: hchan
Here's a simplified version of what the hchan struct looks like in the Go runtime:
type hchan struct {
	qcount   uint           // number of elements in the queue
	dataqsiz uint           // size of the circular queue (buffered channels)
	buf      unsafe.Pointer // pointer to the actual data buffer
	sendx    uint           // send index (for buffered channels)
	recvx    uint           // receive index (for buffered channels)
	recvq    waitq          // list of recv waiters (linked sudogs)
	sendq    waitq          // list of send waiters (linked sudogs)
	lock     mutex          // protects all fields in hchan
}
- sendq and recvq are queues of goroutines waiting to send/receive.
- For unbuffered channels, data is transferred directly between sender and receiver via a sudog.
- For buffered channels, the data is enqueued/dequeued in buf, and the sendx/recvx indexes are updated.
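You can observe that buffered bookkeeping from the outside with len and cap, which correspond to qcount and dataqsiz. A minimal sketch in ordinary user code (not runtime code):

package main

import "fmt"

func main() {
	ch := make(chan int, 3) // dataqsiz = 3; buf can hold three ints

	ch <- 1 // stored in buf at sendx; qcount becomes 1
	ch <- 2 // qcount becomes 2
	fmt.Println(len(ch), cap(ch)) // 2 3

	<-ch // removed from buf at recvx; qcount drops to 1
	fmt.Println(len(ch), cap(ch)) // 1 3
}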
Sudog Recap
A sudog is used to track goroutines blocked on channels (as well as locks, conditions, or file descriptors):
type sudog struct {
	g    *g             // the goroutine
	next *sudog         // next in the wait queue
	elem unsafe.Pointer // data element to pass through
}
Example: Unbuffered Channel Flow
Let's say ch := make(chan int) is unbuffered:

- Sender goes first (ch <- 42):
  - Runtime checks recvq. No receiver? Sender is parked.
  - A sudog is created and added to sendq.
- Receiver arrives (<-ch):
  - Finds a matching sender in sendq.
  - Performs direct data transfer using sudog.elem.
  - Both sender and receiver are resumed.
Visual: Unbuffered Channel Flow
Sender G --(no receiver)--> [Parked: sudog added to sendq]
                                      |
                             Receiver G appears
                                      |
[sudog matched] -> [Data copied] -> [Sender + Receiver unparked]
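Here is a small demonstration of that rendezvous: the sender blocks (is parked in sendq) until a receiver arrives. The sleeps and the printed timing are only there to make the blocking visible:

package main

import (
	"fmt"
	"time"
)

func main() {
	ch := make(chan int) // unbuffered: send and receive must rendezvous

	go func() {
		start := time.Now()
		ch <- 42 // no receiver yet, so this goroutine is parked in sendq
		fmt.Println("send completed after", time.Since(start))
	}()

	time.Sleep(200 * time.Millisecond) // let the sender block first
	fmt.Println("received:", <-ch)     // matches the waiting sudog and unparks the sender
	time.Sleep(10 * time.Millisecond)  // give the unparked sender a moment to print
}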
Parking and Yielding
Parking
Goroutines are parked when:
- They are blocked on a channel.
- They are waiting for a mutex.
- They are waiting on a syscall.
A parked goroutine is removed from the run queue and will only be resumed by the event it's waiting on.
Yielding
runtime.Gosched() causes the current goroutine to yield voluntarily:
- It is put back at the end of the local run queue.
- Another runnable goroutine from the queue is picked.
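A minimal sketch of voluntary yielding, pinned to a single P so the effect of Gosched is easier to see (the exact interleaving is still up to the scheduler):

package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	runtime.GOMAXPROCS(1) // one P, so the two goroutines must take turns

	var wg sync.WaitGroup
	for _, name := range []string{"A", "B"} {
		wg.Add(1)
		go func(name string) {
			defer wg.Done()
			for i := 0; i < 3; i++ {
				fmt.Println(name, i)
				runtime.Gosched() // go to the back of the run queue and let the other G run
			}
		}(name)
	}
	wg.Wait()
}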
Cooperative Preemption
To prevent goroutines from running too long and blocking others:
- Go relies primarily on cooperative preemption.
- At safe points (function prologues via the stack-growth check, channel operations, and other blocking calls), the goroutine checks a preemption flag set by the runtime.
- If it's asked to yield, it parks itself and goes back to the run queue.
- Since Go 1.14, the runtime can also preempt goroutines asynchronously by signalling the thread, so tight loops without function calls no longer block the scheduler indefinitely.
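To see why this matters, here is a hedged sketch. On a single P, a tight loop with no function calls contains no cooperative safe points; with asynchronous preemption disabled (GODEBUG=asyncpreemptoff=1) it can monopolize the processor, while a modern Go (1.14+) will interrupt it and let main run:

package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	runtime.GOMAXPROCS(1) // a single P makes the monopolization observable

	go func() {
		// Tight loop with no function calls: no cooperative safe points inside.
		// With async preemption disabled, this G is never asked to yield,
		// so on one P the main goroutine below may never get scheduled again.
		for i := 0; ; i++ {
			_ = i * i
		}
	}()

	time.Sleep(100 * time.Millisecond)
	fmt.Println("main goroutine was scheduled again") // prints on Go 1.14+ thanks to async preemption
}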
What Happens During Syscalls or Network I/O?
When a goroutine performs I/O that would block (e.g., a read on a network socket):
- The goroutine (G) is parked.
- For a genuinely blocking syscall, the M executing it is detached from its P, and a new M may be created or reused so the P can keep running other goroutines.
- For network I/O, the socket is non-blocking: its file descriptor is registered with the netpoller instead of tying up an M.
- When the fd becomes ready, the goroutine is unparked and becomes runnable again.
Visual: Syscall + Netpoll
Goroutine --(net I/O)--> [Parked]
                             |
                       [Netpoller] <-- epoll/kqueue loop
                             |
                         fd ready?
                             |
                        [Unpark G]
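Here is a runnable sketch of that flow using a loopback TCP connection: the reading goroutine parks on conn.Read, its fd sits with the netpoller, and it is unparked when the server finally writes. The timings are illustrative only:

package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// Listen on an ephemeral localhost port.
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer ln.Close()

	go func() {
		conn, err := ln.Accept()
		if err != nil {
			return
		}
		defer conn.Close()
		time.Sleep(200 * time.Millisecond) // keep the reader waiting for a while
		conn.Write([]byte("hello"))
	}()

	conn, err := net.Dial("tcp", ln.Addr().String())
	if err != nil {
		panic(err)
	}
	defer conn.Close()

	buf := make([]byte, 8)
	start := time.Now()
	// Read parks this goroutine; the fd is registered with the netpoller,
	// and the G is unparked once the socket becomes readable.
	n, _ := conn.Read(buf)
	fmt.Printf("read %q after %v\n", buf[:n], time.Since(start))
}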
What is the Netpoller?
The netpoller is a goroutine-independent system that waits for file descriptors (network sockets, etc.) to become ready. It's implemented using OS primitives:
- epoll (Linux)
- kqueue (macOS, BSD)
It runs in a background thread and loops over registered FDs.
Netpoller Flow
// Conceptual sketch of the netpoller loop (illustrative names, not the real runtime source)
for {
	readyFDs := epollWait() // epoll on Linux, kqueue on macOS/BSD
	for _, fd := range readyFDs {
		// mark the G parked on this fd as runnable and hand it back to the scheduler
		wakeGoroutineWaitingOn(fd)
	}
}
The runtime keeps track of which G is blocked on which fd using a pollDesc and associates it with a sudog. When an event occurs, the goroutine is marked runnable.
What About Mutexes?
Go's sync.Mutex is optimized for goroutines:
Code Example
package main

import (
	"fmt"
	"sync"
	"time"
)

var mu sync.Mutex

func worker(id int) {
	mu.Lock() // fast path: a single atomic CAS when the lock is free
	fmt.Printf("Worker %d running\n", id)
	time.Sleep(100 * time.Millisecond)
	mu.Unlock() // hands the lock to the next waiter, if any
}

func main() {
	for i := 0; i < 5; i++ {
		go worker(i)
	}
	time.Sleep(time.Second) // crude wait so main doesn't exit before the workers finish
}
Internal Behavior
- Fast Path:
  - The mutex uses an atomic compare-and-swap (CAS) to try to acquire the lock.
  - If successful, the goroutine continues with no OS involvement.
- Slow Path:
  - If the lock is already held, the goroutine creates a sudog and is parked.
  - It's added to the mutex's wait queue.
  - When Unlock is called, the next waiting goroutine is dequeued and resumed.
Visual: Mutex Lock Flow
Try Lock (CAS) --> Success? --Yes--> Enter Critical Section
                      |
                      No
                      |
        [Park G with sudog] -- wait until Unlock
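As a side note on the fast path: since Go 1.18, sync.Mutex also exposes TryLock, which only attempts the lock once and never parks the goroutine. A small sketch:

package main

import (
	"fmt"
	"sync"
	"time"
)

func main() {
	var mu sync.Mutex

	mu.Lock() // acquired via the fast path (CAS), since nothing else holds it
	go func() {
		// TryLock fails immediately instead of creating a sudog and parking.
		fmt.Println("TryLock while held:", mu.TryLock())
	}()
	time.Sleep(50 * time.Millisecond)
	mu.Unlock()

	// The mutex is free again, so this TryLock succeeds.
	fmt.Println("TryLock after Unlock:", mu.TryLock())
}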
How Starvation is Prevented
The Go runtime employs several techniques to prevent starvation across its queues:
- FIFO Queues with Fairness:
  - Mutex wait queues, channel wait queues, and the run queues are largely FIFO, so goroutines are generally resumed in the order they arrived.
  - sync.Mutex goes further: since Go 1.9 it has a starvation mode, and once a waiter has been blocked for more than 1ms the lock is handed directly to the head of the queue instead of letting newly arriving goroutines barge in.
- Work Stealing:
  - If a P runs out of runnable goroutines, it steals half of another P's local queue.
  - This prevents long idle times, keeps all Ps busy, and balances load.
- Netpoll Integration with the Scheduler:
  - Goroutines waiting on network I/O are tracked by the netpoller rather than sitting in a run queue.
  - When their I/O is ready, they are pushed back into the run queues like any other runnable goroutine.
- Spinning and Backoff for Mutexes:
  - A goroutine spins briefly before parking, which lets short-lived critical sections complete without a park/unpark round trip.
  - This avoids unnecessary parking and keeps mutex throughput high.
- Global Run Queue and Preemption:
  - If a local queue overflows, half of it is moved to the global run queue; idle Ps also periodically pull work from the global queue.
  - Preemption ensures long-running goroutines are paused, giving others a chance to run.
Summary: Lifecycle of a Goroutine in Go
| Event | Runtime Action |
| --- | --- |
| go func() | G is created and added to a run queue |
| Channel send/recv | G may park; data transfer via sudog |
| Syscall or net I/O | G is parked, M detached or fd polled |
| Mutex lock | G may spin or park using sema/sudog |
| Unblocked | G added back to the local or global run queue |
Final Thoughts
Go's concurrency model is elegant on the outside but deeply optimized inside. From sudog structures to epoll-based netpollers, and from cooperative preemption to largely lock-free scheduling, understanding these internals helps you write more efficient and predictable concurrent code.