Deep Dive into Goroutines, Channels, and Runtime Internals in Go

This week at the office I had a pretty normal day, but during a chat with one of my friends, they asked me some interesting questions about Go: specifically, how Go's locks, mutexes, and channels actually work under the hood. They were curious about how channels guarantee exactly-once delivery and whether Go's mutexes use OS thread locks or something else entirely.

We ended up having a long, deep conversation about these topics. That got me curious too, so I decided to dive into the Go runtime internals, read up on how these things work, and write this blog post to share what I learned.

The GMP Model: Goroutines, M (Threads), and P (Processors)

At the core of Go's runtime is the GMP model:

  • G (Goroutine): A lightweight, user-space thread managed by Go's scheduler.
  • M (Machine): Represents an OS thread executing Go code.
  • P (Processor): A scheduler context. It holds a run queue of goroutines and is required for an M to run Go code.

Visual: GMP Model

   +-----------------+       +-----------------+       +-----------------+
   |  Goroutine (G)  |  -->  |  P (Processor)  |  -->  |  OS Thread (M)  |
   +-----------------+       +-----------------+       +-----------------+

Each P has a local queue of goroutines. When an M is bound to a P, it executes those goroutines. If a P runs out of work, it may steal work from other Ps or the global run queue.
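
To make the P count concrete: the number of Ps equals GOMAXPROCS, which defaults to the number of logical CPUs. Here is a minimal sketch, using only the standard library, that prints these values:

package main

import (
    "fmt"
    "runtime"
)

func main() {
    // GOMAXPROCS(0) reports the current number of Ps without changing it;
    // it defaults to the number of logical CPUs on the machine.
    fmt.Println("Ps (GOMAXPROCS):", runtime.GOMAXPROCS(0))
    fmt.Println("logical CPUs:   ", runtime.NumCPU())
    // NumGoroutine counts live Gs; here it is just main plus runtime helpers.
    fmt.Println("goroutines (Gs):", runtime.NumGoroutine())
}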

Creating and Scheduling Goroutines

When you do:

go doWork()
  • A new G structure is created.
  • It is placed in the local run queue of the P currently executing.
  • If the local run queue is full, it may go to the global queue.
  • The M bound to the P will eventually pick it up and run it.

The Go scheduler is primarily cooperative: goroutines are rescheduled at well-defined safe points (function calls, channel operations, blocking syscalls) rather than at arbitrary instructions, although since Go 1.14 the runtime can also preempt long-running goroutines asynchronously using signals.
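
Because each G starts with a small stack that grows on demand, spawning a large number of them is routine. A minimal, self-contained sketch that launches 100,000 goroutines and waits for them with a sync.WaitGroup:

package main

import (
    "fmt"
    "sync"
)

func main() {
    const n = 100_000 // goroutines are cheap enough that this is unremarkable

    var wg sync.WaitGroup
    results := make([]int, n)

    for i := 0; i < n; i++ {
        wg.Add(1)
        go func(id int) { // each call creates a new G and queues it on a P
            defer wg.Done()
            results[id] = id * 2
        }(i)
    }

    wg.Wait() // park main until every worker G has finished
    fmt.Println("spawned and joined", n, "goroutines; results[42] =", results[42])
}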

Channels and Sudog: Behind the Curtain

Code Example

ch := make(chan int)
go func() {
    ch <- 42 // sender
}()

val := <-ch // receiver

How it Works Internally

When you create a channel in Go, you're creating a value backed by an internal hchan struct. This struct represents the runtime's implementation of a channel.

Internal Representation: hchan

Here's a simplified version of what the hchan struct looks like in the Go runtime:

type hchan struct {
    qcount   uint      // number of elements in queue
    dataqsiz uint      // size of the circular queue (buffered channel)
    buf      unsafe.Pointer // pointer to the actual data buffer
    sendx    uint      // send index (for buffered channels)
    recvx    uint      // receive index (for buffered channels)
    recvq    waitq     // list of recv waiters (linked sudogs)
    sendq    waitq     // list of send waiters (linked sudogs)
    lock     mutex     // protects all fields in hchan
}
  • sendq and recvq are queues of goroutines waiting to send/receive.
  • For unbuffered channels, data is transferred directly between sender and receiver via a sudog.
  • For buffered channels, the data is enqueued/dequeued in buf, and indexes are updated.
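
For buffered channels this buf/index machinery is visible indirectly through len and cap: len(ch) corresponds to qcount and cap(ch) to dataqsiz. A small sketch:

package main

import "fmt"

func main() {
    ch := make(chan int, 2) // dataqsiz = 2: a circular buffer of two ints

    ch <- 1 // enqueued in buf, sendx advances; no receiver needs to be waiting
    ch <- 2 // buffer now full (qcount == dataqsiz)

    fmt.Println(len(ch), cap(ch)) // 2 2  (qcount, dataqsiz)

    fmt.Println(<-ch)             // 1: dequeued from buf, recvx advances
    fmt.Println(len(ch), cap(ch)) // 1 2

    // A third send while the buffer is full and no receiver is ready would
    // park this goroutine on sendq, just like the unbuffered case below.
}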

Sudog Recap

A sudog is used to track goroutines blocked on channels (as well as on the semaphores that back mutexes and condition variables, and in select statements):

type sudog struct {
    g    *g           // the goroutine
    next *sudog       // next in the wait queue
    elem unsafe.Pointer // data element to pass through
}

Example: Unbuffered Channel Flow

Let's say ch := make(chan int) is unbuffered:

  1. Sender goes first (ch <- 42):
    • Runtime checks recvq. No receiver? Sender is parked.
    • A sudog is created and added to sendq.
  2. Receiver arrives (<-ch):
    • Finds a matching sender in sendq.
    • Performs direct data transfer using sudog.elem.
    • Both sender and receiver are resumed.

Visual: Unbuffered Channel Flow

Sender G ----(no receiver)----> [Parked using sudog in sendq]
                                     |
Receiver G appears                   V
                     [sudog matched] -> [Data copied] -> [Sender + Receiver unparked]
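
You can observe this parking from user code: with an unbuffered channel, the send does not complete until a receiver shows up. A minimal sketch (the timings are only illustrative):

package main

import (
    "fmt"
    "time"
)

func main() {
    ch := make(chan int) // unbuffered: a send can only complete with a matching receive

    start := time.Now()
    go func() {
        ch <- 42 // no receiver yet, so this G is parked on sendq via a sudog
        fmt.Println("send completed after", time.Since(start).Round(10*time.Millisecond))
    }()

    time.Sleep(200 * time.Millisecond) // keep the sender parked for a while
    fmt.Println("received:", <-ch)     // matches the waiting sudog, copies the value, unparks the sender

    time.Sleep(50 * time.Millisecond) // give the unparked sender a moment to print
}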

Parking and Yielding

Parking

Goroutines are parked when:

  • They are blocked on a channel.
  • They are waiting for a mutex.
  • They are waiting on a syscall.

A parked goroutine is removed from the run queue and will only be resumed by the event it's waiting on.

Yielding

runtime.Gosched() causes the current goroutine to yield voluntarily:

  • It is placed on the global run queue.
  • Another runnable goroutine from the queue is picked.
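
A small sketch of voluntary yielding: with a single P, a goroutine that calls runtime.Gosched between iterations gives the other goroutine a chance to run (the exact interleaving is still up to the scheduler):

package main

import (
    "fmt"
    "runtime"
)

func main() {
    runtime.GOMAXPROCS(1) // one P, so the two goroutines must take turns

    done := make(chan struct{})
    go func() {
        for i := 0; i < 3; i++ {
            fmt.Println("worker:", i)
            runtime.Gosched() // yield: requeue this G and let another run
        }
        close(done)
    }()

    for i := 0; i < 3; i++ {
        fmt.Println("main:  ", i)
        runtime.Gosched()
    }
    <-done
}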

Cooperative Preemption

To prevent goroutines from running too long and blocking others:

  • Go uses cooperative preemption.
  • At safe points such as function prologues (where the stack-growth check doubles as a preemption check) and blocking operations like channel sends and receives, the goroutine notices a preemption request.
  • If it’s asked to yield (due to runtime preempting it), it parks itself and goes back to the run queue.
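
A sketch of why preemption matters: with one P, a goroutine spinning in a tight loop has no convenient cooperative yield point, yet on Go 1.14+ the runtime's asynchronous preemption still interrupts it so main can make progress (on much older Go versions a loop like this could starve everything else):

package main

import (
    "fmt"
    "runtime"
    "sync/atomic"
    "time"
)

func main() {
    runtime.GOMAXPROCS(1) // a single P: main and the spinner must share it

    var stop int32
    go func() {
        for atomic.LoadInt32(&stop) == 0 {
            // busy loop with no blocking calls; it only stops running because
            // the runtime preempts it (asynchronously since Go 1.14)
        }
    }()

    // Sleeping hands the P to the spinner. Waking up again requires the
    // scheduler to preempt the spinner, since it never yields voluntarily.
    time.Sleep(100 * time.Millisecond)
    fmt.Println("main resumed despite the non-yielding goroutine")
    atomic.StoreInt32(&stop, 1)
}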

What Happens During Syscalls or Network I/O?

When a goroutine performs network I/O (e.g., a read on a socket), the socket is in non-blocking mode, so the call never blocks the OS thread:

  1. If no data is ready, the goroutine (G) is parked.
  2. Its file descriptor is registered with the netpoller.
  3. The M and P stay free to run other goroutines.
  4. When the socket becomes ready, the netpoller unparks the goroutine and it becomes runnable again.

For syscalls that genuinely block in the kernel (e.g., file I/O or cgo calls), the M executing the syscall is detached from its P; a new M may be created or reused so the P can keep running goroutines, and the G is rescheduled once the syscall returns.

Visual: Syscall + Netpoll

Goroutine --(net I/O syscall)--> Blocked
                               |
                            [Netpoller] <-- epoll/kqueue loop
                               |
                              Ready?
                               |
                            [Unpark G]
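
A hedged sketch of the netpoller at work: a tiny echo server where each connection gets its own goroutine whose blocking Read parks on the netpoller instead of tying up an OS thread. The address and echo behavior here are purely illustrative:

package main

import (
    "bufio"
    "fmt"
    "log"
    "net"
)

func main() {
    ln, err := net.Listen("tcp", "127.0.0.1:0") // illustrative: any free port
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println("listening on", ln.Addr())

    for {
        conn, err := ln.Accept() // this G parks on the netpoller until a client connects
        if err != nil {
            log.Fatal(err)
        }
        go func(c net.Conn) {
            defer c.Close()
            r := bufio.NewReader(c)
            for {
                line, err := r.ReadString('\n') // blocks by parking the G, not an OS thread
                if err != nil {
                    return
                }
                c.Write([]byte(line)) // echo the line back
            }
        }(conn)
    }
}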

What is the Netpoller?

The netpoller is a goroutine-independent system that waits for file descriptors (network sockets, etc.) to become ready. It's implemented using OS primitives:

  • epoll (Linux)
  • kqueue (macOS, BSD)

There is no dedicated netpoller goroutine: the scheduler (when a P runs out of work) and the sysmon background thread poll it to find goroutines whose file descriptors are ready.

Netpoller Flow

// conceptual sketch of a netpoll pass, not actual runtime code
for {
    readyFDs := epollWait() // or kevent on macOS/BSD
    for _, fd := range readyFDs {
        markRunnable(goroutineWaitingOn(fd)) // wake the G parked on this fd
    }
}

The runtime keeps track of which G is blocked on which fd using a pollDesc, which records the waiting goroutine. When an event occurs, that goroutine is marked runnable.


What About Mutexes?

Go's sync.Mutex is optimized for goroutines:

Code Example

package main

import (
    "fmt"
    "sync"
    "time"
)

var mu sync.Mutex

func worker(id int) {
    mu.Lock()
    fmt.Printf("Worker %d running\n", id)
    time.Sleep(100 * time.Millisecond)
    mu.Unlock()
}

func main() {
    for i := 0; i < 5; i++ {
        go worker(i)
    }
    time.Sleep(time.Second) // crude wait so the workers get to run before main exits
}

Internal Behavior

  1. Fast Path:
    • The mutex uses atomic operations (CAS) to try to acquire the lock.
    • If successful, the goroutine continues — no OS involvement.
  2. Slow Path:
    • If the lock is already held, the goroutine creates a sudog and is parked.
    • It's added to the mutex's wait queue.
    • When Unlock is called, the next waiting goroutine is dequeued and resumed.
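
The fast path boils down to a compare-and-swap on the mutex's state word. Below is a deliberately simplified sketch of that idea using sync/atomic. This is a toy spinlock for illustration only, not how sync.Mutex is implemented; the real slow path parks the goroutine instead of spinning:

package main

import (
    "fmt"
    "runtime"
    "sync"
    "sync/atomic"
)

// toyLock illustrates only the CAS fast path; unlike sync.Mutex it never
// parks waiters, it just spins and yields.
type toyLock struct{ state int32 }

func (l *toyLock) Lock() {
    for !atomic.CompareAndSwapInt32(&l.state, 0, 1) { // fast path: 0 -> 1 in one atomic step
        runtime.Gosched() // "slow path" here is just yielding; sync.Mutex parks a sudog instead
    }
}

func (l *toyLock) Unlock() { atomic.StoreInt32(&l.state, 0) }

func main() {
    var l toyLock
    var wg sync.WaitGroup
    counter := 0

    for i := 0; i < 4; i++ {
        wg.Add(1)
        go func() {
            defer wg.Done()
            for j := 0; j < 1000; j++ {
                l.Lock()
                counter++
                l.Unlock()
            }
        }()
    }
    wg.Wait()
    fmt.Println("counter:", counter) // 4000 every time, despite the contention
}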

Visual: Mutex Lock Flow

Try Lock (CAS) --> Success? Yes -> Enter Critical Section
                 |
                 No
                 |
              [Park G with sudog] -- wait until Unlock

How Starvation is Prevented

The Go runtime employs several techniques to prevent starvation across its queues:

  1. FIFO Queues with Fairness:
    • Channel wait queues, semaphore wait queues, and the run queues are (mostly) FIFO, so goroutines tend to be resumed in the order they arrived.
    • sync.Mutex also has a starvation mode (since Go 1.9): a waiter blocked for more than roughly 1ms gets the lock handed to it directly, so newcomers cannot barge in indefinitely.
  2. Work Stealing:
    • If a P runs out of runnable goroutines, it will steal from others.
    • Prevents long idle times and ensures all Ps stay busy, helping balance load.
  3. Netpoll Integration with Scheduler:
    • Network-waiting goroutines are placed in a separate poller queue.
    • When their I/O is ready, they are pushed back into the run queues fairly.
  4. Spinning and Backoff for Mutexes:
    • A goroutine will spin a bit before parking, which helps short-lived locks complete quickly.
    • Prevents over-parking and keeps the mutex throughput high.
  5. Global Run Queue and Preemption:
    • If local queues are too full or empty, goroutines flow into the global run queue.
    • Cooperative preemption allows long-running goroutines to be paused, giving others a chance to run.
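
To see the FIFO behavior of channel wait queues (an implementation detail of the current runtime, not a language guarantee), line up several senders on an unbuffered channel and then drain it. The small sleeps only serve to fix the order in which the senders arrive:

package main

import (
    "fmt"
    "time"
)

func main() {
    ch := make(chan int) // unbuffered: every sender parks until a receiver arrives

    for i := 0; i < 5; i++ {
        go func(id int) {
            ch <- id // each sender is appended to sendq in arrival order
        }(i)
        time.Sleep(10 * time.Millisecond) // illustrative: make the arrival order deterministic
    }

    for i := 0; i < 5; i++ {
        fmt.Println("received:", <-ch) // waiters are dequeued FIFO: 0, 1, 2, 3, 4
    }
}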

Summary: Lifecycle of a Goroutine in Go

Event               | Runtime Action
--------------------+-----------------------------------------------------------------
go func()           | G is created and added to a run queue
Channel send/recv   | G may park; data is transferred via a sudog
Syscall or net I/O  | G is parked; M detached (syscall) or fd handed to the netpoller
Mutex lock          | G may spin or park using a semaphore/sudog
Unblocked           | G is added back to a local queue or the global run queue

Final Thoughts

Go's concurrency model is elegant on the outside but deeply optimized inside. From sudog structures to epoll-based netpollers, and from cooperative preemption to work-stealing run queues, understanding these internals helps you write more efficient and predictable concurrent code.