Channels
This chapter explains how libgodc implements Go channels for the Dreamcast. The implementation differs significantly from the standard Go runtime due to our M:1 cooperative scheduling model.
The hchan Structure
Every channel is an hchan structure allocated on the GC heap:
typedef struct hchan {
uint32_t qcount; // Items currently in buffer
uint32_t dataqsiz; // Buffer capacity (0 = unbuffered)
void *buf; // Ring buffer (follows hchan in memory)
uint16_t elemsize; // Size of each element
uint8_t closed; // Channel closed flag
uint8_t buf_mask_valid; // Power-of-2 optimization flag
struct __go_type_descriptor *elemtype;
uint32_t sendx; // Send index into ring buffer
uint32_t recvx; // Receive index into ring buffer
waitq recvq; // Goroutines waiting to receive
waitq sendq; // Goroutines waiting to send
uint8_t locked; // Simple lock (no contention in M:1)
} hchan;
When you write make(chan int, 3), libgodc allocates a single block containing both the hchan header and the buffer:
┌─────────────────────────────────────────────────────────────┐
│ │
│ Memory Layout for make(chan int, 3) │
│ │
│ ┌─────────────────────┬─────────────────────────────────┐ │
│ │ hchan (48B) │ buffer (3 × 4B = 12B) │ │
│ ├─────────────────────┼───────┬───────┬───────┬─────────┤ │
│ │ qcount, dataqsiz, │ [0] │ [1] │ [2] │ │ │
│ │ sendx, recvx, │ int │ int │ int │ │ │
│ │ waitqueues, ... │ │ │ │ │ │
│ └─────────────────────┴───────┴───────┴───────┴─────────┘ │
│ │
│ Total allocation: sizeof(hchan) + (cap × elemsize) │
│ │
└─────────────────────────────────────────────────────────────┘
Ring Buffer Indexing
The buffer is a circular queue. To find where to read/write:
static inline void *chanbuf(hchan *c, uint32_t i) {
uint32_t index = chan_index(c, i);
return (void *)((uintptr_t)c->buf + (uintptr_t)index * c->elemsize);
}
For power-of-2 capacities, we use bitwise AND instead of modulo:
static inline uint32_t chan_index(hchan *c, uint32_t i) {
if (c->buf_mask_valid)
return i & (c->dataqsiz - 1); // Fast: i & 3 for cap=4
return i % c->dataqsiz; // Slow: division
}
Tip: Use power-of-2 buffer sizes (2, 4, 8, 16…) for faster indexing.
The Send Algorithm
When you write ch <- value, this is chansend():
┌─────────────────────────────────────────────────────────────┐
│ chansend(c, elem, block) │
│ │
│ 1. nil channel? │
│ └── block=true: gopark forever (deadlock) │
│ └── block=false: return false │
│ │
│ 2. Channel closed? │
│ └── runtime_throw("send on closed channel") │
│ │
│ 3. Receiver waiting in recvq? │
│ └── YES: Copy data DIRECTLY to receiver's elem │
│ Wake receiver with goready() │
│ Return true │
│ │
│ 4. Buffer has space? (qcount < dataqsiz) │
│ └── YES: Copy to buf[sendx], increment sendx │
│ Return true │
│ │
│ 5. Non-blocking? (block=false) │
│ └── Return false │
│ │
│ 6. Must block: │
│ └── Create sudog, enqueue in sendq │
│ gopark() - yield to scheduler │
│ When woken: return success flag │
└─────────────────────────────────────────────────────────────┘
The key insight: direct transfer. If a receiver is already waiting, we copy data straight to their memory location, bypassing the buffer entirely. This is why unbuffered channels involve no buffer at all.
The Receive Algorithm
When you write value := <-ch, this is chanrecv():
┌─────────────────────────────────────────────────────────────┐
│ chanrecv(c, elem, block) │
│ │
│ 1. nil channel? │
│ └── block=true: gopark forever │
│ └── block=false: return false │
│ │
│ 2. Closed AND empty? │
│ └── Zero out elem, return (true, received=false) │
│ │
│ 3. Sender waiting in sendq? │
│ └── Unbuffered: Copy directly from sender's elem │
│ └── Buffered: Take from buffer, move sender's data in │
│ Wake sender with goready() │
│ Return (true, received=true) │
│ │
│ 4. Buffer has data? (qcount > 0) │
│ └── Copy from buf[recvx], zero slot, decrement qcount │
│ Return (true, received=true) │
│ │
│ 5. Non-blocking? │
│ └── Return false │
│ │
│ 6. Must block: │
│ └── Create sudog, enqueue in recvq │
│ gopark() │
│ When woken: return success │
└─────────────────────────────────────────────────────────────┘
The Buffered Receive with Waiting Sender
This case is subtle. When the buffer is full and a sender is waiting:
if (c->dataqsiz > 0) { // Buffered channel
// 1. Take oldest item from buffer for receiver
src = chanbuf(c, c->recvx);
chan_copy(c, elem, src);
// 2. Put sender's NEW item into the freed slot
chan_copy(c, src, sg->elem);
// 3. Advance indices (sendx follows recvx)
c->recvx = chan_index(c, c->recvx + 1);
c->sendx = c->recvx;
}
This maintains FIFO order: the receiver gets the oldest buffered value, not the sender’s new value.
Wait Queues and Sudogs
When a goroutine blocks on a channel, it creates a sudog (sender/receiver descriptor):
typedef struct sudog {
G *g; // The blocked goroutine
struct sudog *next; // Next in wait queue
struct sudog *prev; // Previous in wait queue
void *elem; // Pointer to data being sent/received
uint64_t ticket; // Used by select for case index
bool isSelect; // Part of a select statement?
bool success; // Did operation succeed?
struct sudog *waitlink; // For select: links all sudogs
struct sudog *releasetime; // Unused (Go runtime compat)
struct hchan *c; // Channel we're waiting on
} sudog;
The Sudog Pool
Creating sudogs during gameplay would trigger malloc(). libgodc pre-allocates a pool at startup:
void sudog_pool_init(void) {
for (int i = 0; i < 16; i++) {
sudog *s = (sudog *)malloc(sizeof(sudog));
s->next = global_pool;
global_pool = s;
}
}
acquireSudog() pulls from the pool; releaseSudog() returns to it. If the pool is exhausted, we fall back to malloc().
Wait Queues
Each channel has two wait queues (doubly-linked lists):
typedef struct waitq {
struct sudog *first;
struct sudog *last;
} waitq;
Operations:
waitq_enqueue()- add blocked goroutine to endwaitq_dequeue()- remove and return first goroutinewaitq_remove()- remove specific sudog (for select cancellation)
Blocking and Waking: gopark/goready
This is where libgodc’s M:1 model shines.
gopark() - Block Current Goroutine
void gopark(bool (*unlockf)(void *), void *lock, WaitReason reason) {
G *gp = getg();
if (!gp || gp == g0)
runtime_throw("gopark on g0 or nil");
gp->atomicstatus = Gwaiting;
gp->waitreason = reason;
// Call unlock function - if it returns false, abort parking
if (unlockf && !unlockf(lock)) {
gp->atomicstatus = Grunnable;
runq_put(gp);
return;
}
// Context switch to scheduler
__go_swapcontext(&gp->context, &sched_context);
}
The goroutine saves its context and swaps to the scheduler. The unlockf callback releases the channel lock atomically with parking - if it returns false, we abort and re-enqueue instead.
goready() - Wake a Goroutine
void goready(G *gp) {
if (!gp) return;
// Don't wake dead/already-runnable/running goroutines
Gstatus status = gp->atomicstatus;
if (status == Gdead || status == Grunnable || status == Grunning)
return;
gp->atomicstatus = Grunnable;
gp->waitreason = waitReasonZero;
runq_put(gp);
}
The woken goroutine becomes runnable and will be scheduled on the next schedule() call.
Why M:1 Simplifies Things
In standard Go, channels need atomic operations and memory barriers because multiple OS threads access them. libgodc runs all goroutines on one KOS thread:
- No atomics needed for
lockedflag (simple bool) - No memory barriers
- No contention on wait queues
- Context switches are explicit (cooperative)
The chan_lock()/chan_unlock() functions just set a flag:
void chan_lock(hchan *c) {
if (!c)
runtime_throw("chan: nil channel");
if (c->locked)
runtime_throw("chan: recursive lock");
c->locked = 1;
}
void chan_unlock(hchan *c) {
if (c) c->locked = 0;
}
This is safe because we never preempt a goroutine in the middle of a channel operation.
Select Implementation
Select is the most complex part. Here’s how selectgo() works:
Phase 1: Setup
SelectGoResult selectgo(scase *cas0, uint16_t *order0,
int nsends, int nrecvs, bool block) {
int ncases = nsends + nrecvs;
// order0 provides space for two arrays:
uint16_t *pollorder = order0; // Random order to check cases
uint16_t *lockorder = order0 + ncases; // Order to lock channels
Phase 2: Randomize Poll Order (Fairness)
// Fisher-Yates shuffle
for (int i = ncases - 1; i > 0; i--) {
int j = fastrand() % (i + 1);
uint16_t tmp = pollorder[i];
pollorder[i] = pollorder[j];
pollorder[j] = tmp;
}
Why random? If we always checked cases in order, the first case would always win when multiple are ready. Randomization ensures fairness.
Phase 3: Lock Channels (Deadlock Prevention)
// Sort by channel address using heap sort
heapsort_lockorder(cas0, lockorder, ncases);
// Lock in address order
sellock(cas0, lockorder, ncases);
If goroutine A does select { case <-ch1: case <-ch2: } and goroutine B does select { case <-ch2: case <-ch1: }, they could deadlock if they lock in different orders. Sorting by address ensures everyone locks in the same global order.
Phase 4: Check for Ready Cases
for (int i = 0; i < ncases; i++) {
int casi = pollorder[i]; // Check in random order
scase *cas = &cas0[casi];
hchan *c = cas->c;
if (c == NULL)
continue;
if (casi < nsends) {
// Send: closed channel will panic - select it
if (c->closed) {
selected = casi;
break;
}
// Check for waiting receiver or buffer space
if (!waitq_empty(&c->recvq) || c->qcount < c->dataqsiz) {
selected = casi;
break;
}
} else {
// Receive: check for waiting sender, buffer data, or closed
if (!waitq_empty(&c->sendq) || c->qcount > 0 || c->closed) {
selected = casi;
break;
}
}
}
If any case is ready, execute it immediately and return.
Phase 5: Block on All Channels
If nothing is ready and block=true, we enqueue on ALL channels:
sudog *sglist = NULL;
for (int i = 0; i < ncases; i++) {
int casi = pollorder[i];
scase *cas = &cas0[casi];
hchan *c = cas->c;
if (c == NULL)
continue;
sudog *sg = acquireSudog();
sg->g = gp;
sg->c = c;
sg->elem = cas->elem;
sg->isSelect = true;
sg->success = false;
sg->ticket = casi; // Remember which case this is
// Link for later cleanup
sg->waitlink = sglist;
sglist = sg;
if (casi < nsends)
waitq_enqueue(&c->sendq, sg);
else
waitq_enqueue(&c->recvq, sg);
}
gp->waiting = sglist;
gopark(selparkcommit, &unlock_arg, waitReasonSelect);
Phase 6: Woken - Find Winner
When woken, one sudog has success=true. Find it and dequeue from all other channels:
// Pass 3: Find winner and dequeue losers
for (sudog *sg = sglist; sg != NULL; sg = sgnext) {
sgnext = sg->waitlink; // Save before we might release
int casi = (int)sg->ticket;
if (sg->success) {
selected = casi;
if (casi >= nsends)
recvOK = true; // Received actual data
} else {
// Remove from wait queue (we won't use this case)
if (casi < nsends)
waitq_remove(&sg->c->sendq, sg);
else
waitq_remove(&sg->c->recvq, sg);
}
}
// Release all sudogs in separate pass
for (sudog *sg = sglist; sg != NULL; sg = sgnext) {
sgnext = sg->waitlink;
releaseSudog(sg);
}
The Default Case
When block=false and nothing is ready, selectgo() returns selected=-1:
if (!block) {
selunlock(cas0, lockorder, ncases);
go_yield(); // Give other goroutines a chance
return (SelectGoResult){-1, false};
}
The go_yield() prevents tight polling loops from starving other goroutines.
Closing Channels
closechan() marks the channel closed and wakes ALL waiting goroutines:
void closechan(hchan *c) {
G *wake_list = NULL;
G *wake_tail = NULL;
chan_lock(c);
if (c->closed) {
chan_unlock(c);
runtime_throw("close of closed channel");
}
c->closed = 1;
// Collect all receivers (they'll get zero values)
while ((sg = waitq_dequeue(&c->recvq)) != NULL) {
sg->success = false; // Indicates closed, not real data
gp = sg->g;
if (!gp || gp->atomicstatus == Gdead)
continue;
if (sg->elem && c->elemsize > 0)
memset(sg->elem, 0, c->elemsize);
// Add gp to wake_list via schedlink...
}
// Collect all senders (they'll panic when they wake)
while ((sg = waitq_dequeue(&c->sendq)) != NULL) {
sg->success = false;
gp = sg->g;
if (!gp || gp->atomicstatus == Gdead)
continue;
// Add gp to wake_list via schedlink...
}
chan_unlock(c);
// Wake everyone outside the lock
while (wake_list) {
gp = wake_list;
wake_list = gp->schedlink;
goready(gp);
}
}
Senders check success when they wake and throw “send on closed channel” if false.
Performance
For benchmark numbers, see the Performance section in Design. You can run the benchmarks yourself with tests/bench_architecture.elf on hardware.
Why Unbuffered is Slower
Unbuffered channels always require a context switch:
Sender Receiver
────── ────────
ch <- 42
│
└── gopark() ─────────────────► scheduler picks receiver
│
x := <-ch
│
◄── goready() ────────────────── wakes sender
│
continues
Buffered channels avoid this when buffer has space/data.
Optimization Tips
- Use buffered channels for producer/consumer patterns
- Power-of-2 buffer sizes for faster indexing (uses bitwise AND instead of modulo)
- Batch data - send structs with multiple values instead of multiple sends
- select with default for non-blocking checks in game loops
- Pre-warm channels - send/receive once during init to allocate sudogs
Limitations
libgodc channels have some constraints:
| Limit | Value | Reason |
|---|---|---|
| Max buffer size | 65536 elements | Sanity check in makechan() |
| Max element size | 65536 bytes | 16-bit elemsize field in hchan |
| Sudog pool | 16 pre-allocated, 128 max | Defined in godc_config.h |
For game code, these limits are rarely hit. If you need larger queues, consider using slices with your own synchronization.