Design Principles
1. Zero Dynamic Allocation in Steady State
All memory is pre-allocated during initialization. The hot path (data processing loop) never calls malloc, new, free, or delete.
/* WRONG: Allocation in hot path */
void process_frame(frame_t *f) {
    uint8_t *buf = malloc(f->len); // ❌ Non-deterministic
    // ...
    free(buf);
}

/* RIGHT: Pre-allocated pool */
void process_frame(frame_t *f, memory_pool_t *pool) {
    uint8_t *buf = memory_pool_alloc(pool); // ✅ O(1), lock-free
    // ...
    memory_pool_free(pool, buf); // ✅ O(1), lock-free
}
Why: malloc has unbounded worst-case latency (page faults, lock contention, fragmentation). In real-time systems, a single allocation can add milliseconds of jitter.
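To make the O(1) claim concrete, here is a minimal sketch of a fixed-block pool built on an index-based free list. The block size, block count, `memory_pool_init`, and the internal layout of `memory_pool_t` are assumptions made for illustration; only the names `memory_pool_alloc` and `memory_pool_free` come from the snippet above. This version is single-threaded. A lock-free variant would need atomic compare-and-swap on the free-list head plus ABA protection, which is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCK_SIZE  256u        /* assumed fixed block size in bytes */
#define POOL_BLOCK_COUNT 64u         /* assumed number of blocks */
#define POOL_NIL         UINT32_MAX  /* marks the end of the free list */

typedef struct {
    uint8_t  storage[POOL_BLOCK_COUNT][POOL_BLOCK_SIZE]; /* reserved up front */
    uint32_t next[POOL_BLOCK_COUNT]; /* free-list links, by block index */
    uint32_t free_head;              /* first free block, or POOL_NIL */
} memory_pool_t;

/* Called once at startup: chain every block into the free list. */
void memory_pool_init(memory_pool_t *pool) {
    for (uint32_t i = 0; i < POOL_BLOCK_COUNT; i++) {
        pool->next[i] = (i + 1 < POOL_BLOCK_COUNT) ? i + 1 : POOL_NIL;
    }
    pool->free_head = 0;
}

/* O(1): pop the head of the free list. Returns NULL when the pool is empty. */
uint8_t *memory_pool_alloc(memory_pool_t *pool) {
    uint32_t i = pool->free_head;
    if (i == POOL_NIL) {
        return NULL;
    }
    pool->free_head = pool->next[i];
    return pool->storage[i];
}

/* O(1): push the block back. buf must have come from memory_pool_alloc. */
void memory_pool_free(memory_pool_t *pool, uint8_t *buf) {
    size_t offset = (size_t)(buf - &pool->storage[0][0]);
    uint32_t i = (uint32_t)(offset / POOL_BLOCK_SIZE);
    pool->next[i] = pool->free_head;
    pool->free_head = i;
}
```

Because the free list is a stack, the most recently freed block is handed out first, which tends to keep hot blocks in cache. An exhausted pool returns `NULL` rather than falling back to `malloc`, so the caller decides how to degrade.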
2. Lock-Free Concurrency
Threads share data through atomic operations with explicit memory ordering. The data path never uses mutexes or condition variables.
/* Memory ordering rules */
atomic_store_explicit(&head, new_val, memory_order_release); // Producer
atomic_load_explicit(&head, memory_order_acquire); // Consumer
Ordering Cheat Sheet
| Pattern | Producer Store | Consumer Load |
|---|---|---|
| SPSC Queue | release | acquire |
| Publish/Subscribe | release | acquire |
| Sequence counter | release | acquire |
| Cross-atomic sync | seq_cst | seq_cst |
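The SPSC row of the table can be sketched as a small ring buffer. This is an illustration under stated assumptions, not the project's actual queue: `spsc_queue_t`, `spsc_push`, `spsc_pop`, the `uint32_t` payload, and the capacity of 8 are all hypothetical. Cache-line padding between `head` and `tail` is left out to keep the example short.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define SPSC_CAPACITY 8u                  /* assumed; must be a power of two */
#define SPSC_MASK     (SPSC_CAPACITY - 1u)

typedef struct {
    _Atomic uint64_t head;                /* next slot to write; producer-owned */
    _Atomic uint64_t tail;                /* next slot to read; consumer-owned */
    uint32_t slots[SPSC_CAPACITY];
} spsc_queue_t;

/* Producer side. Returns false when the queue is full. */
bool spsc_push(spsc_queue_t *q, uint32_t value) {
    /* relaxed: only the producer ever writes head */
    uint64_t head = atomic_load_explicit(&q->head, memory_order_relaxed);
    /* acquire: pairs with the consumer's release store to tail */
    uint64_t tail = atomic_load_explicit(&q->tail, memory_order_acquire);
    if (head - tail == SPSC_CAPACITY) {
        return false;
    }
    q->slots[head & SPSC_MASK] = value;
    /* release: the slot write above is visible before the new head is */
    atomic_store_explicit(&q->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side. Returns false when the queue is empty. */
bool spsc_pop(spsc_queue_t *q, uint32_t *out) {
    /* relaxed: only the consumer ever writes tail */
    uint64_t tail = atomic_load_explicit(&q->tail, memory_order_relaxed);
    /* acquire: pairs with the producer's release store to head */
    uint64_t head = atomic_load_explicit(&q->head, memory_order_acquire);
    if (head == tail) {
        return false;
    }
    *out = q->slots[tail & SPSC_MASK];
    /* release: the slot read above completes before the slot is handed back */
    atomic_store_explicit(&q->tail, tail + 1, memory_order_release);
    return true;
}
```

Each index has exactly one writer, which is why a thread can read its own index with `memory_order_relaxed` and only needs acquire/release on the index owned by the other side.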
3. Cache-Line Isolation
Shared structures separate producer and consumer data onto different cache lines (64 bytes on x86-64) to prevent false sharing:
┌────────────────────────────────────────────────┐
│  Cache Line 0 (64B)      Cache Line 1 (64B)    │
│  ┌──────────────────┐    ┌──────────────────┐  │
│  │ head (producer)  │    │ tail (consumer)  │  │
│  │ + padding[56B]   │    │ + padding[56B]   │  │
│  └──────────────────┘    └──────────────────┘  │
└────────────────────────────────────────────────┘
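#include <stdalign.h>   /* alignas */
#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    alignas(64) _Atomic uint64_t head;  /* written by the producer */
    uint8_t _pad0[64 - sizeof(_Atomic uint64_t)];
    alignas(64) _Atomic uint64_t tail;  /* written by the consumer */
    uint8_t _pad1[64 - sizeof(_Atomic uint64_t)];
} spsc_indices_t;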
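A layout like this is easy to break silently when a field is added later, so one option is to pin it down at compile time. The sketch below repeats the struct so that it compiles on its own and adds `static_assert` checks. The `CACHE_LINE` macro is an assumption introduced here, and 64 bytes is the x86-64 figure quoted above, not a universal constant.

```c
#include <assert.h>     /* static_assert */
#include <stdalign.h>   /* alignas */
#include <stdatomic.h>
#include <stddef.h>     /* offsetof */
#include <stdint.h>

#define CACHE_LINE 64   /* assumed line size in bytes */

typedef struct {
    alignas(CACHE_LINE) _Atomic uint64_t head;
    uint8_t _pad0[CACHE_LINE - sizeof(_Atomic uint64_t)];
    alignas(CACHE_LINE) _Atomic uint64_t tail;
    uint8_t _pad1[CACHE_LINE - sizeof(_Atomic uint64_t)];
} spsc_indices_t;

/* head starts a cache line */
static_assert(offsetof(spsc_indices_t, head) % CACHE_LINE == 0,
              "head must be cache-line aligned");
/* tail starts the next cache line, so the two indices never share one */
static_assert(offsetof(spsc_indices_t, tail) - offsetof(spsc_indices_t, head)
                  == CACHE_LINE,
              "head and tail must sit on adjacent, separate cache lines");
/* the struct ends on a line boundary, so neighbours cannot share tail's line */
static_assert(sizeof(spsc_indices_t) == 2 * CACHE_LINE,
              "struct must occupy exactly two cache lines");
```

If any of these conditions stops holding, the build fails instead of the false sharing showing up later as a throughput regression.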
4. Power-of-Two Sizing
Ring buffers and queues use power-of-2 capacities to replace expensive modulo operations with bitwise AND:
/* SLOW: modulo (involves division) */
uint64_t index = position % capacity;
/* FAST: bitwise AND (single cycle) */
uint64_t index = position & mask; // mask = capacity - 1
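The mask trick is only valid when the capacity really is a power of two, so it helps to validate or round the capacity once at initialization. The helpers below are a sketch; `is_power_of_two`, `next_power_of_two`, and `ring_index` are hypothetical names, not functions from this codebase.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when n is a nonzero power of two (exactly one bit set). */
bool is_power_of_two(uint64_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

/* Round n up to the nearest power of two.
 * Returns 0 when n is 0 or the result would not fit in 64 bits. */
uint64_t next_power_of_two(uint64_t n) {
    if (n == 0 || n > (UINT64_C(1) << 63)) {
        return 0;
    }
    uint64_t p = 1;
    while (p < n) {
        p <<= 1;
    }
    return p;
}

/* Wrap a monotonically increasing position into [0, capacity).
 * Only valid when capacity is a power of two. */
uint64_t ring_index(uint64_t position, uint64_t capacity) {
    return position & (capacity - 1);
}
```

A side benefit of power-of-two capacities: because 2^64 is a multiple of the capacity, a free-running 64-bit position counter keeps mapping to the correct slot even when it wraps around.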
5. Direct System Calls
The code makes POSIX/Linux system calls directly instead of going through wrapper libraries (Boost.Asio, libevent, etc.):
| Operation | We Use | Not This |
|---|---|---|
| Network I/O | socket(), epoll_wait() | Boost.Asio |
| Serial | open(), termios, ioctl() | libserial |
| Timers | timerfd_create() | sleep() loops |
| Memory mapping | mmap() | fread()/fwrite() |
| CPU affinity | sched_setaffinity() | Default scheduler placement |
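As an illustration of the Timers row, the sketch below arms a periodic `timerfd` and blocks on it with `read()`. The function names `periodic_timer_open` and `periodic_timer_wait` are hypothetical, and the choice of `CLOCK_MONOTONIC` is an assumption. The example is Linux-only.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE     /* expose Linux-specific declarations */
#endif
#include <stdint.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Create a timer that first fires after period_ns and then every period_ns.
 * Returns the timer fd, or -1 on failure or a non-positive period. */
int periodic_timer_open(long period_ns) {
    if (period_ns <= 0) {
        return -1;      /* a zero it_value would disarm the timer */
    }
    int fd = timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC);
    if (fd < 0) {
        return -1;
    }
    struct timespec period = {
        .tv_sec  = period_ns / 1000000000L,
        .tv_nsec = period_ns % 1000000000L,
    };
    struct itimerspec spec = { .it_interval = period, .it_value = period };
    if (timerfd_settime(fd, 0, &spec, NULL) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Block until the timer has expired at least once.
 * Returns the number of expirations since the last read, or 0 on error. */
uint64_t periodic_timer_wait(int fd) {
    uint64_t expirations = 0;
    if (read(fd, &expirations, sizeof expirations)
            != (ssize_t)sizeof expirations) {
        return 0;
    }
    return expirations;
}
```

Unlike a `sleep()` loop, the expiration count tells the caller when it has overrun: a return value greater than 1 means one or more periods were missed. The fd can also be registered with `epoll` alongside sockets.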
6. Batch Processing
Process all available events per syscall wake-up, amortizing the syscall cost:
/* Batch drain: process ALL ready events */
int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
for (int i = 0; i < n; i++) {
    handle_event(&events[i]); // No syscalls inside
}
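For a version that compiles on its own, the sketch below wraps the same loop in a function and drives it with two `eventfd` descriptors. Everything except the `epoll` and `eventfd` calls is an assumption for illustration: `watch_fd`, `drain_once`, `count_event`, `demo_batch`, the callback signature, and the `MAX_EVENTS` value of 16.

```c
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

#define MAX_EVENTS 16   /* assumed batch size */

typedef void (*event_handler_t)(const struct epoll_event *ev, void *ctx);

/* Register fd for readability; the fd rides along in the event data. */
int watch_fd(int epfd, int fd) {
    struct epoll_event ev = { .events = EPOLLIN, .data = { .fd = fd } };
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* One wake-up: a single epoll_wait, then every ready event is handled.
 * Returns the number of events handled, or -1 if epoll_wait failed. */
int drain_once(int epfd, int timeout_ms, event_handler_t handler, void *ctx) {
    struct epoll_event events[MAX_EVENTS];
    int n = epoll_wait(epfd, events, MAX_EVENTS, timeout_ms);
    for (int i = 0; i < n; i++) {
        handler(&events[i], ctx);
    }
    return n;
}

/* Example handler: counts events and makes no syscalls. */
void count_event(const struct epoll_event *ev, void *ctx) {
    (void)ev;
    *(int *)ctx += 1;
}

/* Usage sketch: two descriptors that are already readable are both
 * handled after a single wake-up. Returns the number handled. */
int demo_batch(void) {
    int epfd = epoll_create1(0);
    int a = eventfd(1, 0);   /* initial counter 1, so readable at once */
    int b = eventfd(1, 0);
    int handled = 0;
    if (epfd >= 0 && a >= 0 && b >= 0 &&
        watch_fd(epfd, a) == 0 && watch_fd(epfd, b) == 0) {
        drain_once(epfd, 100, count_event, &handled);
    }
    if (a >= 0) close(a);
    if (b >= 0) close(b);
    if (epfd >= 0) close(epfd);
    return handled;
}
```

The sketch passes a finite timeout instead of `-1` so that a stalled source cannot block the loop forever. Whether that is appropriate depends on the caller.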
7. Odd Parity & CRC Integrity
Every protocol that transmits data includes integrity checking:
| Protocol | Integrity Method | Detection |
|---|---|---|
| ARINC 429 | Odd parity (1 bit) | Single-bit errors |
| Modbus RTU | CRC-16 (polynomial 0x8005, reflected as 0xA001) | Burst errors ≤16 bits |
| ARINC 615A | CRC-32 (ISO 3309) | Burst errors ≤32 bits |
| AFDX | IP header checksum + FCS | Full frame integrity |
| CAN | CRC-15 + bit stuffing | Up to 5 random bit errors; bursts under 15 bits |
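The first two rows of the table can be sketched in a few lines each. The function names are hypothetical, and treating bit 32 of an ARINC 429 word as the most significant bit of a `uint32_t` is an assumption about the in-memory representation. The CRC routine uses the common bitwise form for Modbus RTU: initial value 0xFFFF and the reflected polynomial 0xA001.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* XOR-fold all 32 bits: returns 1 when the number of set bits is odd. */
static uint32_t parity32(uint32_t w) {
    w ^= w >> 16;
    w ^= w >> 8;
    w ^= w >> 4;
    w ^= w >> 2;
    w ^= w >> 1;
    return w & 1u;
}

/* ARINC 429 odd parity: set the parity bit (MSB here) so that the
 * complete 32-bit word carries an odd number of 1s. */
uint32_t arinc429_encode_parity(uint32_t word31) {
    uint32_t w = word31 & 0x7FFFFFFFu;
    if (parity32(w) == 0) {
        w |= 0x80000000u;   /* even so far: the parity bit makes it odd */
    }
    return w;
}

/* A received word is valid when its total count of 1s is odd. */
bool arinc429_parity_ok(uint32_t word) {
    return parity32(word) == 1u;
}

/* CRC-16 for Modbus RTU, computed bit by bit, least significant bit first. */
uint16_t modbus_crc16(const uint8_t *data, size_t len) {
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u) {
                crc = (uint16_t)((crc >> 1) ^ 0xA001u);
            } else {
                crc = (uint16_t)(crc >> 1);
            }
        }
    }
    return crc;
}
```

As a sanity check, the standard test input for this CRC variant is the ASCII string `"123456789"`, for which the result is 0x4B37. On the wire, Modbus RTU appends the CRC low byte first.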
8. Fail-Safe Defaults
- Buffers initialized to zero
- Parity/CRC computed on encode, verified on decode
- Hardware timeouts on every I/O operation
- Graceful degradation: functions return error codes, never abort()
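Three of the defaults listed above (zeroed buffers, a timeout on every I/O operation, error codes instead of `abort()`) can be combined in one small read helper. This is a sketch under assumptions: `io_status_t` and `read_with_timeout` are hypothetical names, and `poll()` provides a software timeout that stands in for whatever hardware timeout mechanism the real drivers use.

```c
#include <poll.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

typedef enum {
    IO_OK = 0,      /* data was read */
    IO_TIMEOUT,     /* nothing arrived within timeout_ms */
    IO_CLOSED,      /* peer closed the descriptor */
    IO_ERROR        /* poll() or read() failed */
} io_status_t;

/* Read up to cap bytes from fd, waiting at most timeout_ms milliseconds.
 * buf is zeroed first, so a failed call never exposes stale data.
 * On IO_OK, *out_len holds the number of bytes read; otherwise it is 0. */
io_status_t read_with_timeout(int fd, void *buf, size_t cap,
                              int timeout_ms, size_t *out_len) {
    memset(buf, 0, cap);
    *out_len = 0;

    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);
    if (ready == 0) {
        return IO_TIMEOUT;
    }
    if (ready < 0) {
        return IO_ERROR;
    }

    ssize_t n = read(fd, buf, cap);
    if (n < 0) {
        return IO_ERROR;
    }
    if (n == 0) {
        return IO_CLOSED;
    }
    *out_len = (size_t)n;
    return IO_OK;
}
```

Every outcome is reported through the return value, so the caller chooses between retrying, falling back to a degraded mode, or propagating the error.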