Design Principles

1. Zero Dynamic Allocation in Steady State

All memory is pre-allocated during initialization. The hot path (data processing loop) never calls malloc, new, free, or delete.

/* WRONG: Allocation in hot path */
void process_frame(frame_t *f) {
    uint8_t *buf = malloc(f->len);  // ❌ Non-deterministic
    // ...
    free(buf);
}

/* RIGHT: Pre-allocated pool */
void process_frame(frame_t *f, memory_pool_t *pool) {
    uint8_t *buf = memory_pool_alloc(pool);  // ✅ O(1), lock-free
    // ...
    memory_pool_free(pool, buf);  // ✅ O(1), lock-free
}

Why: malloc has unbounded worst-case latency (page faults, lock contention, fragmentation). In real-time systems, a single allocation can add milliseconds of jitter.
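As an illustration of how such a pool can reach O(1) allocation and release, here is a minimal sketch of a fixed-block pool backed by a statically sized array. The names (`fixed_pool_t`, `fixed_pool_alloc`, and so on) and the block size and count are hypothetical, not the project's `memory_pool_*` API. Unlike the pool described above, this sketch is single-threaded and not lock-free, and it does not detect double frees.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sizes chosen for illustration only. */
#define POOL_BLOCK_SIZE  256u
#define POOL_BLOCK_COUNT 64u
#define POOL_NIL         UINT32_MAX

typedef struct {
    uint8_t  storage[POOL_BLOCK_COUNT][POOL_BLOCK_SIZE];
    uint32_t next[POOL_BLOCK_COUNT];  /* index of the next free block */
    uint32_t free_head;               /* first free block, or POOL_NIL */
} fixed_pool_t;

/* Called once during initialization: link every block into the free list. */
static void fixed_pool_init(fixed_pool_t *p) {
    for (uint32_t i = 0; i < POOL_BLOCK_COUNT; i++) {
        p->next[i] = (i + 1 < POOL_BLOCK_COUNT) ? i + 1 : POOL_NIL;
    }
    p->free_head = 0;
}

/* O(1): pop the head of the free list. Returns NULL when the pool is empty. */
static void *fixed_pool_alloc(fixed_pool_t *p) {
    uint32_t i = p->free_head;
    if (i == POOL_NIL) {
        return NULL;
    }
    p->free_head = p->next[i];
    return p->storage[i];
}

/* O(1): push a block back. Returns 0 on success, -1 if the pointer does not
 * refer to the start of a block in this pool. */
static int fixed_pool_free(fixed_pool_t *p, void *block) {
    uintptr_t base = (uintptr_t)p->storage;
    uintptr_t addr = (uintptr_t)block;
    if (addr < base || addr >= base + sizeof p->storage) {
        return -1;
    }
    size_t offset = addr - base;
    if (offset % POOL_BLOCK_SIZE != 0) {
        return -1;
    }
    uint32_t i = (uint32_t)(offset / POOL_BLOCK_SIZE);
    p->next[i] = p->free_head;
    p->free_head = i;
    return 0;
}
```

Both operations touch only the list head, so their cost does not depend on how many blocks are in use. A lock-free variant would additionally need an atomic head and protection against the ABA problem.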

2. Lock-Free Concurrency

Data shared between threads is accessed through atomic operations with explicit memory ordering, never through mutexes or condition variables in the data path.

/* Memory ordering rules */
atomic_store_explicit(&head, new_val, memory_order_release);  // Producer
atomic_load_explicit(&head, memory_order_acquire);             // Consumer

Ordering Cheat Sheet

Pattern	Producer Store	Consumer Load
SPSC Queue	`release`	`acquire`
Publish/Subscribe	`release`	`acquire`
Sequence counter	`release`	`acquire`
Cross-atomic sync	`seq_cst`	`seq_cst`
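The SPSC row can be made concrete with a small ring buffer. This is a minimal sketch, assuming one producer thread and one consumer thread; the type and function names (`spsc_ring_t`, `spsc_push`, `spsc_pop`) and the capacity of 8 are illustrative assumptions, not the project's implementation.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_CAPACITY 8u                  /* assumption: must be a power of two */
#define RING_MASK     (RING_CAPACITY - 1u)

typedef struct {
    _Atomic uint64_t head;                /* written only by the producer */
    _Atomic uint64_t tail;                /* written only by the consumer */
    uint32_t slots[RING_CAPACITY];
} spsc_ring_t;

static void spsc_init(spsc_ring_t *r) {
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer side. Returns false if the ring is full. */
static bool spsc_push(spsc_ring_t *r, uint32_t value) {
    /* relaxed: only the producer ever writes head */
    uint64_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    /* acquire: pairs with the consumer's release store to tail */
    uint64_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_CAPACITY) {
        return false;
    }
    r->slots[head & RING_MASK] = value;
    /* release: the slot write above is visible before the new head is */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side. Returns false if the ring is empty. */
static bool spsc_pop(spsc_ring_t *r, uint32_t *out) {
    /* relaxed: only the consumer ever writes tail */
    uint64_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    /* acquire: pairs with the producer's release store to head */
    uint64_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail) {
        return false;
    }
    *out = r->slots[tail & RING_MASK];
    /* release: the slot read is finished before the producer may reuse it */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return true;
}
```

Each index has exactly one writer, which is why the writer can read its own index with `memory_order_relaxed` and only the cross-thread reads need `acquire`.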

3. Cache-Line Isolation

Shared structures separate producer and consumer data onto different cache lines (64 bytes on x86-64) to prevent false sharing:

┌─────────────────────────────────────────────────────────┐
│  Cache Line 0 (64B)          Cache Line 1 (64B)         │
│  ┌──────────────────┐        ┌──────────────────┐       │
│  │ head (producer)  │        │ tail (consumer)  │       │
│  │ + padding[56B]   │        │ + padding[56B]   │       │
│  └──────────────────┘        └──────────────────┘       │
└─────────────────────────────────────────────────────────┘

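#include <stdalign.h>   /* alignas */
#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    alignas(64) _Atomic uint64_t head;              /* producer-owned */
    uint8_t _pad0[64 - sizeof(_Atomic uint64_t)];

    alignas(64) _Atomic uint64_t tail;              /* consumer-owned */
    uint8_t _pad1[64 - sizeof(_Atomic uint64_t)];
} spsc_indices_t;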
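A layout like this can be checked at compile time so that a later edit to the struct cannot silently reintroduce false sharing. The sketch below uses a hypothetical struct name (`padded_indices_t`) and a `CACHE_LINE_SIZE` macro that assumes the 64-byte line size quoted above; it relies on `alignas` alone, which already forces each member onto its own line.

```c
#include <assert.h>     /* static_assert */
#include <stdalign.h>   /* alignas, alignof */
#include <stdatomic.h>
#include <stddef.h>     /* offsetof */
#include <stdint.h>

#define CACHE_LINE_SIZE 64  /* assumption: x86-64; other targets may differ */

typedef struct {
    alignas(CACHE_LINE_SIZE) _Atomic uint64_t head;  /* producer-owned */
    alignas(CACHE_LINE_SIZE) _Atomic uint64_t tail;  /* consumer-owned */
} padded_indices_t;

/* The build fails if head and tail ever end up on the same cache line. */
static_assert(offsetof(padded_indices_t, tail) -
              offsetof(padded_indices_t, head) == CACHE_LINE_SIZE,
              "head and tail must be exactly one cache line apart");
static_assert(sizeof(padded_indices_t) == 2 * CACHE_LINE_SIZE,
              "struct must occupy exactly two cache lines");
static_assert(alignof(padded_indices_t) == CACHE_LINE_SIZE,
              "struct must start on a cache-line boundary");
```

The isolation only holds if the instance itself starts on a cache-line boundary. Static and automatic instances inherit the struct's alignment, but dynamically allocated storage would need an aligned allocator such as `aligned_alloc`.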

4. Power-of-Two Sizing

Ring buffers and queues use power-of-2 capacities to replace expensive modulo operations with bitwise AND:

/* SLOW: modulo (involves division) */
uint64_t index = position % capacity;

/* FAST: bitwise AND (single cycle) */
uint64_t index = position & mask;  // mask = capacity - 1
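The mask trick is only equivalent to modulo when the capacity really is a power of two, so it helps to validate or round the capacity once at initialization. The helpers below are a minimal sketch with hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when x has exactly one bit set. */
static bool is_power_of_two(uint64_t x) {
    return x != 0 && (x & (x - 1)) == 0;
}

/* Smallest power of two >= x, for 1 <= x <= 2^63. Returns 0 otherwise. */
static uint64_t next_power_of_two(uint64_t x) {
    if (x == 0 || x > (UINT64_C(1) << 63)) {
        return 0;
    }
    uint64_t p = 1;
    while (p < x) {
        p <<= 1;
    }
    return p;
}

/* Wrap a monotonically increasing position into [0, capacity).
 * Only valid when capacity is a power of two. */
static uint64_t ring_index(uint64_t position, uint64_t capacity) {
    return position & (capacity - 1);
}
```

For example, with a capacity of 1024 the position 1027 maps to slot 3, the same result as `1027 % 1024`. A 64-bit position counter can also be left to wrap around naturally, because 2^64 is a multiple of every power-of-two capacity.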

5. Direct System Calls

The code does not go through wrapper libraries (Boost.Asio, libevent, etc.). It calls POSIX/Linux system calls directly:

Operation	We Use	Not This
Network I/O	`socket()`, `epoll_wait()`	Boost.Asio
Serial	`open()`, `termios`, `ioctl()`	libserial
Timers	`timerfd_create()`	`sleep()` loops
Memory mapping	`mmap()`	`fread()`/`fwrite()`
CPU affinity	`sched_setaffinity()`	OS scheduler
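To illustrate the Timers row, here is a minimal Linux-only sketch of a periodic timer built on `timerfd_create()`. The wrapper names (`periodic_timer_open`, `periodic_timer_wait`) are hypothetical; the system calls and flags are the standard Linux ones.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Create a timer that fires every period_ns nanoseconds.
 * Returns a file descriptor, or -1 on error. */
static int periodic_timer_open(long period_ns) {
    if (period_ns <= 0) {
        return -1;  /* a zero period would disarm the timer */
    }
    int fd = timerfd_create(CLOCK_MONOTONIC, TFD_CLOEXEC);
    if (fd < 0) {
        return -1;
    }
    struct timespec period = {
        .tv_sec  = period_ns / 1000000000L,
        .tv_nsec = period_ns % 1000000000L,
    };
    struct itimerspec its = { .it_interval = period, .it_value = period };
    if (timerfd_settime(fd, 0, &its, NULL) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Block until the next tick. Returns the number of expirations since the
 * previous read (more than 1 means ticks were missed), or -1 on error. */
static int64_t periodic_timer_wait(int fd) {
    uint64_t expirations = 0;
    ssize_t n = read(fd, &expirations, sizeof expirations);
    return (n == (ssize_t)sizeof expirations) ? (int64_t)expirations : -1;
}
```

Because the timer is a file descriptor, it can be registered with `epoll` alongside sockets and serial ports, and the expiration count makes overruns visible instead of letting drift accumulate as it would in a `sleep()` loop.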

6. Batch Processing

Process all available events per syscall wake-up, amortizing the syscall cost:

/* Batch drain: process ALL ready events */
int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
for (int i = 0; i < n; i++) {
    handle_event(&events[i]);  // No syscalls inside
}
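A slightly fuller sketch of the same drain loop, with the error handling a production loop would likely need. The function name `drain_ready`, the handler signature, and the sample `count_handler` are assumptions made for illustration; `epoll_wait` may be interrupted by a signal, so the sketch retries on `EINTR` (which restarts the timeout).

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/epoll.h>

#define MAX_EVENTS 64

typedef void (*event_handler_fn)(const struct epoll_event *ev, void *ctx);

/* Wait once, then dispatch every event reported by that single wake-up.
 * Returns the number of events handled, 0 on timeout, or -1 on error. */
static int drain_ready(int epfd, int timeout_ms,
                       event_handler_fn handle, void *ctx) {
    struct epoll_event events[MAX_EVENTS];
    int n;
    do {
        n = epoll_wait(epfd, events, MAX_EVENTS, timeout_ms);
    } while (n < 0 && errno == EINTR);  /* interrupted by a signal: retry */
    if (n < 0) {
        return -1;
    }
    for (int i = 0; i < n; i++) {
        handle(&events[i], ctx);
    }
    return n;
}

/* Sample handler: counts how many events one wake-up delivered. */
static void count_handler(const struct epoll_event *ev, void *ctx) {
    (void)ev;
    *(int *)ctx += 1;
}
```

With three readable descriptors registered, one call to `drain_ready` handles all three after a single `epoll_wait`, rather than paying for one wake-up per event.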

7. Odd Parity & CRC Integrity

Every protocol that transmits data includes integrity checking:

Protocol	Integrity Method	Detection
ARINC 429	Odd parity (1 bit)	Single-bit errors
Modbus RTU	CRC-16 (reflected polynomial 0xA001)	Burst errors ≤16 bits
ARINC 615A	CRC-32 (ISO 3309)	Burst errors ≤32 bits
AFDX	IP header checksum + FCS	Full frame integrity
CAN	CRC-15 + bit stuffing	Up to 5 random bit errors; bursts <15 bits
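The first two rows can be sketched in a few lines of C. The function names are hypothetical, and the parity helpers assume the ARINC 429 word is held in a `uint32_t` with the parity bit (bit 32 of the word) in the most significant position. The CRC routine uses the usual CRC-16/MODBUS parameters: initial value 0xFFFF, reflected polynomial 0xA001, no final XOR.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the 1 bits in a 32-bit word. */
static uint32_t count_ones(uint32_t v) {
    uint32_t ones = 0;
    for (; v != 0; v >>= 1) {
        ones += v & 1u;
    }
    return ones;
}

/* Set the parity bit so the whole 32-bit word has an odd number of 1 bits. */
static uint32_t arinc429_set_parity(uint32_t word) {
    uint32_t data = word & 0x7FFFFFFFu;
    /* even count in the data bits: set parity to make the total odd */
    return (count_ones(data) % 2u == 0) ? (data | 0x80000000u) : data;
}

/* Returns 1 if the received word has odd parity, 0 otherwise. */
static int arinc429_parity_ok(uint32_t word) {
    return (count_ones(word) % 2u) == 1u;
}

/* Bitwise CRC-16/MODBUS over len bytes. */
static uint16_t crc16_modbus(const uint8_t *data, size_t len) {
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 1u) ? (uint16_t)((crc >> 1) ^ 0xA001u)
                             : (uint16_t)(crc >> 1);
        }
    }
    return crc;
}
```

The standard check input, the nine ASCII bytes `123456789`, yields 0x4B37 with these parameters. Modbus RTU appends the CRC to the frame low byte first. A table-driven CRC would trade 512 bytes of memory for fewer operations per byte.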

8. Fail-Safe Defaults

Buffers initialized to zero
Parity/CRC computed on encode, verified on decode
Hardware timeouts on every I/O operation
Graceful degradation: functions return error codes, never abort()
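Several of these defaults can be combined in one small I/O helper. The sketch below is illustrative only: the `io_status_t` codes and `read_with_timeout` are hypothetical names, and the timeout is a software timeout enforced with `poll()`, not a hardware one.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

typedef enum {
    IO_OK          =  0,
    IO_ERR_ARG     = -1,  /* invalid argument */
    IO_ERR_TIMEOUT = -2,  /* no data within timeout_ms */
    IO_ERR_SYS     = -3,  /* poll() or read() failed; see errno */
    IO_ERR_CLOSED  = -4   /* peer closed the descriptor */
} io_status_t;

/* Read up to cap bytes, waiting at most timeout_ms for data to arrive.
 * Every failure is reported through the return code; nothing aborts. */
static io_status_t read_with_timeout(int fd, void *buf, size_t cap,
                                     int timeout_ms, size_t *out_len) {
    if (fd < 0 || buf == NULL || out_len == NULL || cap == 0) {
        return IO_ERR_ARG;
    }
    memset(buf, 0, cap);  /* fail-safe default: the buffer starts zeroed */
    *out_len = 0;

    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;
    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);  /* interrupted by a signal: retry */
    if (rc == 0) {
        return IO_ERR_TIMEOUT;
    }
    if (rc < 0) {
        return IO_ERR_SYS;
    }

    ssize_t n = read(fd, buf, cap);
    if (n < 0) {
        return IO_ERR_SYS;
    }
    if (n == 0) {
        return IO_ERR_CLOSED;
    }
    *out_len = (size_t)n;
    return IO_OK;
}
```

The caller decides how to degrade on each code, for example by reusing the last valid sample on `IO_ERR_TIMEOUT` or by reopening the port on `IO_ERR_CLOSED`.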