
Design Principles

1. Zero Dynamic Allocation in Steady State

All memory is pre-allocated during initialization. The hot path (data processing loop) never calls malloc, new, free, or delete.

/* WRONG: Allocation in hot path */
void process_frame(frame_t *f) {
    uint8_t *buf = malloc(f->len);  // ❌ Non-deterministic
    // ...
    free(buf);
}

/* RIGHT: Pre-allocated pool */
void process_frame(frame_t *f, memory_pool_t *pool) {
    uint8_t *buf = memory_pool_alloc(pool);  // ✅ O(1), lock-free
    if (buf == NULL)
        return;                              // Pool exhausted: handle it, never fall back to malloc
    // ...
    memory_pool_free(pool, buf);  // ✅ O(1), lock-free
}

Why: malloc has unbounded worst-case latency (page faults, lock contention, fragmentation). In real-time systems, a single allocation can add milliseconds of jitter.
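A fixed-block pool can be as simple as a free list threaded through the unused blocks. The sketch below is a minimal single-threaded version under stated assumptions: the names fixed_pool_t, fixed_pool_init(), fixed_pool_alloc(), and fixed_pool_free() are hypothetical (not this project's memory_pool_t API), the storage is a caller-supplied buffer carved up at init time, and there is no locking, so it is not the lock-free pool the comments above describe.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-block pool; names are illustrative, not this project's API.
 * NOT thread-safe: use one pool per thread (or a lock-free free list). */
typedef struct {
    unsigned char *free_list;   /* head of a singly linked list of free blocks */
} fixed_pool_t;

/* Thread every block of caller-supplied storage onto the free list (init time only).
 * Assumes block_size >= sizeof(void *). */
static void fixed_pool_init(fixed_pool_t *p, void *storage,
                            size_t block_size, size_t count) {
    unsigned char *base = storage;
    p->free_list = NULL;
    for (size_t i = count; i > 0; i--) {   /* push in reverse so block 0 is handed out first */
        unsigned char *block = base + (i - 1) * block_size;
        memcpy(block, &p->free_list, sizeof p->free_list);  /* store "next" inside the free block */
        p->free_list = block;
    }
}

/* O(1): pop the head block, or NULL when exhausted (never falls back to malloc). */
static void *fixed_pool_alloc(fixed_pool_t *p) {
    unsigned char *block = p->free_list;
    if (block == NULL)
        return NULL;
    memcpy(&p->free_list, block, sizeof p->free_list);     /* head = block->next */
    return block;
}

/* O(1): push the block back onto the free list. */
static void fixed_pool_free(fixed_pool_t *p, void *ptr) {
    unsigned char *block = ptr;
    memcpy(block, &p->free_list, sizeof p->free_list);     /* block->next = head */
    p->free_list = block;
}
```

Both operations touch only the list head, so their cost does not depend on pool size, and exhaustion surfaces as NULL rather than as a hidden call into malloc.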

2. Lock-Free Concurrency

Shared data between threads uses atomic operations with explicit memory ordering, never mutexes or condition variables, in the data path.

/* Memory ordering rules */
atomic_store_explicit(&head, new_val, memory_order_release);          // Producer: publish
uint64_t seen = atomic_load_explicit(&head, memory_order_acquire);    // Consumer: observe

Ordering Cheat Sheet

Pattern              Producer store   Consumer load
SPSC queue           release          acquire
Publish/subscribe    release          acquire
Sequence counter     release          acquire
Cross-atomic sync    seq_cst          seq_cst
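The SPSC row of the cheat sheet can be illustrated with a minimal ring buffer, assuming exactly one producer thread and one consumer thread. The names spsc_ring_t, spsc_init(), spsc_push(), and spsc_pop() and the 8-slot capacity are hypothetical; head and tail share a cache line here for brevity, which the Cache-Line Isolation layout would avoid.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define RING_CAPACITY 8u                      /* assumption: small demo size */
#define RING_MASK (RING_CAPACITY - 1u)
_Static_assert((RING_CAPACITY & RING_MASK) == 0, "capacity must be a power of two");

/* Hypothetical SPSC ring; indices are free-running counters. */
typedef struct {
    _Atomic uint64_t head;                    /* next slot to write; stored only by producer */
    _Atomic uint64_t tail;                    /* next slot to read; stored only by consumer */
    uint32_t slots[RING_CAPACITY];
} spsc_ring_t;

static void spsc_init(spsc_ring_t *r) {
    atomic_init(&r->head, 0);
    atomic_init(&r->tail, 0);
}

/* Producer: returns false when full. */
static bool spsc_push(spsc_ring_t *r, uint32_t value) {
    uint64_t head = atomic_load_explicit(&r->head, memory_order_relaxed);  /* own index */
    uint64_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);  /* pairs with consumer's release */
    if (head - tail == RING_CAPACITY)
        return false;
    r->slots[head & RING_MASK] = value;                                    /* 1. write payload */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);       /* 2. publish it */
    return true;
}

/* Consumer: returns false when empty. */
static bool spsc_pop(spsc_ring_t *r, uint32_t *out) {
    uint64_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);  /* own index */
    uint64_t head = atomic_load_explicit(&r->head, memory_order_acquire);  /* payload now visible */
    if (head == tail)
        return false;
    *out = r->slots[tail & RING_MASK];
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);       /* hand slot back */
    return true;
}
```

Each side loads its own index with relaxed ordering (only it ever writes that index) and the other side's index with acquire, pairing with that side's release store. This pairing is what guarantees the consumer never reads a slot before the producer's payload write is visible, and the producer never overwrites a slot the consumer is still reading.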

3. Cache-Line Isolation

Shared structures separate producer and consumer data onto different cache lines (64 bytes on x86-64) to prevent false sharing:

┌─────────────────────────────────────────────────────────┐
│  Cache Line 0 (64B)          Cache Line 1 (64B)         │
│  ┌──────────────────┐        ┌──────────────────┐       │
│  │ head (producer)  │        │ tail (consumer)  │       │
│  │ + padding[56B]   │        │ + padding[56B]   │       │
│  └──────────────────┘        └──────────────────┘       │
└─────────────────────────────────────────────────────────┘
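
#include <stdalign.h>   /* alignas (a keyword as of C23) */
#include <stdatomic.h>
#include <stdint.h>

typedef struct {
    alignas(64) _Atomic uint64_t head;               /* producer-owned cache line */
    uint8_t _pad0[64 - sizeof(_Atomic uint64_t)];

    alignas(64) _Atomic uint64_t tail;               /* consumer-owned cache line */
    uint8_t _pad1[64 - sizeof(_Atomic uint64_t)];
} spsc_indices_t;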
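Because both members carry alignas(64), the compiler already places tail at offset 64 and rounds the struct up to 128 bytes; the explicit padding arrays mainly document intent. A build-time check like the following sketch catches a layout regression. The CACHE_LINE macro and padded_indices_t are hypothetical, and 64 bytes is an assumption: some AArch64 cores use 128-byte lines.

```c
#include <assert.h>     /* static_assert */
#include <stdalign.h>   /* alignas, alignof */
#include <stdatomic.h>
#include <stddef.h>     /* offsetof */
#include <stdint.h>

#define CACHE_LINE 64   /* assumption: x86-64 line size */

typedef struct {
    alignas(CACHE_LINE) _Atomic uint64_t head;   /* producer's line */
    alignas(CACHE_LINE) _Atomic uint64_t tail;   /* consumer's line */
} padded_indices_t;

/* Fail the build if the two indices could ever share a cache line. */
static_assert(offsetof(padded_indices_t, tail) == CACHE_LINE, "tail must start a new line");
static_assert(sizeof(padded_indices_t) == 2 * CACHE_LINE, "struct spans exactly two lines");
static_assert(alignof(padded_indices_t) == CACHE_LINE, "struct itself is line-aligned");
```

Heap instances need aligned_alloc(CACHE_LINE, sizeof(padded_indices_t)) at init time, since plain malloc only guarantees alignment suitable for the fundamental types.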

4. Power-of-Two Sizing

Ring buffers and queues use power-of-2 capacities to replace expensive modulo operations with bitwise AND:

/* SLOW: modulo by a runtime capacity compiles to an integer division */
uint64_t index = position % capacity;

/* FAST: bitwise AND (single cycle); correct only when capacity is a power of 2 */
uint64_t index_fast = position & mask;  // mask = capacity - 1
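The mask trick is only valid when the capacity really is a power of two, so a sketch like the following rounds a requested size up once at init time and checks the invariant before masking. The helpers is_pow2(), next_pow2(), and ring_slot() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when exactly one bit is set. */
static bool is_pow2(uint64_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

/* Round up to the next power of two (valid for 1 <= n <= 2^63). */
static uint64_t next_pow2(uint64_t n) {
    n--;                 /* so an exact power of two maps to itself */
    n |= n >> 1;         /* smear the highest set bit into every lower bit */
    n |= n >> 2;
    n |= n >> 4;
    n |= n >> 8;
    n |= n >> 16;
    n |= n >> 32;
    return n + 1;
}

/* Map a free-running position onto a ring slot. */
static uint64_t ring_slot(uint64_t position, uint64_t capacity) {
    assert(is_pow2(capacity));           /* the AND trick is wrong otherwise */
    return position & (capacity - 1);    /* same result as position % capacity */
}
```

With free-running uint64_t positions, wrap-around at 2^64 is also harmless: 2^64 is a multiple of every power-of-two capacity, so position & mask stays continuous across the overflow.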

5. Direct System Calls

No wrapper libraries (Boost.Asio, libevent, etc.); the code calls POSIX/Linux system calls directly:

Operation        We use                      Not this
Network I/O      socket(), epoll_wait()      Boost.Asio
Serial           open(), termios, ioctl()    libserial
Timers           timerfd_create()            sleep() loops
Memory mapping   mmap()                      fread()/fwrite()
CPU affinity     sched_setaffinity()         Default OS scheduler placement

6. Batch Processing

Process all available events per syscall wake-up, amortizing the syscall cost:

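/* Batch drain: one wake-up services ALL ready events (up to MAX_EVENTS) */
struct epoll_event events[MAX_EVENTS];
int n = epoll_wait(epfd, events, MAX_EVENTS, -1);
if (n < 0 && errno != EINTR) {
    /* Real error: return an error code to the caller (never abort()) */
}
for (int i = 0; i < n; i++) {
    handle_event(&events[i]);  // No blocking calls inside
}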
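The batch pattern can be exercised end to end with eventfd descriptors standing in for real I/O sources. In this Linux-only sketch, drain_ready() and MAX_EVENTS = 64 are assumptions; each handler still performs one read() to reset its eventfd, so what is amortized is the blocking epoll_wait() wake-up, not every syscall.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

#define MAX_EVENTS 64   /* assumption: batch size per wake-up */

/* Wait once, then service every ready eventfd before sleeping again.
 * Returns the number of events handled, or -1 on error (errno set). */
static int drain_ready(int epfd, int timeout_ms) {
    struct epoll_event events[MAX_EVENTS];
    int n = epoll_wait(epfd, events, MAX_EVENTS, timeout_ms);   /* single wake-up */
    if (n < 0)
        return -1;                       /* caller checks errno; EINTR means retry */
    for (int i = 0; i < n; i++) {
        uint64_t count;
        /* Reading an eventfd resets its counter, so it is not reported again. */
        if (read(events[i].data.fd, &count, sizeof count) != (ssize_t)sizeof count)
            return -1;
    }
    return n;
}
```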

7. Odd Parity & CRC Integrity

Every protocol that transmits data includes integrity checking:

Protocol      Integrity method                         Detects
ARINC 429     Odd parity (1 bit)                       Any odd number of bit errors (incl. all single-bit)
Modbus RTU    CRC-16 (polynomial 0xA001)               Burst errors ≤16 bits
ARINC 615A    CRC-32 (ISO 3309)                        Burst errors ≤32 bits
AFDX          IP header checksum + FCS (CRC-32)        Full frame integrity
CAN           CRC-15 + bit stuffing                    Up to 5 random bit errors; bursts ≤15 bits
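The ARINC 429 and Modbus RTU rows can be sketched as follows. The helper names are hypothetical; the sketch assumes the usual ARINC 429 convention that bit 32, the word's most significant bit, carries parity, and the standard Modbus CRC parameters: reflected polynomial 0xA001, initial value 0xFFFF, no final XOR.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Count set bits (Kernighan: each iteration clears the lowest 1 bit). */
static unsigned popcount32(uint32_t v) {
    unsigned ones = 0;
    for (; v != 0; v &= v - 1)
        ones++;
    return ones;
}

/* Encode: set bit 32 (the MSB) so the whole 32-bit word has odd parity. */
static uint32_t a429_set_odd_parity(uint32_t word) {
    uint32_t data = word & 0x7FFFFFFFu;                   /* bits 1..31 */
    return (popcount32(data) % 2u == 0u) ? (data | 0x80000000u) : data;
}

/* Decode: a valid word has an odd number of 1 bits. */
static bool a429_parity_ok(uint32_t word) {
    return popcount32(word) % 2u == 1u;
}

/* Modbus RTU CRC-16: reflected polynomial 0xA001, init 0xFFFF, no final XOR. */
static uint16_t modbus_crc16(const uint8_t *buf, size_t len) {
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; bit++)                   /* LSB-first shift register */
            crc = (crc & 1u) ? (uint16_t)((crc >> 1) ^ 0xA001u) : (uint16_t)(crc >> 1);
    }
    return crc;
}
```

The Modbus CRC is transmitted low byte first. For the ASCII string "123456789", modbus_crc16() returns 0x4B37, the published check value for CRC-16/MODBUS.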

8. Fail-Safe Defaults

  • Buffers initialized to zero
  • Parity/CRC computed on encode, verified on decode
  • Hardware timeouts on every I/O operation
  • Graceful degradation: functions return error codes, never abort()
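The last two bullets combine naturally in a read helper that always has a deadline and reports failure through its return value. This POSIX sketch uses poll() for the timeout; io_status_t and read_timeout() are hypothetical names, and the timeout here is a software deadline rather than a hardware one.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical status codes: errors are returned, never abort()ed on. */
typedef enum { IO_OK = 0, IO_TIMEOUT = -1, IO_ERROR = -2 } io_status_t;

/* Read up to len bytes, waiting at most timeout_ms for data.
 * On IO_OK, *got holds the byte count (0 means the peer closed the stream). */
static io_status_t read_timeout(int fd, void *buf, size_t len,
                                int timeout_ms, size_t *got) {
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;
    *got = 0;
    do {
        rc = poll(&pfd, 1, timeout_ms);    /* note: restarts the full timeout on EINTR */
    } while (rc < 0 && errno == EINTR);
    if (rc < 0)
        return IO_ERROR;
    if (rc == 0)
        return IO_TIMEOUT;                 /* deadline hit: caller degrades gracefully */
    ssize_t n = read(fd, buf, len);
    if (n < 0)
        return IO_ERROR;
    *got = (size_t)n;
    return IO_OK;
}
```

The caller then decides how to degrade (retry, hold the last good value, flag the channel as failed) instead of the process terminating.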