Zero-Copy I/O

Status: Planned

This module is not yet implemented. The design below represents the target architecture.

Overview

This module will provide zero-copy data transfer mechanisms that avoid copying payload data between kernel and userspace buffers. It covers four complementary approaches: mmap, splice/sendfile, io_uring, and MSG_ZEROCOPY.

Target Mechanisms

┌──────────────────────────────────────────────────────────────────────┐
│                      Zero-Copy Techniques                             │
├─────────────┬──────────────────────────────────────────────────────┤
│   mmap      │  Map kernel pages into userspace (PACKET_MMAP, file)  │
│   splice    │  Pipe-based kernel-to-kernel transfer (no userspace)   │
│   io_uring  │  Async I/O with shared submission/completion rings     │
│ MSG_ZEROCOPY│  Socket send from userspace pages (kernel pins pages)  │
└─────────────┴──────────────────────────────────────────────────────┘

Architecture

graph LR
    A[Application Buffer] -->|mmap| B[Kernel Page Cache]
    B -->|splice| C[Socket Buffer]
    A -->|MSG_ZEROCOPY| C
    A -->|io_uring SQE| D[io_uring Ring]
    D -->|Kernel| C
    C --> E[NIC DMA]
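
To make the mmap edge of the diagram concrete, here is a minimal sketch that maps a regular file read-only and sums its bytes directly from the mapped page-cache pages, with no read() into a separate user buffer. The helper name `mapped_file_sum` is hypothetical and is not part of the planned API; it only illustrates the mechanism.

```c
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper (not part of the planned API): sum every byte of the
 * file at `path` by reading through a read-only mapping.
 * Returns 0 on success and stores the sum in *sum_out, or -1 on failure. */
int mapped_file_sum(const char *path, uint64_t *sum_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }

    *sum_out = 0;
    if (st.st_size == 0) {          /* mmap() rejects a zero-length mapping */
        close(fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;
    const uint8_t *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                      /* the mapping stays valid after close() */
    if (p == MAP_FAILED)
        return -1;

    uint64_t sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += p[i];                /* loads come from the mapped pages */

    munmap((void *)p, len);
    *sum_out = sum;
    return 0;
}
```

A caller would pass a path and check the return value before using the sum. Page faults still occur on first touch, so this trades a copy for fault handling rather than making access free.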

Planned API

io_uring Interface

#include <stddef.h>          /* size_t */
#include <stdint.h>          /* uint32_t */
#include <linux/io_uring.h>  /* struct io_uring_sqe, struct io_uring_cqe */

typedef struct {
    int ring_fd;
    struct io_uring_sqe *sq_ring;  /* Submission queue (shared mmap) */
    struct io_uring_cqe *cq_ring;  /* Completion queue (shared mmap) */
    uint32_t sq_size;
    uint32_t cq_size;
    uint32_t sq_tail;              /* Next submission slot */
    uint32_t cq_head;              /* Next completion to read */
} zero_copy_ring_t;

int zero_copy_ring_init(zero_copy_ring_t *ring, uint32_t queue_depth);
int zero_copy_submit_send(zero_copy_ring_t *ring, int fd,
                          const void *buf, size_t len);
int zero_copy_submit_recv(zero_copy_ring_t *ring, int fd,
                          void *buf, size_t len);
int zero_copy_complete(zero_copy_ring_t *ring, int *result);
void zero_copy_ring_destroy(zero_copy_ring_t *ring);

splice/sendfile Interface

/* Kernel-to-kernel zero-copy file → socket transfer */
ssize_t zero_copy_sendfile(int out_fd, int in_fd, off_t offset, size_t count);

/* Pipe-based splice for stream processing */
ssize_t zero_copy_splice(int fd_in, int fd_out, size_t len, unsigned int flags);

MSG_ZEROCOPY Socket Send

/* Enable MSG_ZEROCOPY on a socket */
int zero_copy_socket_enable(int sockfd);

/* Send with zero-copy: the kernel pins the user pages and signals completion,
 * so buf must not be modified or freed until its completion is reported */
ssize_t zero_copy_send(int sockfd, const void *buf, size_t len);

/* Poll for zerocopy completion notifications */
int zero_copy_poll_completion(int sockfd, uint32_t *completed_id);

Performance Targets

Mechanism    | Latency | Throughput | Kernel Version
-------------|---------|------------|---------------
splice       | ~2 µs   | 10+ Gbps   | 2.6.17+
sendfile     | ~2 µs   | 10+ Gbps   | 2.2+
io_uring     | < 1 µs  | 10+ Gbps   | 5.1+
MSG_ZEROCOPY | ~5 µs   | 10+ Gbps   | 4.14+

io_uring Submission/Completion Ring Layout

┌───────────────────────────────────────────────────────────────────┐
│  io_uring shared memory (mmap'd between kernel and userspace):     │
│                                                                    │
│  Submission Queue (SQ):                                            │
│  ┌───────┬───────┬───────┬───────┬───────────────────────┐        │
│  │ SQE 0 │ SQE 1 │ SQE 2 │ SQE 3 │       ...             │        │
│  └───────┴───────┴───────┴───────┴───────────────────────┘        │
│    ↑ tail (user writes)                                            │
│                                                                    │
│  Completion Queue (CQ):                                            │
│  ┌───────┬───────┬───────┬───────┬───────────────────────┐        │
│  │ CQE 0 │ CQE 1 │ CQE 2 │ CQE 3 │       ...             │        │
│  └───────┴───────┴───────┴───────┴───────────────────────┘        │
│    ↑ head (user reads)                                             │
│                                                                    │
│  Flow: user writes SQE → kernel processes → kernel writes CQE      │
│  Syscall-free submission requires SQPOLL mode                      │
└───────────────────────────────────────────────────────────────────┘
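
The flow in the diagram can be exercised without liburing by driving the rings through the raw system calls. The following is only a sketch under several assumptions: it uses a tiny queue depth, submits a single IORING_OP_NOP, and calls io_uring_enter() to submit and wait, since SQPOLL is not enabled here. The helper name `ring_nop_roundtrip` is hypothetical. Note that in the real layout the SQ ring holds indices into a separately mapped SQE array, which the diagram simplifies.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/io_uring.h>

/* Hypothetical helper: submit one NOP tagged with `tag`, wait for its
 * completion, and store the echoed user_data in *tag_out.
 * Returns the CQE result (0 for a NOP) or -errno on failure. */
int ring_nop_roundtrip(uint64_t tag, uint64_t *tag_out)
{
    struct io_uring_params p;
    memset(&p, 0, sizeof(p));
    int fd = (int)syscall(__NR_io_uring_setup, 4U, &p);
    if (fd < 0)
        return -errno;

    /* Three shared regions: SQ ring, CQ ring, and the SQE array. */
    size_t sq_len = p.sq_off.array + p.sq_entries * sizeof(uint32_t);
    size_t cq_len = p.cq_off.cqes + p.cq_entries * sizeof(struct io_uring_cqe);
    size_t sqe_len = p.sq_entries * sizeof(struct io_uring_sqe);
    int prot = PROT_READ | PROT_WRITE, flags = MAP_SHARED | MAP_POPULATE;
    uint8_t *sq = mmap(NULL, sq_len, prot, flags, fd, IORING_OFF_SQ_RING);
    uint8_t *cq = mmap(NULL, cq_len, prot, flags, fd, IORING_OFF_CQ_RING);
    struct io_uring_sqe *sqes = mmap(NULL, sqe_len, prot, flags, fd, IORING_OFF_SQES);

    int rc = -ENOMEM;
    if (sq != MAP_FAILED && cq != MAP_FAILED && sqes != MAP_FAILED) {
        uint32_t *sq_tail = (uint32_t *)(sq + p.sq_off.tail);
        uint32_t *sq_array = (uint32_t *)(sq + p.sq_off.array);
        uint32_t sq_mask = *(uint32_t *)(sq + p.sq_off.ring_mask);
        uint32_t *cq_head = (uint32_t *)(cq + p.cq_off.head);
        uint32_t *cq_tail = (uint32_t *)(cq + p.cq_off.tail);
        uint32_t cq_mask = *(uint32_t *)(cq + p.cq_off.ring_mask);
        struct io_uring_cqe *cqes = (struct io_uring_cqe *)(cq + p.cq_off.cqes);

        /* User writes the SQE, then publishes it by advancing the SQ tail. */
        uint32_t tail = *sq_tail;
        uint32_t idx = tail & sq_mask;
        memset(&sqes[idx], 0, sizeof(sqes[idx]));
        sqes[idx].opcode = IORING_OP_NOP;
        sqes[idx].user_data = tag;
        sq_array[idx] = idx;
        __atomic_store_n(sq_tail, tail + 1, __ATOMIC_RELEASE);

        /* Without SQPOLL a syscall is still needed to submit and wait. */
        if (syscall(__NR_io_uring_enter, fd, 1U, 1U,
                    (unsigned)IORING_ENTER_GETEVENTS, (void *)NULL, (size_t)0) < 0) {
            rc = -errno;
        } else {
            /* Kernel wrote a CQE; user reads it and advances the CQ head. */
            uint32_t head = *cq_head;
            if (head == __atomic_load_n(cq_tail, __ATOMIC_ACQUIRE)) {
                rc = -EAGAIN;
            } else {
                rc = cqes[head & cq_mask].res;
                *tag_out = cqes[head & cq_mask].user_data;
                __atomic_store_n(cq_head, head + 1, __ATOMIC_RELEASE);
            }
        }
    }

    if (sqes != MAP_FAILED) munmap(sqes, sqe_len);
    if (cq != MAP_FAILED) munmap(cq, cq_len);
    if (sq != MAP_FAILED) munmap(sq, sq_len);
    close(fd);
    return rc;
}
```

On kernels older than 5.1, or in sandboxes that block io_uring, the setup call fails and the helper returns a negative errno such as -ENOSYS or -EPERM, so callers should treat a negative result as "io_uring unavailable" and fall back to another mechanism.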

Dependencies

Dependency   | Version | Purpose
-------------|---------|---------------------------
Linux kernel | ≥ 5.1   | io_uring
Linux kernel | ≥ 4.14  | MSG_ZEROCOPY
liburing     | ≥ 2.0   | io_uring userspace helpers

Implementation Roadmap

  • io_uring ring setup and teardown
  • io_uring async send/recv with fixed buffers
  • splice-based file-to-socket transfer
  • MSG_ZEROCOPY socket send with completion polling
  • Benchmark suite (vs. standard read/write)
  • Integration with epoll_reactor