
TCP Stack (Userspace State Machine)

Status: Planned

This module is not yet implemented. The design below represents the target architecture.

Overview

A minimal userspace TCP implementation built on top of raw_socket, intended for applications that need deterministic latency and want to avoid the buffering, congestion-window management, and scheduling overhead of kernel TCP. It implements the TCP state machine (RFC 793) with selective acknowledgment (SACK, RFC 2018).

TCP State Machine

stateDiagram-v2
    [*] --> CLOSED
    CLOSED --> SYN_SENT : connect() / send SYN
    CLOSED --> LISTEN : listen()
    LISTEN --> SYN_RCVD : recv SYN / send SYN+ACK
    SYN_SENT --> ESTABLISHED : recv SYN+ACK / send ACK
    SYN_RCVD --> ESTABLISHED : recv ACK
    ESTABLISHED --> FIN_WAIT_1 : close() / send FIN
    ESTABLISHED --> CLOSE_WAIT : recv FIN / send ACK
    FIN_WAIT_1 --> FIN_WAIT_2 : recv ACK
    FIN_WAIT_1 --> CLOSING : recv FIN / send ACK
    FIN_WAIT_2 --> TIME_WAIT : recv FIN / send ACK
    CLOSING --> TIME_WAIT : recv ACK
    TIME_WAIT --> CLOSED : 2MSL timeout
    CLOSE_WAIT --> LAST_ACK : close() / send FIN
    LAST_ACK --> CLOSED : recv ACK
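
The transitions in the diagram can be expressed as a pure function from (state, event) to next state. The following is a minimal sketch only: the `tcp_event_t` enum and the `tcp_next_state` function are hypothetical names that are not part of the planned API, and the side effects shown in the diagram (sending SYN, ACK, or FIN) are deliberately left out.

```c
typedef enum {
    TCP_CLOSED, TCP_LISTEN, TCP_SYN_SENT, TCP_SYN_RCVD,
    TCP_ESTABLISHED, TCP_FIN_WAIT_1, TCP_FIN_WAIT_2,
    TCP_CLOSING, TCP_TIME_WAIT, TCP_CLOSE_WAIT, TCP_LAST_ACK,
} tcp_state_t;

/* Hypothetical event type: one value per edge label in the diagram. */
typedef enum {
    TCP_EV_CONNECT, TCP_EV_LISTEN, TCP_EV_CLOSE,      /* application calls */
    TCP_EV_RECV_SYN, TCP_EV_RECV_SYN_ACK,             /* inbound segments  */
    TCP_EV_RECV_ACK, TCP_EV_RECV_FIN,
    TCP_EV_TIMEOUT_2MSL,                              /* timer expiry      */
} tcp_event_t;

/* Return the next state, or the current state if the event has no
 * transition from it in the diagram. */
static tcp_state_t tcp_next_state(tcp_state_t s, tcp_event_t ev)
{
    switch (s) {
    case TCP_CLOSED:
        if (ev == TCP_EV_CONNECT)      return TCP_SYN_SENT;
        if (ev == TCP_EV_LISTEN)       return TCP_LISTEN;
        break;
    case TCP_LISTEN:
        if (ev == TCP_EV_RECV_SYN)     return TCP_SYN_RCVD;
        break;
    case TCP_SYN_SENT:
        if (ev == TCP_EV_RECV_SYN_ACK) return TCP_ESTABLISHED;
        break;
    case TCP_SYN_RCVD:
        if (ev == TCP_EV_RECV_ACK)     return TCP_ESTABLISHED;
        break;
    case TCP_ESTABLISHED:
        if (ev == TCP_EV_CLOSE)        return TCP_FIN_WAIT_1;
        if (ev == TCP_EV_RECV_FIN)     return TCP_CLOSE_WAIT;
        break;
    case TCP_FIN_WAIT_1:
        if (ev == TCP_EV_RECV_ACK)     return TCP_FIN_WAIT_2;
        if (ev == TCP_EV_RECV_FIN)     return TCP_CLOSING;
        break;
    case TCP_FIN_WAIT_2:
        if (ev == TCP_EV_RECV_FIN)     return TCP_TIME_WAIT;
        break;
    case TCP_CLOSING:
        if (ev == TCP_EV_RECV_ACK)     return TCP_TIME_WAIT;
        break;
    case TCP_TIME_WAIT:
        if (ev == TCP_EV_TIMEOUT_2MSL) return TCP_CLOSED;
        break;
    case TCP_CLOSE_WAIT:
        if (ev == TCP_EV_CLOSE)        return TCP_LAST_ACK;
        break;
    case TCP_LAST_ACK:
        if (ev == TCP_EV_RECV_ACK)     return TCP_CLOSED;
        break;
    }
    return s; /* no matching edge: stay in the current state */
}
```

Keeping the transition table free of I/O would make it straightforward to unit-test separately from raw_socket; a real implementation would likely also need RST handling, which the diagram does not show.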

Planned API

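#include <stddef.h>     /* size_t */
#include <stdint.h>     /* uint16_t, uint32_t, uint64_t */
#include <sys/types.h>  /* ssize_t */

typedef enum {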
    TCP_CLOSED, TCP_LISTEN, TCP_SYN_SENT, TCP_SYN_RCVD,
    TCP_ESTABLISHED, TCP_FIN_WAIT_1, TCP_FIN_WAIT_2,
    TCP_CLOSING, TCP_TIME_WAIT, TCP_CLOSE_WAIT, TCP_LAST_ACK,
} tcp_state_t;

typedef struct {
    tcp_state_t state;
    uint32_t    local_ip;
    uint32_t    remote_ip;
    uint16_t    local_port;
    uint16_t    remote_port;
    uint32_t    seq_num;         /* Send sequence number */
    uint32_t    ack_num;         /* Expected receive sequence */
    uint32_t    window_size;     /* Advertised window */
    uint64_t    rtt_ns;          /* Smoothed RTT */
    raw_socket_t *raw;           /* Underlying raw socket */
} tcp_conn_t;

/* Connection lifecycle */
int tcp_listen(tcp_conn_t *conn, uint16_t port);
int tcp_connect(tcp_conn_t *conn, uint32_t remote_ip, uint16_t remote_port);
int tcp_accept(tcp_conn_t *listener, tcp_conn_t *new_conn);
int tcp_close(tcp_conn_t *conn);

/* Data transfer */
ssize_t tcp_send(tcp_conn_t *conn, const void *data, size_t len);
ssize_t tcp_recv(tcp_conn_t *conn, void *buf, size_t len, int timeout_ms);

/* Event processing (call from reactor) */
int tcp_process_packet(tcp_conn_t *conn, const void *pkt, size_t len);
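
The `seq_num` and `ack_num` fields are 32-bit counters that wrap around, so they cannot be compared with plain `<`. A possible set of helpers is sketched below; the names `tcp_seq_lt`, `tcp_seq_leq`, and `tcp_seq_in_window` are hypothetical and not part of the planned API.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if sequence number a comes before b, modulo 2^32.
 * The unsigned subtraction wraps; reinterpreting the difference as
 * signed gives the direction (assumes two's-complement conversion,
 * which holds on all mainstream compilers). */
static inline bool tcp_seq_lt(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) < 0;
}

/* True if a comes before b or equals it, modulo 2^32. */
static inline bool tcp_seq_leq(uint32_t a, uint32_t b)
{
    return (int32_t)(a - b) <= 0;
}

/* True if seq lies in the half-open range [start, start + len),
 * where the range itself may wrap past 2^32. */
static inline bool tcp_seq_in_window(uint32_t seq, uint32_t start, uint32_t len)
{
    return (uint32_t)(seq - start) < len;
}
```

For example, `tcp_seq_lt(0xFFFFFFF0u, 0x10u)` is true, because `0x10` is 32 bytes ahead of `0xFFFFFFF0` once the counter wraps.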

TCP Header Format

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
├─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┼─┤
│          Source Port          │       Destination Port        │
├───────────────────────────────┴───────────────────────────────┤
│                        Sequence Number                        │
├───────────────────────────────────────────────────────────────┤
│                     Acknowledgment Number                     │
├───────┬───────────┬─┬─┬─┬─┬─┬─┬───────────────────────────────┤
│  Data │           │U│A│P│R│S│F│                               │
│ Offset│ Reserved  │R│C│S│S│Y│I│          Window Size          │
│       │           │G│K│H│T│N│N│                               │
├───────┴───────────┴─┴─┴─┴─┴─┴─┼───────────────────────────────┤
│           Checksum            │         Urgent Pointer        │
├───────────────────────────────┴───────────────────────────────┤
│                      Options (variable)                       │
└───────────────────────────────────────────────────────────────┘
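
To illustrate how the layout above maps onto bytes, here is a rough sketch of a parser for the fixed 20-byte part of the header. The `tcp_hdr_t` struct, the `TCP_FLAG_*` constants, and `tcp_parse_header` are hypothetical names chosen for this example; the planned API does not define them. All multi-byte fields are read in network byte order (big-endian).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded view of the fixed TCP header (host byte order). */
typedef struct {
    uint16_t src_port, dst_port;
    uint32_t seq, ack;
    uint8_t  data_offset;   /* header length in 32-bit words (5..15) */
    uint8_t  flags;         /* low 6 bits: URG ACK PSH RST SYN FIN   */
    uint16_t window, checksum, urgent;
} tcp_hdr_t;

enum {
    TCP_FLAG_FIN = 0x01, TCP_FLAG_SYN = 0x02, TCP_FLAG_RST = 0x04,
    TCP_FLAG_PSH = 0x08, TCP_FLAG_ACK = 0x10, TCP_FLAG_URG = 0x20,
};

/* Read big-endian integers without relying on alignment or host order. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Parse the header at the start of pkt. Returns the header length in
 * bytes (so the payload starts at pkt + return value), or -1 if the
 * buffer is too short or the data offset is invalid. */
static int tcp_parse_header(const void *pkt, size_t len, tcp_hdr_t *out)
{
    const uint8_t *p = pkt;
    if (len < 20)
        return -1;

    out->src_port    = rd16(p);
    out->dst_port    = rd16(p + 2);
    out->seq         = rd32(p + 4);
    out->ack         = rd32(p + 8);
    out->data_offset = (uint8_t)(p[12] >> 4);   /* top 4 bits of byte 12 */
    out->flags       = (uint8_t)(p[13] & 0x3F); /* six flag bits         */
    out->window      = rd16(p + 14);
    out->checksum    = rd16(p + 16);
    out->urgent      = rd16(p + 18);

    size_t hlen = (size_t)out->data_offset * 4;
    if (out->data_offset < 5 || hlen > len)
        return -1;
    return (int)hlen;
}
```

Options, including the SACK blocks from RFC 2018, would occupy the bytes between offset 20 and the returned header length; parsing them is outside the scope of this sketch.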

Performance Targets

| Metric | Target | Notes |
|---|---|---|
| 3-way handshake | < 10 µs | On loopback / LAN |
| Send latency | < 3 µs | Application to wire |
| Receive latency | < 3 µs | Wire to application callback |
| Retransmit timeout | Configurable | Microsecond-resolution timers |
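
One way the configurable retransmit timeout could be derived from the smoothed RTT is the standard estimator from RFC 6298, kept in nanoseconds to match `rtt_ns`. This is a sketch under assumptions: the `tcp_rtt_est_t` struct and both functions are hypothetical, and the configurable `min_rto_ns` floor stands in for the 1-second minimum and clock-granularity term that RFC 6298 specifies, since those would defeat microsecond-scale targets.

```c
#include <stdint.h>

/* Hypothetical RTT estimator state, all values in nanoseconds. */
typedef struct {
    uint64_t srtt_ns;      /* smoothed RTT                        */
    uint64_t rttvar_ns;    /* RTT variation                       */
    uint64_t rto_ns;       /* current retransmission timeout      */
    uint64_t min_rto_ns;   /* configurable lower bound on the RTO */
    int      has_sample;   /* 0 until the first measurement       */
} tcp_rtt_est_t;

static void tcp_rtt_init(tcp_rtt_est_t *e, uint64_t initial_rto_ns,
                         uint64_t min_rto_ns)
{
    e->srtt_ns = 0;
    e->rttvar_ns = 0;
    e->rto_ns = initial_rto_ns;
    e->min_rto_ns = min_rto_ns;
    e->has_sample = 0;
}

/* Feed one RTT measurement (from a segment that was not retransmitted). */
static void tcp_rtt_sample(tcp_rtt_est_t *e, uint64_t r_ns)
{
    if (!e->has_sample) {
        /* First sample: SRTT = R, RTTVAR = R / 2. */
        e->srtt_ns = r_ns;
        e->rttvar_ns = r_ns / 2;
        e->has_sample = 1;
    } else {
        /* RTTVAR uses the SRTT from before this update (beta = 1/4),
         * then SRTT is updated (alpha = 1/8). */
        uint64_t diff = e->srtt_ns > r_ns ? e->srtt_ns - r_ns
                                          : r_ns - e->srtt_ns;
        e->rttvar_ns = (3 * e->rttvar_ns + diff) / 4;
        e->srtt_ns   = (7 * e->srtt_ns + r_ns) / 8;
    }

    /* RTO = SRTT + 4 * RTTVAR, clamped to the configured floor. */
    uint64_t rto = e->srtt_ns + 4 * e->rttvar_ns;
    e->rto_ns = rto < e->min_rto_ns ? e->min_rto_ns : rto;
}
```

With a first sample of 8 µs followed by one of 16 µs, the estimator yields an RTO of 24 µs and then 29 µs, which shows how a single slow measurement widens the timeout without doubling it.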

Implementation Roadmap

  • TCP header construction and parsing
  • Connection state machine (RFC 793)
  • 3-way handshake (SYN, SYN+ACK, ACK)
  • Sliding window flow control
  • Retransmission timer (timerfd integration)
  • Selective ACK (SACK, RFC 2018)
  • Connection teardown (FIN/RST)
  • Integration with epoll_reactor
  • Checksum offload (if supported by NIC)
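
When the NIC does not offer checksum offload, the checksum has to be computed in software. The sketch below shows the standard Internet checksum over the IPv4 pseudo-header plus the TCP segment. The function names are hypothetical, and it assumes the IP addresses are passed in host byte order, which the planned `tcp_conn_t` does not specify.

```c
#include <stddef.h>
#include <stdint.h>

/* Add len bytes to a running sum of big-endian 16-bit words.
 * A trailing odd byte is padded with zero on the right. Only the last
 * chunk in a chain may have odd length. */
static uint32_t csum_add(uint32_t sum, const void *data, size_t len)
{
    const uint8_t *p = data;
    while (len > 1) {
        sum += (uint32_t)((p[0] << 8) | p[1]);
        p += 2;
        len -= 2;
    }
    if (len)
        sum += (uint32_t)(p[0] << 8);
    return sum;
}

/* Compute the TCP checksum for an IPv4 segment (header + payload).
 * The checksum field inside the segment must be zero when generating.
 * When verifying a received segment, leave the field as received: a
 * result of 0 means the checksum is valid.
 * src_ip and dst_ip are in host byte order (assumption). The returned
 * value must be stored big-endian at byte offset 16 of the header. */
static uint16_t tcp_checksum(uint32_t src_ip, uint32_t dst_ip,
                             const void *segment, size_t len)
{
    /* Pseudo-header: source, destination, zero, protocol 6, TCP length. */
    uint8_t pseudo[12] = {
        (uint8_t)(src_ip >> 24), (uint8_t)(src_ip >> 16),
        (uint8_t)(src_ip >> 8),  (uint8_t)src_ip,
        (uint8_t)(dst_ip >> 24), (uint8_t)(dst_ip >> 16),
        (uint8_t)(dst_ip >> 8),  (uint8_t)dst_ip,
        0, 6,
        (uint8_t)(len >> 8), (uint8_t)len,
    };

    uint32_t sum = csum_add(0, pseudo, sizeof pseudo);
    sum = csum_add(sum, segment, len);

    /* Fold the carries back into the low 16 bits (one's-complement add). */
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);

    return (uint16_t)~sum;
}
```

The 32-bit accumulator cannot overflow here because a TCP segment is at most 65535 bytes long. If offload is available, this routine would serve only as a fallback or as a test oracle.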