Jetty Threading Architecture

Writing a performant client or server is difficult, because it should:

Scale well with the number of processors.
Be efficient at using processor caches to avoid parallel slowdown.
Support multiple network protocols that may have very different requirements; for example, multiplexed protocols such as HTTP/2 introduce new challenges that are not present in non-multiplexed protocols such as HTTP/1.1.
Support different application threading models; for example, if a Jetty server invokes server-side application code that is allowed to call blocking APIs, then the Jetty server should not be affected by how long the blocking API call takes, and should be able to process other connections or other requests in a timely fashion.

Execution Strategies

The Jetty threading architecture can be modeled with a producer/consumer pattern, where produced tasks needs to be consumed efficiently.

For example, Jetty produces (among others) these tasks:

A task that wraps a NIO selection event, see the Jetty I/O architecture.
A task that wraps the invocation of application code that may block (for example, the invocation of a Servlet to handle an HTTP request).

A task is typically a Runnable object that may implement org.eclipse.jetty.util.thread.Invocable to indicate the behavior of the task (in particular, whether the task may block or not).

Once a task has been produced, it may be consumed using these modes:

Produce-Consume
Produce-Execute-Consume
Execute-Produce-Consume

Produce-Consume

In the Produce-Consume mode, the producer thread loops to produce a task that is run directly by the Producer Thread.

If the task is a NIO selection event, then this mode is the thread-per-selector mode which is very CPU core cache efficient, but suffers from the head-of-line blocking: if one of the tasks blocks or runs slowly, then subsequent tasks cannot be produced (and therefore cannot be consumed either) and will pay in latency the cost of running previous, possibly unrelated, tasks.

This mode should only be used if the produced task is known to never block, or if the system tolerates well (or does not care about) head-of-line blocking.

Produce-Execute-Consume

In the Produce-Execute-Consume mode, the Producer Thread loops to produce tasks that are submitted to a java.util.concurrent.Executor to be run by Worker Threads different from the Producer Thread.

The Executor implementation typically adds the task to a queue, and dequeues the task when there is a worker thread available to run it.

This mode solves the head-of-line blocking discussed in the Produce-Consume section, but suffers from other issues:

It is not CPU core cache efficient, as the data available to the producer thread will need to be accessed by another thread that likely is going to run on a CPU core that will not have that data in its caches.
If the tasks take time to be run, the Executor queue may grow indefinitely.
A small latency is added to every task: the time it waits in the Executor queue.

Execute-Produce-Consume

In the Execute-Produce-Consume mode, the producer thread Thread 1 loops to produce a task, then submits one internal task to an Executor to take over production on thread Thread 2, and then runs the task in Thread 1, and so on.

This mode may operate like Produce-Consume when the take over production task run, for example, by thread Thread 3 takes time to be executed (for example, in a busy server): then thread Thread 2 will produce one task and run it, then produce another task and run it, etc. — Thread 2 behaves exactly like the Produce-Consume mode. By the time thread Thread 3 takes over task production from Thread 2, all the work might already be done.

This mode may also operate similarly to Produce-Execute-Consume when the take over production task always finds a free CPU core immediately (for example, in a mostly idle server): thread Thread 1 will produce a task, yield production to Thread 2 while Thread 1 is running the task; Thread 2 will produce a task, yield production to Thread 3 while Thread 2 is running the task, etc.

Differently from Produce-Execute-Consume, here production happens on different threads, but the advantage is that the task is run by the same thread that produced it (which is CPU core cache efficient).

Adaptive Execution Strategy

The modes of task consumption discussed above are captured by the org.eclipse.jetty.util.thread.ExecutionStrategy interface, with an additional implementation that also takes into account the behavior of the task when the task implements Invocable.

For example, a task that declares itself as non-blocking can be consumed using the Produce-Consume mode, since there is no risk to stop production because the task will not block.

Conversely, a task that declares itself as blocking will stop production, and therefore must be consumed using either the Produce-Execute-Consume mode or the Execute-Produce-Consume mode. Deciding between these two modes depends on whether there is a free thread immediately available to take over production, and this is captured by the org.eclipse.jetty.util.thread.TryExecutor interface.

An implementation of TryExecutor can be asked whether a thread can be immediately and exclusively allocated to run a task, as opposed to a normal Executor that can only queue the task in the expectation that there will be a thread available in the near future to run the task.

The concept of task consumption modes, coupled with Invocable tasks that expose their own behavior, coupled with a TryExecutor that guarantees whether production can be immediately taken over are captured by the default Jetty execution strategy, named org.eclipse.jetty.util.thread.AdaptiveExecutionStrategy.

AdaptiveExecutionStrategy was previously named EatWhatYouKill, named after a hunting proverb in the sense that one should produce (kill) only what it consumes (eats).

Thread Pool

Jetty’s threading architecture requires a more sophisticated thread pool than what offered by Java’s java.util.concurrent.ExecutorService.

Jetty’s default thread pool implementation is QueuedThreadPool.

QueuedThreadPool integrates with the Jetty component model, implements Executor, provides a TryExecutor implementation (discussed in the adaptive execution strategy section), and supports virtual threads (introduced as a preview feature in Java 19 and Java 20, and as an official feature since Java 21).

Thread Pool Queue

QueuedThreadPool uses a BlockingQueue to store tasks that will be executed as soon as a thread is available.

It is common, but too simplistic, to think that an upper bound to the thread pool queue is a good way to limit the number of concurrent HTTP requests.

In case of asynchronous servers like Jetty, applications may have more than one thread handling a single request. Furthermore, the server implementation may produce a number of tasks that must be run by the thread pool, otherwise the server stops working properly.

Therefore, the "one-thread-per-request" model is too simplistic, and the real model that predicts the number of threads that are necessary is too complicated to produce an accurate value.

For example, a sudden large spike of requests arriving to the server may find the thread pool in an idle state where the number of threads is shrunk to the minimum. This will cause many tasks to be queued up, way before an HTTP request is even read from the network. Add to this that there could be I/O failures processing requests, which may be submitted as a new task to the thread pool. Furthermore, multiplexed protocols like HTTP/2 have a much more complex model (due to data flow control). For multiplexed protocols, the implementation must be able to write in order to progress reads (and must be able to read in order to progress writes), possibly causing more tasks to be submitted to the thread pool.

If any of the submitted tasks is rejected because the queue is bounded the server may grind to a halt, because the task must be executed, sometimes necessarily in a different thread.

For these reasons:

The thread pool queue must be unbounded.

There are better strategies to limit the number of concurrent requests, discussed in this section.

`QueuedThreadPool` configuration

QueuedThreadPool can be configured with a maxThreads value.

However, some of the Jetty components (such as the selectors) steal threads for their internal use, or rather QueuedThreadPool leases some threads to these components. These threads are reported by QueuedThreadPool.leasedThreads and are not available to run application code.

QueuedThreadPool can be configured with a reservedThreads value. This value represents the maximum number of threads that can be reserved and used by the TryExecutor implementation. A negative value for QueuedThreadPool.reservedThreads means that the actual value will be heuristically derived from the number of CPU cores and QueuedThreadPool.maxThreads. A value of zero for QueuedThreadPool.reservedThreads means that reserved threads are disabled, and therefore the Execute-Produce-Consume mode is never used — the Produce-Execute-Consume mode is always used instead.

QueuedThreadPool leases threads to the following Jetty components:

The Connectors; for each connector a thread is leased for each acceptor and each selector.
The TryExecutor implementation; for the default implementation ReservedThreadExecutor, QueuedThreadPool.reservedThreads are leased.

To carefully control the minimum number of threads in QueuedThreadPool, you must explicitly specify the number of selectors and the number of acceptors for each Connector, and the number of reserved threads.

For example:

QueuedThreadPool threadPool = new QueuedThreadPool();
// Configure the number of reserved threads.
int reservedThreads = 2;
threadPool.setReservedThreads(reservedThreads);

Server server = new Server(threadPool);

int acceptors = 1;
int selectors = 3;
ServerConnector connector = new ServerConnector(server, acceptors, selectors);
server.addConnector(connector);

In the example above, 6 threads will be leased: 2 reserved threads, 1 acceptor thread and 3 selector threads.

To reduce the number of leased threads, you can configure the acceptors to 0, the selectors to 1 and the reserved threads to 0.

Setting the acceptors to 0 results in TCP connections being accepted in non-blocking mode, which may increase the load on the selector, and possibly in increased latency.

Setting the reserved threads to 0 results in the Produce-Execute-Consume mode to be used instead of the Execute-Produce-Consume mode.

You have to try and benchmark what is the best configuration for your use case.

QueuedThreadPool always maintains the number of threads between QueuedThreadPool.minThreads and QueuedThreadPool.maxThreads; during load spikes the number of thread grows to meet the load demand, and when the load on the system diminishes or the system goes idle, the number of threads shrinks.

The number of leased threads must be taken into account to configure QueuedThreadPool.maxThreads. During Server startup, QueuedThreadPool leases threads to the Jetty components, and there must be enough threads for these components, and at least 1 more thread to handle requests. In the example above, QueuedThreadPool.maxThreads must be set to 7 or more.

An alternative to explicitly setting the configuration of the Jetty components to which threads are leased, you can leave their configuration at default so that it is calculated by internal Jetty heuristics, and then adjust QueuedThreadPool.maxThreads afterwards:

// Use the default thread pool configuration.
QueuedThreadPool threadPool = new QueuedThreadPool();
Server server = new Server(threadPool);
// Use the default ServerConnector configuration.
ServerConnector connector = new ServerConnector(server);
server.addConnector(connector);
// Start the server, so threads are leased to components
// following the heuristics based on configuration and hardware.
server.start();

// Calculate a new maxThreads for the thread pool.

// Retrieve the number of leased threads.
int maxLeasedThreads = threadPool.getMaxLeasedThreads();
// Choose the number of threads available to run application code.
int availableThreads = 42;
// The new value for maxThreads.
int maxThreads = maxLeasedThreads + availableThreads;
threadPool.setMaxThreads(maxThreads);

Do not try to reduce the value of QueuedThreadPool.maxThreads to the minimum possible.

Jetty may require threads to run internal tasks, and if no threads are available, because they are leased or busy handling requests, the server performance may degrade.

QueuedThreadPool can shrink the number of threads when they are not used, as explained below, so it is typically safe to leave QueuedThreadPool.maxThreads to its default, or to a value calculated using the max number of leased threads L, plus the number of concurrent threads to handle requests H, so that QueuedThreadPool.maxThreads=L+H.

Consider to choose a generous value for H, for example one that takes into account load spikes, since QueuedThreadPool shrinking will eventually remove threads that are unused.

Shrinking QueuedThreadPool is important in particular in containerized environments, where typically you want to return the memory occupied by the threads to the operative system. The shrinking of the QueuedThreadPool is controlled by two parameters: QueuedThreadPool.idleTimeout and QueuedThreadPool.maxEvictCount.

QueuedThreadPool.idleTimeout indicates how long a thread should stay around when it is idle, waiting for tasks to execute. The longer the threads stay around, the more ready they are in case of new load spikes on the system; however, they consume resources: a Java platform thread typically allocates 1 MiB of native memory.

QueuedThreadPool.maxEvictCount controls how many idle threads are evicted for one QueuedThreadPool.idleTimeout period. The larger this value is, the quicker the threads are evicted when the QueuedThreadPool is idle or has less load, and their resources returned to the operative system; however, large values may result in too much thread thrashing: the QueuedThreadPool shrinks too fast and must re-create a lot of threads in case of a new load spike on the system.

A good balance between QueuedThreadPool.idleTimeout and QueuedThreadPool.maxEvictCount depends on the load profile of your system, and it is often tuned via trial and error.

Virtual Threads

Virtual threads have been introduced in Java 19 and Java 20 as a preview feature, and have become an official feature since Java 21.

In Java versions where virtual threads are a preview feature, remember to add --enable-preview to the JVM command line options to use virtual threads.

Virtual Threads Support with `QueuedThreadPool`

QueuedThreadPool can be configured to use virtual threads by specifying the virtual threads Executor:

QueuedThreadPool threadPool = new QueuedThreadPool();

// Simple, unlimited, virtual thread Executor.
threadPool.setVirtualThreadsExecutor(Executors.newVirtualThreadPerTaskExecutor());

// Configurable, bounded, virtual thread executor (preferred).
VirtualThreadPool virtualExecutor = new VirtualThreadPool();
virtualExecutor.setMaxThreads(128);
threadPool.setVirtualThreadsExecutor(virtualExecutor);

// For server-side usage.
Server server = new Server(threadPool);

// Simple client-side usage.
HttpClient client = new HttpClient();
client.setExecutor(threadPool);

// Client-side usage with explicit HttpClientTransport.
ClientConnector clientConnector = new ClientConnector();
clientConnector.setExecutor(threadPool);
HttpClient httpClient = new HttpClient(new HttpClientTransportOverHTTP(clientConnector));

Jetty cannot enforce that the Executor passed to setVirtualThreadsExecutor(Executor) uses virtual threads, so make sure to specify a virtual threads Executor and not a normal Executor that uses platform threads.

AdaptiveExecutionStrategy makes use of this setting when it determines that a task should be run with the Produce-Execute-Consume mode: rather than submitting the task to QueuedThreadPool to be run in a platform thread, it submits the task to the virtual threads Executor.

Enabling virtual threads in QueuedThreadPool will default the number of reserved threads to zero, unless the number of reserved threads is explicitly configured to a positive value.

Defaulting the number of reserved threads to zero ensures that the Produce-Execute-Consume mode is always used, which means that virtual threads will always be used for blocking tasks.

Virtual Threads Support with `VirtualThreadPool`

VirtualThreadPool is an alternative to QueuedThreadPool that creates only virtual threads (no platform threads).

VirtualThreadPool threadPool = new VirtualThreadPool();
// Limit the max number of current virtual threads.
threadPool.setMaxThreads(200);
// Track, with details, virtual threads usage.
threadPool.setTracking(true);
threadPool.setDetailedDump(true);

// For server-side usage.
Server server = new Server(threadPool);

// Simple client-side usage.
HttpClient client = new HttpClient();
client.setExecutor(threadPool);

// Client-side usage with explicit HttpClientTransport.
ClientConnector clientConnector = new ClientConnector();
clientConnector.setExecutor(threadPool);
HttpClient httpClient = new HttpClient(new HttpClientTransportOverHTTP(clientConnector));

Despite the name, VirtualThreadPool does not pool virtual threads, but allows you to impose a limit on the maximum number of current virtual threads, using a Semaphore.

Limiting the number of current virtual threads helps to limit resource usage in applications, especially in case of load spikes. When an unlimited number of virtual threads is allowed, the server might be brought down due to resource (typically memory) exhaustion.

Furthermore, you can configure it to track virtual threads so that a Jetty component dump will show all virtual threads currently in use, including those that are unmounted.

Virtual Threads Pinning

Even when using virtual threads, Jetty uses non-blocking I/O, and dedicates a thread to each java.nio.channels.Selector to perform the Selector.select() operation.

Up to Java 23, calling Selector.select() from a virtual thread pins the carrier thread.

Since Java 24, calling Selector.select() from a virtual thread does not pin the carrier thread.

Virtual Threads Pinning with Java 19 to 23

If you configure a server-side Connector, or Jetty’s HttpClient with N selectors, then N carrier threads will be pinned by the virtual threads calling Selector.select().

If you have less than N CPU cores in your system, then by default all carriers will be pinned in the Selector.select() call, leaving no carrier to execute virtual threads, and therefore completely locking up your system, which will become completely unresponsive.

If you have more than N CPU cores in your system, then by default your system may be less efficient, since the carrier threads may be pinned in the Selector.select() call, and therefore not available to run virtual threads.

The number of CPU cores of your system determines, by default, the number of carrier threads. The number of carrier threads may be explicitly configured by setting the system property -Djdk.virtualThreadScheduler.parallelism=N, where N is your desired number of carrier threads.

Selector threads used by Jetty pin carrier threads.

Choose the number of selectors wisely when using virtual threads: the number of selectors must always be less than the number of carrier threads, to leave some of the carrier threads free to run virtual threads.

As an extreme example, if your system only has 1 CPU core, then 1 selector is enough to pin the only carrier thread, and your system will eventually lock up. In this case, you must explicitly configure the number of carrier threads by setting the system property -Djdk.virtualThreadScheduler.parallelism=2 (or to a larger value).