Jetty Threading Architecture

Writing a performant client or server is difficult, because it should:

  • Scale well with the number of processors.

  • Be efficient at using processor caches to avoid parallel slowdown.

  • Support multiple network protocols that may have very different requirements; for example, multiplexed protocols such as HTTP/2 introduce new challenges that are not present in non-multiplexed protocols such as HTTP/1.1.

  • Support different application threading models; for example, if a Jetty server invokes server-side application code that is allowed to call blocking APIs, then the Jetty server should not be affected by how long the blocking API call takes, and should be able to process other connections or other requests in a timely fashion.

Execution Strategies

The Jetty threading architecture can be modeled with a producer/consumer pattern, where produced tasks needs to be consumed efficiently.

For example, Jetty produces (among others) these tasks:

  • A task that wraps a NIO selection event, see the Jetty I/O architecture.

  • A task that wraps the invocation of application code that may block (for example, the invocation of a Servlet to handle an HTTP request).

A task is typically a Runnable object that may implement org.eclipse.jetty.util.thread.Invocable to indicate the behavior of the task (in particular, whether the task may block or not).

Once a task has been produced, it may be consumed using these modes:

Produce-Consume

In the Produce-Consume mode, the producer thread loops to produce a task that is run directly by the Producer Thread.

Diagram

If the task is a NIO selection event, then this mode is the thread-per-selector mode which is very CPU core cache efficient, but suffers from the head-of-line blocking: if one of the tasks blocks or runs slowly, then subsequent tasks cannot be produced (and therefore cannot be consumed either) and will pay in latency the cost of running previous, possibly unrelated, tasks.

This mode should only be used if the produced task is known to never block, or if the system tolerates well (or does not care about) head-of-line blocking.

Produce-Execute-Consume

In the Produce-Execute-Consume mode, the Producer Thread loops to produce tasks that are submitted to a java.util.concurrent.Executor to be run by Worker Threads different from the Producer Thread.

Diagram

The Executor implementation typically adds the task to a queue, and dequeues the task when there is a worker thread available to run it.

This mode solves the head-of-line blocking discussed in the Produce-Consume section, but suffers from other issues:

  • It is not CPU core cache efficient, as the data available to the producer thread will need to be accessed by another thread that likely is going to run on a CPU core that will not have that data in its caches.

  • If the tasks take time to be run, the Executor queue may grow indefinitely.

  • A small latency is added to every task: the time it waits in the Executor queue.

Execute-Produce-Consume

In the Execute-Produce-Consume mode, the producer thread Thread 1 loops to produce a task, then submits one internal task to an Executor to take over production on thread Thread 2, and then runs the task in Thread 1, and so on.

Diagram

This mode may operate like Produce-Consume when the take over production task run, for example, by thread Thread 3 takes time to be executed (for example, in a busy server): then thread Thread 2 will produce one task and run it, then produce another task and run it, etc. — Thread 2 behaves exactly like the Produce-Consume mode. By the time thread Thread 3 takes over task production from Thread 2, all the work might already be done.

This mode may also operate similarly to Produce-Execute-Consume when the take over production task always finds a free CPU core immediately (for example, in a mostly idle server): thread Thread 1 will produce a task, yield production to Thread 2 while Thread 1 is running the task; Thread 2 will produce a task, yield production to Thread 3 while Thread 2 is running the task, etc.

Differently from Produce-Execute-Consume, here production happens on different threads, but the advantage is that the task is run by the same thread that produced it (which is CPU core cache efficient).

Adaptive Execution Strategy

The modes of task consumption discussed above are captured by the org.eclipse.jetty.util.thread.ExecutionStrategy interface, with an additional implementation that also takes into account the behavior of the task when the task implements Invocable.

For example, a task that declares itself as non-blocking can be consumed using the Produce-Consume mode, since there is no risk to stop production because the task will not block.

Conversely, a task that declares itself as blocking will stop production, and therefore must be consumed using either the Produce-Execute-Consume mode or the Execute-Produce-Consume mode. Deciding between these two modes depends on whether there is a free thread immediately available to take over production, and this is captured by the org.eclipse.jetty.util.thread.TryExecutor interface.

An implementation of TryExecutor can be asked whether a thread can be immediately and exclusively allocated to run a task, as opposed to a normal Executor that can only queue the task in the expectation that there will be a thread available in the near future to run the task.

The concept of task consumption modes, coupled with Invocable tasks that expose their own behavior, coupled with a TryExecutor that guarantees whether production can be immediately taken over are captured by the default Jetty execution strategy, named org.eclipse.jetty.util.thread.AdaptiveExecutionStrategy.

AdaptiveExecutionStrategy was previously named EatWhatYouKill, named after a hunting proverb in the sense that one should produce (kill) only what it consumes (eats).

Thread Pool

Jetty’s threading architecture requires a more sophisticated thread pool than what offered by Java’s java.util.concurrent.ExecutorService.

Jetty’s default thread pool implementation is QueuedThreadPool.

QueuedThreadPool integrates with the Jetty component model, implements Executor, provides a TryExecutor implementation (discussed in the adaptive execution strategy section), and supports virtual threads (introduced as a preview feature in Java 19 and Java 20, and as an official feature since Java 21).

Thread Pool Queue

QueuedThreadPool uses a BlockingQueue to store tasks that will be executed as soon as a thread is available.

It is common, but too simplistic, to think that an upper bound to the thread pool queue is a good way to limit the number of concurrent HTTP requests.

In case of asynchronous servers like Jetty, applications may have more than one thread handling a single request. Furthermore, the server implementation may produce a number of tasks that must be run by the thread pool, otherwise the server stops working properly.

Therefore, the "one-thread-per-request" model is too simplistic, and the real model that predicts the number of threads that are necessary is too complicated to produce an accurate value.

For example, a sudden large spike of requests arriving to the server may find the thread pool in an idle state where the number of threads is shrunk to the minimum. This will cause many tasks to be queued up, way before an HTTP request is even read from the network. Add to this that there could be I/O failures processing requests, which may be submitted as a new task to the thread pool. Furthermore, multiplexed protocols like HTTP/2 have a much more complex model (due to data flow control). For multiplexed protocols, the implementation must be able to write in order to progress reads (and must be able to read in order to progress writes), possibly causing more tasks to be submitted to the thread pool.

If any of the submitted tasks is rejected because the queue is bounded the server may grind to a halt, because the task must be executed, sometimes necessarily in a different thread.

For these reasons:

The thread pool queue must be unbounded.

There are better strategies to limit the number of concurrent requests, discussed in this section.

QueuedThreadPool configuration

QueuedThreadPool can be configured with a maxThreads value.

However, some of the Jetty components (such as the selectors) permanently steal threads for their internal use, or rather QueuedThreadPool leases some threads to these components. These threads are reported by QueuedThreadPool.leasedThreads and are not available to run application code.

QueuedThreadPool can be configured with a reservedThreads value. This value represents the maximum number of threads that can be reserved and used by the TryExecutor implementation. A negative value for QueuedThreadPool.reservedThreads means that the actual value will be heuristically derived from the number of CPU cores and QueuedThreadPool.maxThreads. A value of zero for QueuedThreadPool.reservedThreads means that reserved threads are disabled, and therefore the Execute-Produce-Consume mode is never used — the Produce-Execute-Consume mode is always used instead.

QueuedThreadPool always maintains the number of threads between QueuedThreadPool.minThreads and QueuedThreadPool.maxThreads; during load spikes the number of thread grows to meet the load demand, and when the load on the system diminishes or the system goes idle, the number of threads shrinks.

Shrinking QueuedThreadPool is important in particular in containerized environments, where typically you want to return the memory occupied by the threads to the operative system. The shrinking of the QueuedThreadPool is controlled by two parameters: QueuedThreadPool.idleTimeout and QueuedThreadPool.maxEvictCount.

QueuedThreadPool.idleTimeout indicates how long a thread should stay around when it is idle, waiting for tasks to execute. The longer the threads stay around, the more ready they are in case of new load spikes on the system; however, they consume resources: a Java platform thread typically allocates 1 MiB of native memory.

QueuedThreadPool.maxEvictCount controls how many idle threads are evicted for one QueuedThreadPool.idleTimeout period. The larger this value is, the quicker the threads are evicted when the QueuedThreadPool is idle or has less load, and their resources returned to the operative system; however, large values may result in too much thread thrashing: the QueuedThreadPool shrinks too fast and must re-create a lot of threads in case of a new load spike on the system.

A good balance between QueuedThreadPool.idleTimeout and QueuedThreadPool.maxEvictCount depends on the load profile of your system, and it is often tuned via trial and error.

Virtual Threads

Virtual threads have been introduced in Java 19 and Java 20 as a preview feature, and have become an official feature since Java 21.

In Java versions where virtual threads are a preview feature, remember to add --enable-preview to the JVM command line options to use virtual threads.

QueuedThreadPool can be configured to use virtual threads by specifying the virtual threads Executor:

QueuedThreadPool threadPool = new QueuedThreadPool();
threadPool.setVirtualThreadsExecutor(Executors.newVirtualThreadPerTaskExecutor());

Server server = new Server(threadPool);

Jetty cannot enforce that the Executor passed to setVirtualThreadsExecutor(Executor) uses virtual threads, so make sure to specify a virtual threads Executor and not a normal Executor that uses platform threads.

AdaptiveExecutionStrategy makes use of this setting when it determines that a task should be run with the Produce-Execute-Consume mode: rather than submitting the task to QueuedThreadPool to be run in a platform thread, it submits the task to the virtual threads Executor.

Enabling virtual threads in QueuedThreadPool will default the number of reserved threads to zero, unless the number of reserved threads is explicitly configured to a positive value.

Defaulting the number of reserved threads to zero ensures that the Produce-Execute-Consume mode is always used, which means that virtual threads will always be used for blocking tasks.