STM: State-Threaded Multi-Processing Module for Apache/2.0

Mike Abbott
Accelerating Apache Project

1. State Threads, Pthreads, and Apache/2.0

State threads are simple, fast, and scalable threads ideally suited for use in Apache/2.0. They combine the simplicity of the multithreaded programming paradigm, in which one thread supports one client connection, with the performance and scalability of an event-driven state machine architecture such as that of the Zeus web server. In other words, state threads offer a threading API for structuring an Internet application like Apache/2.0 as a state machine.

Version 2.0 of the Apache HTTP Server introduces multi-processing modules (MPMs) which, on Unix-like platforms, mix processes and threads. Processes provide fault containment and recovery. An error, crash, or resource leak affects only one process, not the whole server, and processes are easy to replace when they fault. Threads provide scalability and ease of programming. A server can maintain a large number of threads, with one thread per client connection.

The MPMs for Unix-like platforms use POSIX pthreads. Pthreads are well-known, widely available, and more or less standardized, and they perform and scale adequately for a large class of programming problems. However, they are not the right choice for Apache/2.0.

Programming pthreads is complex. Not only Apache developers but also independent module writers and patch contributors must design and code around race conditions, deadlocks, and data corruption, and must use only thread-safe (reentrant) library routines. Debugging pthreaded applications can be tricky. Furthermore, pthreads implementations differ sufficiently to complicate programming and hinder portability, and pthreads are either available for a given platform or not. Very few people are going to write a complete pthreads library for their system if there isn't one already present.

Pthreads can be slow. They are preemptive and concurrent and always share data, mandating cycle-sapping mutex locking. Libraries sometimes use locks to protect hidden resources (such as malloc/free) which can be difficult or inefficient to work around.

Pthreads scale poorly in the presence of blocking I/O, which proliferates kernel execution vehicles and slows kernel-level scheduling. (Non-blocking I/O with pthreads also suffers because select still blocks the kernel thread.)

State threads are simple to use because they are non-preemptive and non-concurrent. They need no mutual exclusion locking and can use all the static variables and non-reentrant library functions they want. Programmers need not worry about race conditions or deadlocks. State threads are available for many Unix-like platforms and since the complete library is open source it can support additional platforms easily.
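
As a rough illustration of why non-preemptive threads need no locks, here is a minimal Python sketch of cooperative scheduling built on generators. It is an analogy only, not the state threads library or its API; the names worker and run_round_robin are hypothetical. Each task runs until it explicitly yields, so the read-modify-write on the shared counter can never be interrupted midway.

```python
from collections import deque

# Shared state touched by every task; no lock protects it.
counter = {"value": 0}

def worker(increments):
    """Hypothetical task: bump the shared counter, then give up the processor."""
    for _ in range(increments):
        current = counter["value"]       # read
        counter["value"] = current + 1   # write; no task switch can occur in between
        yield                            # the only point where another task may run

def run_round_robin(tasks):
    """Run generator-based tasks in turn until all of them have finished."""
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
            ready.append(task)   # the task yielded voluntarily; requeue it
        except StopIteration:
            pass                 # the task finished

run_round_robin([worker(1000) for _ in range(8)])
print(counter["value"])  # 8000: no increments are lost despite the absence of locks
```

With preemptive threads the same read-then-write sequence would need a mutex, because a switch could occur between the two statements.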

State threads are fast because they schedule entirely in user mode and, since mutex locking is unnecessary, they waste no time contending for locks.

State threads are scalable because separate sets of them run in separate processes. They thereby take advantage of hardware concurrency (such as multiple CPUs) without interfering with each other at all, and the process boundaries also provide fault containment.

State threads are the best threads for Apache/2.0 because they provide exactly the abstractions around which Apache/2.0 is designed -- one lightweight scheduling entity per client connection, fault containment and recovery, high performance and scalability -- and add simplicity of programming, ease of debugging, and the promise of extensive portability, without suffering from the complexity of more traditional thread packages.

2. The STM and STIOL

The state-threaded multi-processing module (STM MPM) for Apache/2.0 uses a multi-process, multi-threaded (MPMT) architecture similar to the standard MPMT MPMs except that it uses state threads instead of pthreads. The number of processes is constant and the number of threads per process varies with the load against the server. In state thread lingo, each process is a virtual processor (VP) managing its own independent set of state threads. Each state thread listens to, and processes connections from, exactly one listening socket.

The STM's one-thread-per-socket design is different from the standard MPMs, in which all threads listen to all sockets simultaneously. This subtly changes the meaning of some configuration directives (such as MinSpareThreads).

In addition to MPMs, Apache/2.0 introduces I/O layers. The STM includes its own state-threaded I/O layer called STIOL, which simply directs all I/O operations on sockets to the state thread library's I/O routines, a crucial and required part of using state threads.

Informal performance measurements reveal that the STM is faster than any of the standard MPMs on all the systems it currently supports. In particular, on a dual Pentium III Linux 2.4 system the STM is 25% faster than the dexter MPM, which uses a similar MPMT architecture with pthreads.

It is a credit to the Apache developers that integrating state threads into Apache/2.0 was easy and (mostly) clean.

2.1. VP Binding

Virtual processors (processes) may be bound to particular CPUs to improve cache utilization. Cache utilization is further improved when VP binding combines with listen binding and the interrupts from the network interfaces bound to a VP are bound to the same CPU as the VP.

The STM's VPBind configuration directive binds a VP to a specific CPU.

2.2. Listen Binding

Virtual processors may listen to all or a subset of all the listen sockets to reduce contention on each socket. Cache utilization also is improved when listen binding combines with VP binding and the interrupts from the network interfaces bound to a VP are bound to the same CPU as the VP.

When only one VP listens to each socket, accept serialization is no longer necessary, further improving performance. Note that this does not imply that a VP must listen to only one socket.

The STM's VPListen configuration directive binds listeners (also called accepting or listening sockets) to a VP.

3. User's Guide

The STM comes in the form of one or more patches against Apache/2.0 from the open source Accelerating Apache Project. It requires the state threads library, which comes in the form of a complete distribution from the open source State Threads Project. First download all the software (Apache, patches, library), apply the patches, and build and install the state threads library.

Unfortunately the state threads library's makefile has no install target. When building the STM the compiler, the linker, and some helper tools need to be able to find st.h and libst.so -- which httpd itself also needs, once built -- so you must either copy these files manually to the appropriate locations to "install" them, like this:

  % su
  # cp /path/to/state-threads/platform/st.h /usr/include
  # cp /path/to/state-threads/platform/libst.so /usr/lib
  # exit

or set CPPFLAGS, LDFLAGS, and LD_LIBRARY_PATH in your environment to point to them, like this:

  % setenv CPPFLAGS -I/path/to/state-threads/platform
  % setenv LDFLAGS -L/path/to/state-threads/platform
  % setenv LD_LIBRARY_PATH /path/to/state-threads/platform

Once that's done, all you need to do to use the STM is pass --with-mpm=stm to configure. Everything that follows is optional, although it's a good idea to try out the STM with STM_DEBUG defined at first, to catch internal inconsistencies, especially on new platforms. Please report errors and contribute fixes.

3.1. Compilation Options

The STM and STIOL support these compile-time configuration options. To specify these options, define them with CPPFLAGS to configure, like this:

  % env CPPFLAGS="-DSTM_LISTENER_LIMIT=2 -DSTM_DEBUG -DIRIX=65" \
  configure --with-mpm=stm ...

To view the options used to compile Apache, run:

  % httpd -V

IRIX The version of SGI's Irix operating system, encoded as a two-digit number, to take advantage of certain high-performance features. For example, for Irix 6.2 use IRIX=62, and for Irix 6.5.x use IRIX=65. Use only on Irix systems.

Note: The presence of this option does not mean that the features it enables are available only on Irix, just that the STM leverages them currently only on Irix. Other operating systems could have their own similar options if someone contributes the code.

Default: IRIX is not defined.

STIOL_DEBUG Enables internal consistency checks and debugging features within the STIOL. Only the presence or absence of this token is meaningful; the value is ignored.

Default: STIOL_DEBUG is not defined so debugging is disabled.

STM_DEBUG Enables internal consistency checks and debugging features within the STM. Only the presence or absence of this token is meaningful; the value is ignored.

Default: STM_DEBUG is not defined so debugging is disabled.

STM_LISTENER_LIMIT Maximum number of listeners (Listen or VPListen directives, also known as accepting or listening sockets).

Default: 8

STM_SCORE_KEY_SIZE Maximum length in bytes of the key string in the scoreboards' connection status tables, including the string-terminating null. Keys longer than this will be truncated. It's probably a good idea to make STM_SCORE_KEY_SIZE + STM_SCORE_VALUE_SIZE equal to one or more times the size of your system's largest cache line.

Default: 16

STM_SCORE_LIMIT Maximum number of key/value pairs in the scoreboards' connection status tables. When the table fills, additional insertions are ignored.

Default: 16

STM_SCORE_VALUE_SIZE Maximum length in bytes of the value string in the scoreboards' connection status tables, including the string-terminating null. Values longer than this will be truncated. It's probably a good idea to make STM_SCORE_KEY_SIZE + STM_SCORE_VALUE_SIZE equal to one or more times the size of your system's largest cache line.

Default: 48
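
The cache-line advice for STM_SCORE_KEY_SIZE and STM_SCORE_VALUE_SIZE can be checked with a few lines of Python. This is only a sketch: the function name is hypothetical, and the cache line sizes passed in are assumed values, not properties of any particular system.

```python
def fills_whole_cache_lines(key_size, value_size, cache_line_size):
    """Return True if one key/value entry spans a whole number of cache lines."""
    entry_size = key_size + value_size
    return entry_size > 0 and entry_size % cache_line_size == 0

# Defaults from this document: 16-byte keys and 48-byte values (64 bytes per entry).
print(fills_whole_cache_lines(16, 48, 64))    # True with an assumed 64-byte line
print(fills_whole_cache_lines(16, 48, 128))   # False with an assumed 128-byte line
```

On a system whose largest cache line is 128 bytes, raising one of the two sizes so that the sum reaches 128 would satisfy the recommendation.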

STM_ST_LIMIT Maximum number of state threads per virtual processor (upper bound on MaxThreads). The server-status page reports the maximum number of threads actually used at any time, which can be used to tune this value optimally for your system and workload. However, lingering closes can artificially inflate the reported maximum.

Default: 512

STM_VP_LIMIT Maximum number of virtual processors.

Default: 8

USE_ST_TIME Enables more efficient but less accurate time-of-day queries. Only the presence or absence of this token is meaningful; the value is ignored.

Apache/2.0 queries the current time of day with one-microsecond resolution upon receipt of each request. One-second resolution usually suffices and is more efficient to query under heavy load (because it uses the state threads library's time cache). This option switches to one-second resolution.

Combine with a nonzero value for ThreadConnections to ensure one-second accuracy.

Default: USE_ST_TIME is not defined so timekeeping is not accelerated.
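
The trade-off behind USE_ST_TIME can be pictured with a small Python model of a time cache. This is a sketch under an assumption: that the cached time is refreshed only when a thread yields to the scheduler, which is what the advice about ThreadConnections suggests. The class CachedClock and the fake time source are hypothetical and are not part of the state threads library.

```python
import time

class CachedClock:
    """Hypothetical one-second-resolution clock refreshed only between tasks."""

    def __init__(self, time_source=time.time):
        self._time_source = time_source
        self._cached = int(time_source())

    def refresh(self):
        # Assumed to be called by the scheduler when a thread yields.
        self._cached = int(self._time_source())

    def now(self):
        # Cheap: no system call, but the value may be stale.
        return self._cached

# A fake time source makes the demonstration deterministic.
fake_now = [100.0]
clock = CachedClock(time_source=lambda: fake_now[0])

fake_now[0] = 103.5    # a thread serves connections for 3.5 s without yielding
stale = clock.now()    # still the value cached before the thread started
clock.refresh()        # the thread finally yields; the cache is refreshed
fresh = clock.now()

print(stale, fresh)    # 100 103
```

In this model, a thread that never yields keeps reading the stale value, which is why a bounded ThreadConnections keeps time stamps close to the real time.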

3.2. Configuration Directives

The STM supports these run-time configuration directives, specified in the file conf/httpd.conf. To invoke these directives edit the STM section of that file, like this:

  <IfModule stm.c>
  NumVPs                  4
  StartThreads            8
  MinSpareThreads         8
  MaxSpareThreads         32
  MaxThreads              64
  StackSize               65536
  ThreadConnections       0
  VPConnections           0
  VPBind                  0 0
  VPListen                0 100.100.100.1:80
  VPBind                  1 0
  VPListen                1 100.100.101.1:80
  VPBind                  2 1
  VPListen                2 100.100.102.1:80
  VPBind                  3 1
  VPListen                3 100.100.103.1:80
  ConnectionStatus        on
  ScoreboardDir           scoreboards
  # CoreDir               cores
  </IfModule>

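The above example is for a two-CPU system with four network interfaces. VPs 0 and 1 are bound to CPU 0 and VPs 2 and 3 to CPU 1. On this system the interrupts from interfaces 100.100.100.1 and 100.100.101.1 are bound to CPU 0, and those from 100.100.102.1 and 100.100.103.1 to CPU 1, so each interface's interrupts arrive on the same CPU as the VP that listens to it.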

ConnectionStatus Flag enabling or disabling per-connection status in scoreboards (they can be slow to maintain).

Default: on

CoreDir Path of a directory where Apache will dump core if necessary. The directory should exist before Apache startup.

Default: server root directory

MaxSpareThreads Maximum number of spare state threads per socket -- not per VP. Spare threads are threads awaiting a connection (as opposed to processing requests on established connections). Must be between 1 and STM_ST_LIMIT inclusive. Should be between 1 and MaxThreads divided by the number of listen sockets inclusive. Making this too small can cause long latency for clients; too big and you waste CPU cycles on low-volume servers.

Default: 10

MaxThreads Maximum number of state threads -- spare or busy -- per VP across all sockets. Must be between 1 and STM_ST_LIMIT inclusive. Making this too small can cause long latency for clients, especially if your site is largely dynamic (like CGI); too big and you waste memory.

Default: 64

MinSpareThreads Minimum number of spare state threads per socket -- not per VP. Spare threads are threads awaiting a connection (as opposed to processing requests on established connections). Must be between 1 and STM_ST_LIMIT inclusive. Should be between 1 and MaxThreads divided by the number of listen sockets inclusive. Making this too small can cause long latency for clients; too big and you waste CPU cycles on low-volume servers.

Default: 5

NumVPs Number of virtual processors. Must be between 1 and STM_VP_LIMIT inclusive. Only one or a few are needed per CPU. The server starts a replacement when a VP dies so the number running is constant. Making this too small can cause long latency for clients; too big and you waste CPU cycles and memory.

Default: 4

PidFile Path of the file where Apache writes the process ID (PID) of the master process. The file may or may not exist before startup.

Default: server-root-directory/logs/httpd.pid

ScoreboardDir Path of the directory where the STM creates scoreboard files for each VP. The directory must already exist before Apache startup. The scoreboard files themselves need not exist beforehand; each is named with just the ordinal of its VP.

Default: server-root-directory/scoreboards

StackSize Size in bytes of each thread's stack. Must be between the system's page size and INT_MAX inclusive. Making this too small can cause stack overflow errors; too big and you waste memory.

Default: 65536 (64 KB)

StartThreads Number of state threads each VP should start initially per socket -- not per VP. Must be between 1 and STM_ST_LIMIT inclusive. Should be between 1 and MaxThreads divided by the number of listen sockets inclusive. Generally this value should be the same as MinSpareThreads but it can be higher to facilitate Apache restarts on high-volume servers.

Default: 5
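
The limits on the per-socket directives (StartThreads, MinSpareThreads, and MaxSpareThreads) can be summarized in a short Python check. This is a sketch, not the STM's own validation code: the function name is hypothetical, "number of listen sockets" is taken to mean the sockets one VP listens to, and integer division is assumed for the recommended ceiling.

```python
def check_per_socket_directive(value, max_threads, listen_sockets, st_limit=512):
    """Classify a per-socket thread count as 'error', 'warning', or 'ok'.

    st_limit defaults to 512, the documented default for STM_ST_LIMIT.
    """
    if listen_sockets < 1:
        raise ValueError("a VP needs at least one listen socket")
    if not 1 <= value <= st_limit:
        return "error"     # outside the hard range 1..STM_ST_LIMIT
    if value > max(1, max_threads // listen_sockets):
        return "warning"   # above the recommended ceiling
    return "ok"

# MaxSpareThreads 32 with MaxThreads 64, as in the example configuration.
print(check_per_socket_directive(32, max_threads=64, listen_sockets=1))    # ok
print(check_per_socket_directive(32, max_threads=64, listen_sockets=4))    # warning
print(check_per_socket_directive(1000, max_threads=64, listen_sockets=1))  # error
```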

VPBind Binding VPs to CPUs.

Syntax: VPBind VP-ID CPU-ID
where VP-ID is the number of the virtual processor, from 0 to NumVPs - 1 inclusive, and CPU-ID is the number of the CPU to which to bind that VP, from 0 to the number of CPUs in your system - 1, inclusive. You may have multiple VPBind directives to bind some or all of the VPs. If more than one VPBind directive specifies the same VP-ID the last one overrides all the previous ones.

Default: VPBind is not used so no binding is in effect.
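
The override rule for repeated VPBind directives can be modelled in Python as follows. This is only an illustration of the documented semantics; resolve_vp_bindings is a hypothetical helper, not code from the STM.

```python
def resolve_vp_bindings(directives, num_vps, num_cpus):
    """Map VP-ID to CPU-ID from (vp_id, cpu_id) pairs given in file order."""
    bindings = {}
    for vp_id, cpu_id in directives:
        if not 0 <= vp_id < num_vps:
            raise ValueError(f"VP-ID {vp_id} is outside 0..{num_vps - 1}")
        if not 0 <= cpu_id < num_cpus:
            raise ValueError(f"CPU-ID {cpu_id} is outside 0..{num_cpus - 1}")
        bindings[vp_id] = cpu_id   # a later directive for the same VP overrides
    return bindings

# The four bindings from the example configuration, plus a second
# directive for VP 1 that overrides the first one.
bindings = resolve_vp_bindings(
    [(0, 0), (1, 0), (2, 1), (3, 1), (1, 1)], num_vps=4, num_cpus=2
)
print(bindings)  # {0: 0, 1: 1, 2: 1, 3: 1}
```

VPs that never appear in a VPBind directive are simply absent from the result, matching the default of no binding.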

ThreadConnections Number of connections (not requests) each thread should serve before yielding the VP to another thread. Must be between 0 and INT_MAX inclusive. Making this too small can waste CPU cycles; too big (or 0) can cause long latency for clients and also inaccurate time stamps in log files (with USE_ST_TIME). The value 0 means the number of connections is unlimited.

Default: 0

VPConnections Number of connections (not requests) each VP should serve before exiting gracefully and allowing the master process to replace it to reclaim resources such as memory or file descriptors leaked by faulty modules. Must be between 0 and INT_MAX inclusive. Making this too small can waste CPU cycles; too big (or 0) and leaked resources can interfere with normal system operation. The value 0 means the number of connections is unlimited and VPs should never voluntarily exit.

Default: 0

VPListen Binding listeners to VPs.

Syntax: VPListen VP-ID IP-address:port ...
where VP-ID is the number of the virtual processor, from 0 to NumVPs - 1 inclusive, and IP-address:port specifies the network address (in dot notation) and port number to which that VP should listen, separated by a colon (:). You may specify more than one address and port. You may have multiple VPListen directives. If more than one VPListen directive specifies the same VP-ID they all take effect.

VPListen and Listen directives cannot be mixed. If any VP has listeners bound to it, then all such bindings must be explicit and every VP must have at least one listener bound to it.

Default: VPListen is not used so no binding is in effect.
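
The VPListen rules (directives for the same VP accumulate, VPListen and Listen cannot be mixed, and every VP needs a listener once any VP has one) can be expressed as a short Python check. This is a sketch of the documented rules only; collect_vp_listeners is a hypothetical helper and does not reflect how the STM parses its configuration.

```python
def collect_vp_listeners(vplisten_directives, listen_directives, num_vps):
    """Return {vp_id: [address, ...]} or raise ValueError on a rule violation.

    vplisten_directives is a list of (vp_id, [address, ...]) in file order;
    listen_directives is a list of plain Listen addresses.
    """
    if vplisten_directives and listen_directives:
        raise ValueError("VPListen and Listen directives cannot be mixed")
    listeners = {}
    for vp_id, addresses in vplisten_directives:
        if not 0 <= vp_id < num_vps:
            raise ValueError(f"VP-ID {vp_id} is outside 0..{num_vps - 1}")
        listeners.setdefault(vp_id, []).extend(addresses)   # all take effect
    if listeners:
        unbound = [vp for vp in range(num_vps) if vp not in listeners]
        if unbound:
            raise ValueError(f"VPs without a listener: {unbound}")
    return listeners

# VP 0 is named in two directives; both addresses remain bound to it.
listeners = collect_vp_listeners(
    [(0, ["100.100.100.1:80"]), (1, ["100.100.102.1:80"]), (0, ["100.100.101.1:80"])],
    listen_directives=[],
    num_vps=2,
)
print(listeners)
```

Note the contrast with VPBind: a repeated VP-ID overrides there, but accumulates here.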

3.3. Debugging Aids

Compiling the STM with STM_DEBUG defined enables internal consistency checks and use of its debugging features. Compiling with STIOL_DEBUG defined enables use of the STIOL's debugging features. To activate these features, define the corresponding variables in the environment in which Apache will run, like this:

  % env STM_INTERACTIVE=1 STM_TRACE=1 bin/httpd ...

By default, none of the following is defined and therefore their debugging features are deactivated.

ONE_PROCESS Limits the STM to use only one process, to simplify use of a debugger. This feature overrides the NumVPs configuration directive. Normally the STM uses one process per VP plus one master process.

Use this feature only for debugging as it degrades performance and disables fault recovery.

STIOL_TRACE Causes the STIOL to print lots of debugging information, including one line at the beginning of each procedure call within the STIOL.

Use with STM_INTERACTIVE to see all the output, otherwise output will stop when the server detaches from the controlling terminal.

Use this feature only for debugging as it severely degrades performance.

STM_INTERACTIVE Forces the STM to run interactively, that is, attached to the controlling terminal and session for normal standard I/O and keyboard signalling, not in the background detached from I/O with the terminal as during normal operation.

STM_TRACE Causes the STM to print lots of debugging information, including one line at the beginning of each procedure call within the STM.

Use with STM_INTERACTIVE to see all the output, otherwise output will stop when the server detaches from the controlling terminal.

Use this feature only for debugging as it severely degrades performance.

3.4. Monitoring

Scoreboards contain status information and counters. Each virtual processor maintains its own scoreboard. Scoreboards are kept in memory-mapped disk files (as opposed to anonymous shared memory) so that support tools such as stmstat (included with STM) can read scoreboard data without causing Apache to serve a page. Most of the scoreboard data is also available on the server-status page from mod_status.

Unfortunately, at this time (alpha 4) the programming interface to mod_status is limited to key-value pairs about client connections identified by a single number, and the page it generates has a format fixed by this interface. For the STM to report information not directly related to connections it resorts to using negative connection numbers so the resulting page is ugly. The author hopes to extend this interface someday. Or you could do it and contribute your changes.

This is what the STM's status information looks like, annotated. The output from the stmstat tool is similar but less polished. The configuration file for this server includes the example from the section on configuration directives.

Apache Server Status for myserver

Server Version: Apache/2.0a6 (Unix) (10xpatchlevel 2.0a6-3) STM/1.0
Server Built: Aug 22 2000 17:15:14

Current Time: Wednesday, 23-Aug-2000 13:29:39 PDT
87 connections currently being processed

Connection -1

Total connections served
130573
Most threads used at once
39
Total connections served for 100.100.100.1:80
31798
Total connections served for 100.100.101.1:80
32986
Total connections served for 100.100.102.1:80
33115
Total connections served for 100.100.103.1:80
32674

Connection -1 is a summary of all STM activity across all virtual processors and state threads. This section shows the total number of client connections processed since server start and also the number per listener (accepting socket), and the maximum number of threads ever used on any single VP (useful for fine-tuning STM_ST_LIMIT).
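
As a quick consistency check on the sample above, the per-listener totals should add up to the overall total. The Python below simply copies the figures from the sample page; it does not read a scoreboard.

```python
# Figures copied from the "Connection -1" section of the sample page.
total_reported = 130573
per_listener = {
    "100.100.100.1:80": 31798,
    "100.100.101.1:80": 32986,
    "100.100.102.1:80": 33115,
    "100.100.103.1:80": 32674,
}

total_computed = sum(per_listener.values())
print(total_computed == total_reported)  # True
```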

Connection -1000

Process ID
7008
Start time
Wed Aug 23 13:12:15 2000
Incarnations
1
Bound to CPU
0
Thread starts
39
Thread exits
0
Most threads used at once
39
Connections served, prior incarnations
0
Connections served, this incarnation
31798
Connections served for 100.100.100.1:80
31798
Connections served by thread 0
843
Connections served by thread 1
805
Connections served by thread 2
1085

... threads 3-62 elided for brevity ...

Connections served by thread 63
0

Connection -1000 is a summary of activity for virtual processor #0. More generally, information for VP #n appears under the heading for connection -1000 - n. The process ID and start time are self-explanatory. Incarnations reports the number of VPs that have inhabited this slot, that is, the number of times a VP has been started there, counting the initial VP and each replacement for a dead one. (The STM restarts VPs when they die, whether voluntarily by exceeding the VPConnections limit or unexpectedly by crashing.) The next item reports the CPU to which the VP is actually bound, or -1 if no binding was requested or the binding failed. Next come thread activity and connection activity. Connections served is split by incarnation so you can see how many connections remain before the VP reaches its VPConnections limit. Connections served is also broken down by listener (this VP happens to listen to only one socket) and by thread, to give a detailed picture of server activity.
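
The numbering of the per-VP headings, and the calculation of how many connections a VP has left, can be written down in Python. This is a sketch: the helper names are hypothetical, and the VPConnections value of 50000 in the usage lines is an invented figure for illustration (the example configuration sets it to 0, meaning unlimited).

```python
VP_HEADING_BASE = -1000   # VP #n is reported under connection -1000 - n

def heading_for_vp(vp_id):
    """Connection number under which the summary for a VP appears."""
    return VP_HEADING_BASE - vp_id

def vp_for_heading(connection_number):
    """Inverse of heading_for_vp."""
    return VP_HEADING_BASE - connection_number

def connections_remaining(vp_connections, served_this_incarnation):
    """Connections left before a voluntary exit, or None if unlimited (0)."""
    if vp_connections == 0:
        return None
    return max(0, vp_connections - served_this_incarnation)

print(heading_for_vp(3))                    # -1003
print(vp_for_heading(-1001))                # 1
print(connections_remaining(50000, 31798))  # 18202, with a hypothetical limit of 50000
print(connections_remaining(0, 31798))      # None: unlimited, as in the example config
```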

Connection 0

Status
Writing
Method
GET
Protocol
HTTP/1.0

Connection 4

Status
Writing
Method
GET
Protocol
HTTP/1.0

Connection 8

Status
Writing
Method
GET
Protocol
HTTP/1.0

... a bunch of normal connection status elided for brevity ...

Connection 35

Status
Writing
Method
GET
Protocol
HTTP/1.0

These are the connections that VP #0 is currently processing. These are present only when ConnectionStatus is on.

Connection -1001

Process ID
7011
Start time
Wed Aug 23 13:12:15 2000
Incarnations
1
Bound to CPU
0
Thread starts
39
Thread exits
0
Most threads used at once
39
Connections served, prior incarnations
0
Connections served, this incarnation
32986
Connections served for 100.100.101.1:80
32986
Connections served by thread 0
957
Connections served by thread 1
765
Connections served by thread 2
747

... threads 3-62 elided for brevity ...

Connections served by thread 63
0

Connection -1001 is a summary of activity for VP #1. This VP happens to listen to a different address than VP #0.

Connection 64

Status
Writing
Method
GET
Protocol
HTTP/1.0

Connection 66

Status
Writing
Method
GET
Protocol
HTTP/1.0

... a bunch of normal connection status elided for brevity ...

Connection 102

Status
Writing
Method
GET
Protocol
HTTP/1.0

These are the connections that VP #1 is currently processing.

Connection -1002

Process ID
7009
Start time
Wed Aug 23 13:12:15 2000
Incarnations
1
Bound to CPU
1
Thread starts
39
Thread exits
0
Most threads used at once
39
Connections served, prior incarnations
0
Connections served, this incarnation
33115
Connections served for 100.100.102.1:80
33115
Connections served by thread 0
674
Connections served by thread 1
821
Connections served by thread 2
923

... threads 3-62 elided for brevity ...

Connections served by thread 63
0

Connection 128

Status
Writing
Method
GET
Protocol
HTTP/1.0

Connection 130

Status
Writing
Method
GET
Protocol
HTTP/1.0

... a bunch of normal connection status elided for brevity ...

Connection 164

Status
Writing
Method
GET
Protocol
HTTP/1.0

This is VP #2.

Connection -1003

Process ID
7012
Start time
Wed Aug 23 13:12:15 2000
Incarnations
1
Bound to CPU
1
Thread starts
41
Thread exits
2
Most threads used at once
39
Connections served, prior incarnations
0
Connections served, this incarnation
32674
Connections served for 100.100.103.1:80
32674
Connections served by thread 0
617
Connections served by thread 1
656
Connections served by thread 2
1111

... threads 3-62 elided for brevity ...

Connections served by thread 63
0

Connection 193

Status
Writing
Method
GET
Protocol
HTTP/1.0

Connection 194

Status
Writing
Method
GET
Protocol
HTTP/1.0

... a bunch of normal connection status elided for brevity ...

Connection 227

Status
Writing
Method
GET
Protocol
HTTP/1.0

This is VP #3.

4. Further Information

Additional information is available from these sources.

In the on-line version of this document, links to other parts of the Apache server documentation may not work.

Portions created by SGI are Copyright © 2000 Silicon Graphics, Inc. All rights reserved.