Spin Version 5.0 Multi-core Options


Spin Version 5 supports options to perform model checking on multi-core machines.
  • The working of Spin 5.0 is unchanged for single-core verifications (the default).
    This means that if you do not explicitly enable a multi-core verification, Spin will work as before.
  • The multi-core options are designed for true multi-core machines (not large multi-cpu shared memory machines with NUMA architectures). The multi-core mode tends to work best for large verification models, but not for small models of a few million states or less. The multi-core options can in the best cases give a near linear speedup with the number of cpu-cores used. In the worst case, though, the added overhead can also cause a slowdown. Sometimes one has to experiment to find the right settings to achieve a true speedup (follow, for instance, the hints for recompilation that the verifier sometimes prints at the end of a run).
  • The multi-core option allows one to compile the pan.c verifiers with extra directives (e.g., -DNCORE=4) that define the number of cpu-cores that will be used during the model checking process. There no point in setting the number of cores larger than the number of real cpu-cores on your system. It will, for instance, generally cause a slow-down if you compile pan.c for dual-core when you only have a single-core system, etc.
  • A multi-core verification run uses shared memory segments that are pre-allocated before all worker processes are spawned. This is different from the default single-core behavior, where memory is allocated on the fly, as needed. This means that you generally have to define the amount of memory that should be made available for the complete run (using -DMEMLIM=N where N is a memory size in Megabytes).
  • As before, you should never set the memory limit to a larger value than the amount of physical (not virtual) memory that is available on your system. Note that the maximum amount of memory that your operating system makes available to shared memory segments may be small (e.g., 32 Mbyte on Cygwin) -- see below for ways to increase this to a more realistic number.
The algorithms used for this version of Spin are described in these papers. A first set of measurements of the performance of Spin on multi-core machines with up to 8 cores is also available online.

Currently supported platforms include Windows, Linux, and Cygwin systems, and possibly also Mac OS/X.
The following specifics apply to each platform:

  • Linux (32 and 64 bit recommended)
    • Gnu gcc, both 32-bit and 64-bit versions.
  • Windows (32 bit recommended)
    • using the (freely available) Visual Studio Express 2008 compiler for native Windows support
      The 32-bit version is reliable, but the 64-bit compiler cannot currently be recommended.
      With the 64-bit Windows compiler shared memory segments can inexplicably get corrupted or disconnected.
  • Cygwin (not recommended)
    • Gnu gcc, using cygserver. At present, no 64-bit version of cygserver is available or expected, so multi-core verification in this mode will is restricted to 32-bit mode. Recent versions of cygwin/cygserver also appear to put additional (seemingly arbitrarily) low memory bounds on shared memory segments (e.g., 482 MB), which limits its use for multi-core verifications.
  • Mac OS/X
    • This platform has not been thoroughly tested, but early beta-users have reported that the multi-core options work.
The most stable, and recommended, platform for the Spin multi-core extension is either 32-bit or 64-bit Linux. Most of our development and testing was done under Ubuntu 7.0.4, with gcc version 4.1.2, using 64-bit compilation.
The instructions below are also available from the compiled version of pan.c (provided it is compiled in multicore mode with -DNCORE=N with N>1), by typing:
	./pan --

Setup for Linux (64-bit recommended)

The maximum size of a shared memory segment is defined in /usr/include/linux/shm.h. The default size of a shared segment after a reboot is 32Mb. Before doing multi-core verifications you should change this value with the /sbin/sysctl command (for which you will need superuser privileges).
To check the currently used value:
         $ /sbin/sysctl -a | grep shmmax   # check what the current value is
         kernel.shmmax = 33554432          # the default value
Change this to close to the size of physical memory on your system. To change the value:
         $ /sbin/sysctl kernel.shmmax=6442450944        # set new limit of 6 Gb
         kernel.shmmax = 6442450944			# done
or
         $ /sbin/sysctl kernel.shmmax=32212254720 	# 30 Gbyte
         kernel.shmmax = 32212254720			# done
(Note that you cannot increase the value above 4 GByte on a 32-bit system.)

You should also increase the maximum number of shared pages shmall to the number used above divided by the pagesize (e.g., 4096), for instance:

	$ /sbin/sysctl kernel.shmall=7864320 		# for a 30 Gbyte shared memory max
See further under "New Directives."

Setup for 32-bit or 64-bit Windows

You will need to install the Visual Studio Express C compiler, (e.g. VC2005 or VC2008, which can be downloaded from the Microsoft website), and set up the environment for either 64-bit or 32-bit compilations. The configuration can be a bit of a discovery process, especially for 64-bit systems if you want to be able to call the cl compiler from the command line. In addition to the compiler proper, you will also need to install the Platform SDK toolkit.
	To use PSDK with VC2005 Express:
	1. Create a directory "PlatformSDK" under "\ProgramFiles\Microsoft Visual Studio 8\VC"
	2. Copy the "bin", "include" and "lib" directory from the PSDK-installation
	   \Program Files\Microsoft Platform SDK\ into this directory.
	Then you do not need to change any config file.
The 32-bit Windows compiler VC2005 or VC2008 is free, but not the 64-bit Windows compiler.
(Although you can download a trial version -- which is a very large file. The use of this compiler is not recommended though.)

The following settings for environment variables worked on a Windows XP64 system, but will almost certainly need to be changed for different systems or compiler installations. It is provided here as a example only. Caveat emptor.

	INCLUDE='C:\Program Files\Microsoft Platform SDK for Windows Server 2003 R2\Include;C:\Program Files\Microsoft Platform SDK for Windows Server 2003 R2\Include\crt;C:\Program Files\Microsoft Platform SDK for Windows Server 2003 R2\Include\crt\sys;C:\Program Files\Microsoft Platform SDK for Windows Server 2003 R2\Include\mfc;C:\Program Files\Microsoft Platform SDK for Windows Server 2003 R2\Include\atl'

	LIB='C:\Program Files\Microsoft Platform SDK for Windows Server 2003 R2\Lib\AMD64;C:\Program Files\Microsoft Platform SDK for Windows Server 2003 R2\Lib\AMD64\atlmfc;'

	PATH='/cygdrive/c/cygwin/bin:/cygdrive/c/Program Files/Microsoft Platform SDK for Windows Server 2003 R2/Bin/Win64/x86/AMD64:/cygdrive/c/Program Files/Microsoft Platform SDK for Windows Server 2003 R2/Bin:/cygdrive/c/Program Files/Microsoft Platform SDK for Windows Server 2003 R2/Bin/WinNT:/cygdrive/c/WINDOWS/system32:/cygdrive/c/WINDOWS:/cygdrive/c/WINDOWS/System32/Wbem:/cygdrive/c/PROGRA~2/QuickTime/QTSystem/:/cygdrive/c/Program Files (x86)/Microsoft SQL Server/90/Tools/bin/'

	export PATH INCLUDE LIB
To actually compile a program, you will need a command line such as:
	$ cl -nologo hello.c bufferoverflowu.lib
(Note, the explicit listing of the bufferoverflowu.lib file which is required for the 64-bit compiler only -- the compiler will know where to find it.)
The use of the 64-bit compiler under Windows is currently not recommended.

With the standard 32-bit compiler, you will note that the header file stdint.h is missing from the regular Visual Studio installation. The file can be found in Wikipedia, in the entry for stdint.h, and should be place in the include directory, e.g., for VC2008 this is:

	C:/Program\ Files/Microsoft\ Visual\ Studio\ 9.0/VC/include

Setup for 32-bit Cygwin under Windows

Edit the value of kern.ipc.shmmaxpgs in /etc/cygserver.conf . The default value is 8,192 and the maximum is 32,767. Multiply the number set by the pagesize (e.g., 4096) to get the maximum size of a shared memory segment that corresponds to this. The default size is 32MB and the maximum is 128MB. (On more recent versions, it seems to be impossible to allocate more than about 484 MB of shared memory, which is a significant limitation.)

If cygserver is not running, it should be started first:

        $ sh /usr/bin/cygserver-config		# the very first time only
        $ cygrunsrv -S cygserver		# once or repeat when it dies
Finally, a shell variable CYGWIN=server should be set in the environment. Assumign a bash our bourne shell, you'd do:
	$ echo $CYGSERVER	# check
	$			# not set
	$ CYGSERVER=server	# set it
	$ export CYGSERVER
It is of course best to set the CYGSERVER variable in your profile, or as part of the standard environment setting.

You can tell if cygserver is running by executing:

        $ ipcs   # print a list of all current shared memory uses
        Message Queues:
        T     ID               KEY        MODE       OWNER    GROUP
        
        Shared Memory:
        T     ID               KEY        MODE       OWNER    GROUP
        m  65536                    0 --rwarwarwa       gh    Users
        
        Semaphores:
        T     ID               KEY        MODE       OWNER    GROUP
If this command fails the cygserver is not running and must be (re)started (see above).

You can use ipcs to see which shared memory segments are in use, and if any stale segments are left over from an aborted run of the model checker (this shouldn't happen, but occassionally it may). To manually remove a stale shared memory segment, use ipcrm
You can also use a script to cleanup after a failed run, possibly with some of the pan processes from a multi-core run lingering. This is optional, every new run of the verifier will try to cleanup any stale segments from previous runs, and a reboot will also cleanup any stale segments. For instance:

	$ cat /bin/doclean
	#!/bin/sh

	# adapt this script to the specific output format of ps
	# and ipcs on your system
	
	ps | sed 1d | grep pan$ | awk '	{ print $1 }' > __id
	for i in `cat __id`
	do	# echo kill stale pan process $i
		kill -9 $i
	done
	
	ipcs | sed 1d | awk '$1=="m"	{ print $2 }' > __id
	
	for i in `cat __id`
	do	# echo remove shared memory segment $i
		ipcrm -m $i
	done
	
	rm -f __id	# cleanup temporary file

Directives Supported for Multi-core Verification

In multi-core mode, Spin will have to pre-allocate the maximum allowable amount of shared memory at the start of the run. It is therefore recommended to always us the directive MEMLIM:
	-DMEMLIM=8000	# allow up to 8 GB of shared memory
Other directives, new to Spin version 5.0 are,the first few being most important. The following list applies specificatlly to version 5.1.2 and later:
	-DDUALCORE  --> same as -DNCORE=2
	-DQUADCORE  --> same as -DNCORE=4
	-DNCORE=N   --> enables multi_core verification if N>1
Additional directives supported in multi-core mode:
	-DSEP_STATE --> forces separate statespaces instead of a single shared state space
	-DUSE_DISK  --> use disk for storing states when a work queue overflows
	-DMAX_DSK_FILE=N --> max nr of states per diskfile (default: 1000000)
	-DFULL_TRAIL --> support full error trails (but increases memory use)
Normally, and probably the recommended way, to do a first compilation and multi-core run for a verification model, would be to decide on the amount of memory, and the number of available core and then execute, e.g.:
	$ spin -a model.pml	# as before
	$ gcc -DNCORE=3 -DMEMLIM=8000 -DSAFETY -o pan pan.c
	$ ./pan
And let ./pan tell you what if anything would need changing in the compilation. For instance, to increase the number of states that can be stored in the shared queues that connect the different CPU cores (default values are in parentheses):
	-DVMAX=N    --> upperbound on statevector for handoffs (default N=1024)
	-DPMAX=N    --> upperbound on nr of procs (default N=64)
	-DQMAX=N    --> upperbound on nr of channels (default N=64)
As noted, it is safe to do a default run first, and let ./pan recommend a setting for the above three directives, if the defaults are not right, rather than trying to guess what they should be up front.

To change the size of the global workqueue (rarely needed):

	-DSET_WQ_SIZE=N --> default 128 (Mbytes)
To force the use of a single global heap, rather than separate heaps for shared memory per core (usually not helpful):
	-DGLOB_HEAP
To define a function to initialize data before spawning processes (must be enclosed in quotes to protect the round braces from the shell):
	"-DC_INIT=fct()"
Timer settings etc. for termination and crash detection (the default settings are shown below). Should very rarely need changing:
	-DSHORT_T=0.1 --> short timeout for termination detection trigger
	-DLONG_T=600  --> long timeout for giving up on termination detection
	-DONESECOND=(1<<29) --> timeout waiting for a free slot
	-DT_ALERT     --> collect stats on crash alert timeouts

Safety Verification

Recommended use is to use -DSAFETY -DNCORE=N with N equal to the number of physical cores on your system, or lower, but never larger. Do not use -DSEP_STATE in this mode (it will lead to redundant work being done unnecessarily.)

Liveness Verification

Compile with -DNCORE=2 -DSEP_STATE. Using a larger number of cores than 2 is not useful with the exisiting implementation, since the multi-core liveness algorithm is fundamentally dual-core. Since we know that the two statespaces generated for the first and the second depth-first search are disjoint, the use of -DSEP_STATE is recommended, although it can somewhat increase memory use. Omit -DSEP_STATE only if there isn't sufficient memory.

Tuning a Verification run

At the end of a verification run, ./pan will print some statistics. Many more are printed if you run in verbose mode by adding a -v to the parameters. Included in verbose mode is the number of internal ticks spent waiting for incoming states and free slots in work queues. If the latter is non-zero, it may be advisable to enlarge the global workqueue (-DSET_WQ_SIZE), and/or to enable disk swapping (-DUSE_DISK).

Another statistic printed is the number of states that is explored by each CPU core. If the numbers look uneven -- you can experiment with different handoff depths. The default handoff depth is 20, but you can give ./pan a command-line argument to override this. To set a handoff depth to 100, for instance, use:

	$ ./pan -z100
(For liveness verification, the actual handoff depth is fixed to accepting states, but the -z argument still changes the number of slots in the circular handoff queues used.)

Usually, the verification run is not very sensitive to the absolute value of the handoff depth chosen. When large numbers of cores are used, or when the maximum search depth is relatively small, it is probably wise not to also use a large value for the handoff-depth, to make sure that each CPU core can get sufficient work.

To see all supported options, use:

	$ ./pan --
In rare cases a run may abort due to timeouts, even when the run itself isn't really finished. In this case, you could try to give -DSHORT_T a larger value. (Since there is some hardware speed dependence in this setting, this may also be necessary on very fast machines -- although this is unlikely.)

Full Counter-Examples

By default, ./pan when run in multi-core mode will generate a short counter-example that contains only the last few steps before an error is found. All counter-examples must be replayed with ./pan itself (you can no longer use spin -t for this). For instance:
	$ ./pan -r
If the counter-example is too short -- you can try a run with a larger handoff depth to see if that triggers a longer part of the error trail. To get a full counter-example that starts from the global initial system state, you must compile with -DFULL_TRAIL, for instance:
	$ gcc -DNCORE=32 -DFULL_TRAIL -o pan pan.c
This increases the memory requirements to store the extra information needed for the full error-trails (a stack-tree), but will generate a traditional counter-example that can be replayed with ./pan -r as well as with spin -t (unless embedded C-code is used in the model of course -- in that case you can only replay the counter-example with ./pan -r as before).

Other New Directives

Two other new compile time directives for pan.c files were introduced in Version 5.0. Their use is not related to multi-core verification, but have a more general purpose. They are:
	-DFREQ=N	--> changes the frequency of snapshot updates during a run
			    from once every 1000000 states to once every N states
	-DNSUCC		--> computes the average fanout of states in the statespace
			    and prints a summary at the end of the run
The remaining directives are for testing only, and should probably not be used for routine applications:
	-DLOOPSTATE	--> testing only -- enables a failed attempt to enforce a
			    variant of a partial order cycle proviso based on local
			    loopstates
	-DNO_V_PROVISO	--> testing only -- disables a key proviso for partial order
			    reduction during multi-core verification runs (based on
			    cpu id numbers)
	-DBUDGET	--> testing only -- enables an experiment search reduction
			    strategy for bitstate search, restricting the maximum
			    size of a subtree searched within the dfs, in an attempt
			    to divide available memory more evenly across the entire
			    dfs search tree in exceptionally large state spaces
	-DTHROTTLE=N	--> testing only -- changes the throttle value for -DBUDGET
			    searches; the default value for N is 1.5

Return to Spin homepage Page last updated for Version 5.1.4 on 1 March 2008.