In software analysis, most research has traditionally
focused on the development of tools and techniques
that can be used to formally prove key correctness properties
of a software design. Design errors can be
hard to catch without the right tools, and even
after decades of development, the right tools can still
be frustratingly hard to use.
Most software developers know, though, that the number of coding
errors is usually much larger than the number of design errors.
If something is wrong, the problem can usually
be traced back to a coding mistake, and less frequently
to an algorithmic error. Tools for chasing down those types
of errors are indeed widely used. They are simple to use,
and considered effective, although they do not offer much
more than sophisticated types of trial and error.
The first commercial static source code analyzers
hit the market a little over a decade ago, aiming
to fill the gap between ease of use and more
meaningful types of code analysis.
Today these tools are mature, and their use is
widespread in commercial software development,
but they tend to be big and slow.
There does not seem to be any tool of this type yet
that can be used for interactive code analysis.
We'll describe the design and use of a different type of
static analysis tool, one designed to be simple,
small, and fast enough that it can effectively be
used for interactive code analysis, even on large code bases.
On August 5, 2012, at 10:18pm PST, a large rover named Curiosity made a soft landing on the surface of Mars. Given the one-way light-time to Mars, the controllers on Earth learned about the successful touchdown 14 minutes later, at 10:32pm PST. As can be expected, all functions on the rover, and on the spacecraft that brought it to its destination 350 million miles from Earth, are controlled by software. In this article we discuss some of the things we did to secure the reliability of this code.
How do you prove the correctness of multi-threaded code?
This question has been asked since at least the mid-sixties,
and it has inspired researchers ever since.
Many approaches have been tried, based on mathematical theories,
the use of annotations, or the construction of abstractions.
An ideal solution would be a tool that one can point at an
arbitrary piece of concurrent code, and that can resolve
correctness queries in real-time. We describe one possible
method for achieving this capability with a logic model checker.
The method is based on the combined use of four techniques,
which together can be very powerful:
This paper describes a new extension of the Spin model checker that allows us to take advantage of the increasing number of CPU cores available on standard desktop systems. The main target is to speed up the verification of safety properties, the mode used most frequently, but we also describe a small modification of the parallel search algorithm, called the piggyback algorithm, that is remarkably effective in catching violations of a class of liveness properties during the breadth-first search, with virtually no overhead.
The range of verification problems that can be solved with logic model
checking tools has increased significantly in the last few decades.
This increase in capability is based on algorithmic advances, but in no
small measure it is also made possible by the rapid increases in processing
speed and main memory sizes on standard desktop systems. For the time being,
though, the increase in CPU speeds has mostly ended as chip-makers are
redirecting their efforts to the development of multi-core systems.
This new trend means that model checkers must adapt.
In the coming years we can expect to see systems with continued
increases in memory size, combined with an increasing number of CPU cores.
All CPU cores, though, will, at least for the time being, run at a
relatively low and fixed speed. We will discuss the implications of this
new trend, and describe strategies that can allow us to leverage it.
We describe an extension of the SPIN model checker for use on multi-core shared-memory systems and report on its performance. We show how, with proper load balancing, the time requirements of a verification run can in some cases be reduced close to N-fold when N processing cores are used. We also analyze the types of verification problems for which multi-core algorithms cannot provide relief. The extensions discussed here require only relatively small changes in the SPIN source code, and are compatible with most existing verification modes, such as partial order reduction, the verification of temporal logic formulae, bitstate hashing, and hash-compact compression.
The broad availability of multi-core chips on standard desktop PCs provides strong motivation for the development of new algorithms for logic model checkers that can take advantage of the additional processing power. With a steady increase in the number of available processing cores, we would like the performance of a model checker to increase as well -- ideally linearly. The new trend implies a change of focus away from cluster computers towards shared memory systems. In this paper we discuss the multi-core algorithms that are in development for the SPIN model checker. We discuss some unexpected features of these new algorithms.
We present the first experimental results on the implementation of multi-core model checking algorithms for the SPIN model checker. These algorithms specifically target shared-memory systems, and are initially restricted to dual-core systems. The extensions we have made require only small changes in the SPIN source code, and preserve virtually all existing verification modes and optimization techniques supported by SPIN, including the verification of both safety and liveness properties and the verification of SPIN models with embedded C code fragments.
This column describes a short set of rules that can increase our capability to verify critical software. They have been applied experimentally with good results in the development of mission critical code for spacecraft at JPL.
The ultimate objective of the grand challenge effort in software verification cannot be restricted to the formal verification of single software components (programs). We propose to target a more formal treatment of the overall reliability of software systems, where we explicitly take into account that it will always be possible for individual software components to fail in unexpected circumstances, despite our best efforts to verify them. Despite such local failures, the overall system reliability should be unassailable: it should be possible to design software systems that provably meet certain reliability standards even when individual components fail.
Automated onboard planning systems are gaining acceptance for use on NASA missions. The planning system takes high level goals and expands them onboard into a detailed plan of action that the spacecraft executes. The system must be verified to ensure that the automatically generated plans achieve the goals as expected and do not generate actions that would harm the spacecraft or mission. These systems are typically tested using empirical methods. This paper describes a formal method for checking planners with the help of the SPIN model checker. The method allows us to check that plans generated by the planner meet critical correctness properties.
In the classic approach to logic model checking,
software verification requires a manually constructed
artifact (the model) to be written in the language
that is accepted by the model checker.
The construction of such a model typically requires good
knowledge of both the application being verified and
of the capabilities of the model checker that is used
for the verification.
Inadequate knowledge of the model checker can limit the
scope of verification that can be performed;
inadequate knowledge of the application can undermine
the validity of the verification experiment itself.
In this paper we explore a different approach to
software verification. With this approach, a
software application can be included, without
substantial change, into a verification test-harness
and then verified directly, while preserving
the ability to apply data abstraction techniques.
Only the test-harness is written in the language of
the model checker.
The test-harness is used to drive the application
through all its relevant states, while logical
properties on its execution are checked by the model
checker.
To allow the model checker to track state, and
avoid duplicate work,
the test-harness includes definitions of all
data objects in the application that contain
state information.
The main objective of this paper is to introduce a
powerful extension of the Spin model checker
that allows the user to directly define data abstractions
in the logic verification of application level programs.
With the steady increase in computational power of general purpose computers, our ability to analyze routine software artifacts is also steadily increasing. As a result, we are witnessing a shift in emphasis from the verification of abstract hand-built models of code, towards the direct verification of implementation level code. This change in emphasis poses a new set of challenges in software verification. We explore some of them in this paper.
Real-life bugs are successful because of their unfailing ability to adapt. In particular this applies to their ability to adapt to strategies that are meant to eradicate them as a species. Software bugs have some of these same traits. We discuss these traits, and consider what we can do about them.
Most software developers today rely on only a small number of techniques to check their code for defects: peer review, code walkthroughs, and testing. Despite a rich literature on these subjects, the results often leave much to be desired. The current software testing process consumes a significant fraction of the overall resources in industrial software development, yet it cannot promise zero-defect code. There is reason to hope that the process can be improved. A range of tools and techniques has become available in the last few years that can assess the quality of code with considerably more rigor than before, and often also with more ease. Many of the new tools can be understood as applications of automata theory, and can readily be combined with logic model checking techniques.
Only a small fraction of the output generated by typical
static analysis tools tends to reveal serious software defects.
Finding the important defects in a long list of
warnings can be a frustrating experience.
The output often reveals more about the limitations of
the analyzer than about the code being analyzed.
There are two main causes for this phenomenon. The first
is that the typical static analyzer casts its nets too
broadly, reporting everything reportable, rather
than what is likely to be a true bug. The second cause is
that most static analyzers can check the code for only a
predetermined, fixed, set of bugs: the user cannot
make the analyzer more precise by defining and experimenting
with a broader range of application-dependent properties.
We describe a source code analyzer called UNO that tries to
remedy these problems. The default properties searched
for by UNO are restricted to the three most common
causes of serious error in C programs:
use of uninitialized variables, nil-pointer dereferencing,
and out-of-bound array indexing.
The checking capabilities of UNO can be extended by the user
with the definition of application-dependent properties,
which can be written as simple ANSI-C functions.
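To make the three default defect classes concrete, the small C fragments below each contain one bug of the kind a checker like UNO is meant to flag. They are illustrative examples written for this summary only; they are not taken from UNO or from its documentation.

    /* Illustrative only: one instance of each of the three default
     * defect classes targeted by UNO (not code from UNO itself). */
    #include <stddef.h>

    int uninitialized_use(int flag)
    {
        int x;                     /* no value assigned on the else branch */
        if (flag)
            x = 42;
        return x + 1;              /* possible use of uninitialized 'x'    */
    }

    int nil_deref(int *p)
    {
        if (p == NULL)
            return *p;             /* nil-pointer dereference on this path */
        return *p + 1;
    }

    int out_of_bounds(void)
    {
        int a[8] = { 0 };
        int i, sum = 0;
        for (i = 0; i <= 8; i++)   /* off-by-one: valid indices are 0..7   */
            sum += a[i];           /* out-of-bound read when i == 8        */
        return sum;
    }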
Flight software is the central nervous system of modern spacecraft.
Verifying spacecraft flight software to assure that it operates
correctly and safely is presently an intensive and costly process.
A multitude of scenarios and tests must be devised, executed and
reviewed to provide reasonable confidence that the software will
perform as intended and not endanger the spacecraft. Undetected
software defects on spacecraft and launch vehicles have caused
embarrassing and costly failures in recent years.
Model checking is a technique for software verification that can
detect concurrency defects that are otherwise difficult to discover.
Within appropriate constraints, a model checker can perform an
exhaustive state-space search on a software design or implementation
and alert the implementing organization to potential design deficiencies.
Unfortunately, model checking of large software systems requires an
often-too-substantial effort in developing and maintaining the
software functional models.
A recent development in this area, however, promises to enable
software-implementing organizations to take advantage of the
usefulness of model checking without hand-built functional models.
This development is the appearance of "model extractors."
A model extractor permits the automated and repeated testing
of code as built rather than of separate design models.
This allows model checking to be used without the overhead and
perils involved in maintaining separate models.
We have attempted to apply model checking to legacy flight software
from NASA's Deep Space One (DS1) mission. This software was implemented
in C and contained some known defects at launch that are detectable
with a model checker. We will describe the model checking process,
the tools used, and the methods and conditions necessary to
successfully perform model checking on the DS1 flight software.
A logic model checker can be an effective tool for debugging software
applications. A stumbling block can be that model checking tools expect
the user to supply a formal statement of the correctness requirements to
be checked in temporal logic. Expressing non-trivial requirements in
logic, however, can be challenging.
To address this problem, we developed a graphical tool, the TimeLine
Editor, that simplifies the formalization of certain kinds of
requirements. A series of events and required system responses is
placed on a timeline. The timeline specification is then converted
automatically into a test automaton that can be used directly by a
logic model checker, or for traditional test-sequence generation.
We have used the TimeLine Editor to verify the call processing code for
Lucent's PathStar Access Server against the TelCordia LSSGR standards.
The TimeLine Editor simplified the task of converting a large body of
English prose requirements into formal, yet readable, logic
requirements.
One of the cornerstones of formal methods is the notion that a formal model enables analysis. By the construction of the model we trade detail for analytical power. The intent of the model is to preserve only selected characteristics of a real-world artifact, while suppressing others. Unfortunately, practitioners are less likely to use a tool if it cannot handle artifacts in their native format. The requirement to build a model to enable analysis is often seen as a requirement to design the system twice: once in a verification language and once in an implementation language. Since implementation cannot be skipped, formal verification is often sacrificed. In this talk I'll discuss methods that can be used to sidestep this problem. I'll show how we can automate the extraction of verification models from code, providing intuitive tools for defining abstractions rather than models.
How can we determine the added value of software verification techniques over the more readily available conventional testing techniques? Formal verification techniques introduce both added costs and potential benefits. Can we show objectively when the benefits outweigh the costs?
We review the automata-theoretic verification method,
as formulated by Moshe Vardi and Pierre Wolper, and
the use of propositional linear temporal logic, as first
proposed by Amir Pnueli.
The automata theoretic method for temporal verification
meshes well with verification methods based on graph
exploration algorithms. A number of practical issues
must be resolved, though, before this method can be applied
for the verification of distributed software applications.
We will consider those problems and describe the algorithmic
solutions that have been devised to address them.
We will discuss the possibilities for using both lossless
and lossy data storage techniques, Bloom filters,
partial order reduction, and automata-based recognizers.
An important issue in the verification of software
applications is the establishment of a formal relation
between the abstract automata models that are verified and
the concrete software implementations that are modeled.
A method that has long been advocated is to use a design process
by which abstract models are refined step by step
into concrete implementations. If each refinement step can be shown
to preserve the correctness properties of the previous step,
the correctness of the final implementation can be secured.
This approach is easily justified and not hard to teach, but
has resisted adoption so far.
An alternative approach has more recently begun
to show promise. With this method we allow the programmer to produce
code as before, unimpeded by concerns other than the on-time delivery
of efficient and functional code. We now turn the verification
problem upside down and instead of deriving the implementation from
a high-level abstraction using refinement, we derive the high-level
model from the implementation, using abstraction.
We will review a model extraction technique for distributed software
applications written in the programming language C that works in this way.
The model extractor acts as a preprocessor for the model checker Spin.
If time permits, we will also discuss some recent results with
the application of model extraction in the verification of some
significant industrial software applications: the call processing
code for a new telephone switch and a checkpoint handling system.
A significant part of the call processing software for
Lucent's new PathStar(tm) access server was checked with
formal verification techniques. The verification
system we built for this purpose, named FeaVer, is accessed via
a standard web browser. The system maintains
a database of feature requirements together with the
results of the most recently performed verifications.
Via the browser the user can invoke new verification runs,
which are performed in the background with the
help of a logic model checking tool.
Requirement violations are reported either as high-level
message sequence charts or as detailed source-level execution
traces of the system source.
A main strength of the system is its capability to detect
potential feature interaction problems at an early stage of
systems design, the type of problem that is difficult
to detect with traditional testing techniques.
Error reports are typically generated by the system
within minutes after a comprehensive check is initiated,
allowing near interactive probing of feature requirements
and quick confirmation (or rejection) of the validity of
tentative software fixes.
We describe a tool, called Ax, that can be used in combination with the model checker Spin to efficiently verify logical properties of distributed software systems implemented in ANSI-standard C [18]. Ax, short for Automaton eXtractor, can extract verification models from C code at a user defined level of abstraction. Target applications include telephone switching software, distributed operating systems code, protocol implementations, concurrency control methods, and client-server applications. This paper discusses how Ax is currently implemented, and how we plan to extend it. The tool was used in the formal verification of two substantial software applications: a commercial checkpoint management system and the call processing code for a new telephone switch.
To formally verify a large software application, the standard method is to invest a considerable amount of time and expertise into the manual construction of an abstract model, which is then analyzed for its properties by either a mechanized or by a human prover. There are two main problems with this approach. The first problem is that this verification method can be no more reliable than the humans that perform the manual steps. If the rate of error for human work is a function of problem size, this holds not only for the construction of the original application but also for the construction of the model. This means that the verification process tends to become unreliable for larger applications. The second problem is one of timing and relevance. Software applications built by teams of programmers can change rapidly, often daily. Manually constructing a faithful abstraction of any one version of the application, though, can take weeks or months. The results of a verification, then, can quickly become irrelevant to an ongoing design effort. In this paper we sketch a verification method that aims to avoid these problems. This method, based on automated model extraction from C code, was first applied in the verification of the call processing software for a new Lucent Technologies' system called PathStar(r).
Formal verification methods are used only
sparingly in software development.
The most successful methods to date are based
on the use of model checking tools. To use
such tools, the user defines a faithful
abstraction of the application (the model),
defines how the application interacts with its
environment, and formulates the properties that it
should satisfy. Each step in this process can
become an obstacle, and, generally, to complete
the entire verification process successfully
often requires specialized knowledge of
verification techniques and a considerable
investment of time.
In this paper we describe a verification method
that requires little or no specialized knowledge
in model construction. It allows us to extract a model
mechanically from the source of an application.
Interface definitions and property specifications have
meaningful defaults that can get the user started quickly
with a verification effort. The defaults can be adjusted
when desired.
All checks can be executed mechanically, even when the
application itself continues to evolve. Compared to
conventional software testing, the thoroughness of a
check of this type is unprecedented.
In a recent study at Stony Brook, a number of model checking tools,
Spin among them, were compared on performance.
The measurements used for this comparison focused on a model
of the i-protocol from GNU uucp version 1.04.
Eight versions of this i-protocol model were
obtained by varying window size, assumptions about
the transmission channel, and the presence or absence
of a patch for a known livelock error. The initially
published results appeared to show that the XMC system
could outperform the other model checking systems on
most of the tests.
The initial publication also contained a challenge to
the builders of the other model checkers to match the results.
This paper answers that challenge for the Spin model checker.
We show that with either default Spin verification
runs, or a reasonable choice of parameter settings,
the version of Spin that was used for the tests (Spin 2.9.7)
outperforms XMC in six out of eight tests. Inspired by the comparisons,
and the description of the reasons the builders of XMC
gave for its performance, we extended Spin with similar
optimizations, leading to a new Spin version 3.3.
We show that with these changes Spin outperforms XMC on
all tests.
It has become a good practice to expect authors of new algorithms for model checking applications to provide not only rigorous evidence of an algorithm's correctness, but also evidence of its practical significance. Though the rules for determining what is and what is not a good proof of correctness are clear, no comparable rules are usually enforced for determining the soundness of the data that the author provides to support a claim to practical value. We consider here how we can flag the more common types of omission.
A notable part of
the delays that can be encountered in
system design projects is caused
by logical inconsistencies that are often
inadvertently introduced in the early phases
of software design.
Many of these inconsistencies are ultimately
caught in design and code reviews, or in systems testing.
In the current design process, though, these errors
are rarely caught before the implementation of
the system nears completion.
We show that modeling and verifying the requirements separately,
before system design proper begins, can help to intercept the
ambiguities and inconsistencies that may otherwise not
be caught until testing or field use.
By doing so, one can prevent a class of errors
from entering the development phase, and thereby reduce
time to market, while at the same time
improving overall design quality.
We demonstrate this approach with an example
where we use the LTL model checking system Spin,
developed at Bell Labs, to mechanically verify the
logical consistency of a high level design for one of
Lucent's new products.
It is estimated that interval reductions of 10 to 20 percent
can be achieved by applying early fault detection
techniques of this type.
It is well-known that in general the problem of deciding whether a program halts (or can deadlock) is undecidable. Model checkers, therefore, cannot be applied to arbitrary programs, but work with well-defined abstractions of programs. The feasibility of a verification often depends on the type of abstraction that is made. Abstraction is indeed the most powerful tool that the user of a model checking tool can apply, yet it is often perceived as a temporary inconvenience.
Model checking is shown to be an effective tool in validating the behavior of a fault tolerant embedded spacecraft controller. The case study presented here shows that by judiciously abstracting away extraneous complexity, the state space of the model could be exhaustively searched allowing critical functional requirements to be validated down to the design level. Abstracting away detail not germane to the problem of interest leaves by definition a partial specification. The success of this procedure shows that it is feasible to effectively validate a partial specification with this technique. Three anomalies were found in the system. One was an error in the detailed requirements, and the other two were missing/ambiguous requirements. Because the method allows validation of partial specifications, it is also an effective approach for maintaining fidelity between a co-evolving specification and an implementation.
We consider the problem of storing a set of states S as a deterministic finite automaton (DFA). We show that inserting or deleting states can be done in expected time linear in the length of the state descriptor. The insertions and deletions preserve the minimality of the DFA. We discuss an application of this work to the reduction of the memory requirements of the model checker Spin.
Spin is an efficient verification system for models of distributed software systems. It has been used to detect design errors in applications ranging from high-level descriptions of distributed algorithms to detailed code for controlling telephone exchanges. This paper gives an overview of the design and structure of the verifier, reviews its theoretical foundation, and gives an overview of significant practical applications.
The verification algorithm of Spin is based on an
explicit enumeration of a subset of the reachable
state-space of a system that is obtained through
the formalization of a correctness requirement
as an omega-automaton.
This omega-automaton restricts the state-space
to precisely the subset that may contain the counter-examples
to the original correctness requirement, if they exist.
This method of verification conforms to the method
for automata-theoretic verification outlined in [VW86].
Spin derives much of its power from an efficient implementation
of the explicit state enumeration method in this context.
The maximum number of reachable states that can
be stored in main memory with this method, either explicitly
or implicitly, then determines the maximum problem size
that can be solved.
We review some methods that have been tried to
reduce the amount of memory used per state,
and describe some new
methods that achieve relatively high compression rates.
Spin can be used to design robust software for
distributed systems in general, and bug-free
communications protocols in particular.
This paper outlines the use of the tool to address
protocol design problems.
As an example we consider the verification of a
published protocol for implementing synchronous
rendezvous operations in a distributed system.
We also briefly review some of the techniques that Spin
employs to address the computational complexity of larger
verification problems.
We show in this paper that the algorithm for solving the model checking problem with a nested depth-first search can interfere with algorithms that support partial order reduction. We introduce a revised version of the algorithm that guarantees compatibility. The change also improves the performance of the nested depth-first search algorithm when partial order reduction is not used.
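For reference, the sketch below shows the classic nested depth-first search in C, i.e., the unrevised algorithm whose interaction with partial order reduction is discussed above, not the revised version introduced in the paper. The State type, hash-set routines, and successor iteration are hypothetical placeholders.

    /* A minimal sketch of the classic nested depth-first search for
     * accepting-cycle detection (not the revised algorithm of the paper). */
    #include <stdbool.h>

    typedef struct State State;

    extern bool   visited(State *s, int bit);   /* bit 0: outer, bit 1: inner */
    extern void   mark(State *s, int bit);
    extern bool   accepting(State *s);
    extern int    nsuccessors(State *s);
    extern State *successor(State *s, int i);
    extern void   report_acceptance_cycle(State *s);

    static State *seed;             /* accepting state the inner search closes */

    static void inner_dfs(State *s) /* second search: find a path back to seed */
    {
        mark(s, 1);
        for (int i = 0; i < nsuccessors(s); i++) {
            State *t = successor(s, i);
            if (t == seed) { report_acceptance_cycle(seed); return; }
            if (!visited(t, 1))
                inner_dfs(t);
        }
    }

    void outer_dfs(State *s)        /* first search: plain reachability */
    {
        mark(s, 0);
        for (int i = 0; i < nsuccessors(s); i++) {
            State *t = successor(s, i);
            if (!visited(t, 0))
                outer_dfs(t);
        }
        if (accepting(s)) {         /* in postorder: start the nested search */
            seed = s;
            inner_dfs(s);
        }
    }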
Industrial software design projects often begin with a requirements capture and analysis phase. In this phase, the main architectural and behavioral requirements for a new system are collected, documented, and validated. So far, however, there are few tools that the requirements engineer can rely on to guide and support this work. We show that a significant portion of the design requirements can be captured in formalized message sequence charts, and we describe a tool set that can be used to create, organize and analyze such charts.
Formal verification methods become an
effective weapon in the arsenal of a designer
only after sufficient insight into a design
problem has been developed for a draft solution
to be formalized. In the initial phases of a
design the designers can therefore perceive
formal methods to be more of a hindrance
than an assistance.
Since formal methods are meant to be problem solving
tools, we would like to find ways to make them both
effective and attractive from the moment that
a design process begins.
The traditional software development cycle relies
mostly on informal methods to capture design errors
in its initial phases, and on more rigorous testing
methods during the later phases.
It is well understood, though, that those bugs that
slip through the early design phases tend to cause
the most damage to a design.
The anomaly of traditional design is therefore that
it excels at catching bugs at the worst possible point
in a design cycle: at the end.
In this paper we consider what it would take to
develop a suite of tools that has the opposite
characteristic: excelling at catching bugs at the
start, rather than the end of the design cycle.
Such early fault detection tools differ
from standard formal verification techniques in
the sense that they must be able to deal
with incomplete, informal design specifications,
and with possibly ill-defined requirements.
They do not aim to replace either testing or formal
verification techniques, but to complement their strengths.
Message sequence charts (MSCs) are used often in the design phase
of a distributed system to record
intended system behaviors. They serve as
informal requirements documents that are
referred to throughout the design process, and that
may still be used as a reference in final system
integration and acceptance testing.
We show that message sequence charts are open to a
variety of semantic interpretations.
The meaning of an MSC, for instance, can depend on whether
one allows or denies the possibility of message loss or
message overtaking,
and it can depend on the particulars of
the message queuing policy that is to be adopted.
We describe an analysis tool that can perform automatic
checks on message sequence charts and can alert the user to
the existence of subtle design errors, for any predefined or
user-specified semantic interpretation of the chart.
The tool can also be used to specify time constraints
on message delays, and can then return useful additional
timing information, such as the minimum and the maximum
possible delays between pairs of events.
Spin is an on-the-fly model checking system
for finite state systems, optimized
for the verification of linear time temporal
logic (LTL) properties.
Spin's input language, Promela, can be used
to specify concurrent systems with dynamically
changing numbers of interacting processes, where
process interactions can be either synchronous
(rendez-vous) or asynchronous (buffered).
In the tutorial we examine some of the algorithms
that determine Spin's functionality and performance.
After a brief summary of the automata theoretic foundation
of Spin, we consider the methodology for
LTL model checking, the recognition of Büchi
acceptance conditions, cycle detection, and the
handling of very large verification problems.
The bitstate hashing, or supertrace, technique was
introduced in 1987 as a method to increase the quality
of verification by reachability analyses for applications
that defeat analysis by traditional means because of their size.
Since then, the technique has been included in many research
verification tools, and was even adopted in some tools that
are marketed commercially.
It is therefore important
that we understand well how and why the method works,
what its limitations are, and how it compares with alternative
schemes that attempt to solve the same problem.
The original motivation for the bitstate hashing technique
was based on empirical evidence of its effectiveness.
In this paper we provide the analytical argument.
We compare the technique with two alternatives that
have been proposed in the recent literature as potential
improvements.
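The core of the bitstate idea is small enough to sketch in a few lines of C. The fragment below is only an illustration of the principle, under the assumption of a single hash function over a byte-encoded state descriptor; the implementations analyzed in the paper differ in many details.

    /* Illustration of bitstate (supertrace) storage: each visited state is
     * recorded as a single bit selected by a hash of its state descriptor. */
    #include <stddef.h>

    #define HASH_BITS 27                    /* 2^27 bits, i.e. 16 MB of memory */
    #define HASH_SIZE (1UL << HASH_BITS)

    static unsigned char bitarray[HASH_SIZE / 8];

    static unsigned long hash(const unsigned char *state, size_t len)
    {
        unsigned long h = 5381;             /* simple djb2-style byte hash */
        while (len-- > 0)
            h = h * 33 + *state++;
        return h & (HASH_SIZE - 1);
    }

    /* Returns 1 if the state was (probably) visited before, 0 if it is new.
     * A hash collision silently truncates the search; that loss of coverage
     * is the trade-off analyzed in the paper. */
    int seen_before(const unsigned char *state, size_t len)
    {
        unsigned long h    = hash(state, len);
        unsigned char mask = (unsigned char)(1u << (h & 7));

        if (bitarray[h >> 3] & mask)
            return 1;
        bitarray[h >> 3] |= mask;
        return 0;
    }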
Critical safety and liveness properties of a concurrent
system can often be proven with the help of a reachability
analysis of a finite state model.
This type of analysis is usually implemented as a depth-first search
of the product state-space of all components in the system,
with each (finite state) component modeling the
behavior of one asynchronously executing process.
Formal verification is achieved by coupling the depth-first
search with a method for identifying those states or sequences
of states that violate the correctness requirements.
It is well known, however, that an exhaustive depth-first search
of this type performs redundant work.
The redundancy is caused by the many possible
interleavings of independent actions in a concurrent system.
Few of these interleavings can alter the
truth or falsity of the correctness properties being studied.
The standard depth-first search algorithm can be modified
to track additional information about the interleavings
that have already been inspected, and use this information
to avoid the exploration of redundant interleavings.
Care must be taken to perform the reductions in such a way that
the capability to prove both safety and liveness properties is
fully preserved. Not all known methods have this property.
Another potential drawback of the existing methods is that
the additional computations required to enforce a reduction
during the search can introduce overhead that diminishes the benefits.
In this paper we discuss a new reduction method that solves
some of these problems.
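As a point of reference for the reduction discussed above, a minimal sketch of the unreduced depth-first reachability analysis is shown below; the state representation, successor generation, state store, and property check are hypothetical placeholders.

    /* Baseline (unreduced) explicit-state depth-first search for safety
     * properties: every reachable state of the product state space is
     * stored once and checked.  A partial order reduction would restrict
     * the set of successors explored at each state. */
    #include <stdbool.h>

    typedef struct State State;

    extern bool   in_statespace(State *s);     /* state already stored?     */
    extern void   store(State *s);
    extern int    nsuccessors(State *s);
    extern State *successor(State *s, int i);
    extern bool   violates_safety(State *s);   /* assertion, deadlock, ...  */
    extern void   report_error_trail(State *s);

    void dfs(State *s)
    {
        store(s);                              /* remember s; avoid revisits */
        if (violates_safety(s)) {
            report_error_trail(s);
            return;
        }
        for (int i = 0; i < nsuccessors(s); i++) {
            State *t = successor(s, i);
            if (!in_statespace(t))
                dfs(t);                        /* explore each state once    */
        }
    }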
The record of successful applications of formal verification
techniques is slowly growing. Our ultimate aim,
however, is not to perform small pilot projects that show
that verification is sometimes feasible in an industrial setting;
our aim must be to integrate verification techniques
into the software design cycle as a non-negotiable part of
quality control.
We take a snapshot of the state of the art
in formal verification, and, for inspiration,
compare it with other points in history where new
technology threatened to replace old technology,
only to discover how resilient an established
practice can be.
To keep a mild sense of perspective, we will also briefly
consider how badly mistaken some of the advocates of
new technology have sometimes been.
It is usually assumed that data-networks are a 20th Century phenomenon. Evidence of efforts to build data communications systems, however, can be found throughout history. Before the electrical telegraph was introduced, many countries in Europe already had fully operational nationwide data-networks in place. We briefly recount the long history of pre-electric communication methods and devices.
We discuss what the ideal characteristics of a formal design method should be, and evaluate how the existing methods measure up. We then look at a recent attempt within AT&T to apply formal methods based on design verification techniques, and evaluate it in the same context.
We present a new algorithm that can be used for solving the model-checking problem for linear-time temporal logic. This algorithm can be viewed as the combination of two existing algorithms plus a new state representation technique introduced in this paper. The new algorithm is simpler than the traditional algorithm of Tarjan to check for maximal strongly connected components in a directed graph which is the classical algorithm used for model-checking. It has the same time complexity as Tarjan's algorithm, but requires less memory. Our algorithm is also compatible with other important complexity management techniques, such as bit-state hashing and state space caching.
State space caching is a state space exploration method that stores all states
of just one execution sequence plus as many previously visited states as
available memory allows. This technique has been known for at least ten years,
but so far it has been of little practical significance.
When the technique is used in combination with a conventional
reachability analysis,
and memory usage is reduced beyond a factor of two or three, an
exponential increase of the run-time overhead sets in. The explosion of the
run time requirements is caused by redundant multiple explorations of unstored
parts of the state space. Almost all states in the state space of
concurrent systems are typically reached several times during a search.
There are two causes for this: firstly, several different partial orderings of
statement executions can lead to the same state; secondly, all interleavings of
the same partial ordering of statement executions always lead to the same state.
In this paper, we describe a method to remove the effects of the
second cause given above. With this method, most reachable states
are visited only once during the state space exploration. This, for the
first time, makes state space caching an efficient verification alternative.
The point where an exponential blow-up of run time occurs is moved
considerably, though it does not disappear.
In some cases, for instance, the memory requirements can be reduced to just 1%
of that of a full search, in return for a doubling or tripling of the run-time
requirements.
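A minimal sketch of the caching strategy described above is given below in C: states on the current search path are always retained, while the remaining memory holds a cache of other visited states from which entries are evicted when space runs out. The state type and cache routines are hypothetical placeholders.

    /* State space caching: the depth-first search stack is always kept in
     * memory; other visited states are cached and may be evicted, at the
     * cost of possibly re-exploring the part of the state space below them. */
    #include <stdbool.h>

    typedef struct State State;

    extern bool   cached(State *s);             /* in cache or on the stack?  */
    extern bool   cache_full(void);
    extern void   cache_insert(State *s);
    extern void   cache_pin(State *s);          /* not evictable while set    */
    extern void   cache_unpin(State *s);
    extern void   cache_evict_unpinned(void);   /* random replacement; assumes
                                                   the cache exceeds the
                                                   deepest search stack       */
    extern int    nsuccessors(State *s);
    extern State *successor(State *s, int i);

    void dfs(State *s)
    {
        if (cache_full())
            cache_evict_unpinned();             /* may cause re-exploration   */
        cache_insert(s);
        cache_pin(s);                           /* s is on the search stack   */

        for (int i = 0; i < nsuccessors(s); i++) {
            State *t = successor(s, i);
            if (!cached(t))                     /* evicted states come back   */
                dfs(t);
        }
        cache_unpin(s);                         /* keep s only if room lasts  */
    }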
We study the effect of three new reduction strategies
for conventional reachability analysis, as used in
automated protocol validation algorithms.
The first two strategies are implementations of
partial order semantics rules that attempt to minimize
the number of execution sequences that need to be explored
for a full state space exploration.
The third strategy is the implementation of a state
compression scheme that attempts to minimize the amount
of memory that is used to build a state space.
The three strategies are shown to have a potential for
substantially improving the performance of a conventional search.
The paper discusses the optimal choices for reducing either
run time or memory requirements by four to six times.
The strategies can readily be combined with each other
and with alternative state space reduction techniques such
as supertrace or state space caching methods.
Formal design and validation methods have achieved most of
their successes on problems of a relatively modest size,
involving no more than one or two designers and no more
than a few hundred lines of code.
The serious application of formal methods to
larger software development projects remains a formidable challenge.
In this paper we report on some initial experience with the
application of a formal validation system to SDL design projects involving
more than ten people, producing tens of thousands of lines of high-level code
over several years.
The problems encountered here are large enough that most
formal methods break down, for both technical and
non-technical reasons.
We are only beginning to discover that the problem
of defining unambiguous and logically consistent
protocol standards is uniquely challenging and
should be considered fundamental.
The problem is to devise a method that would allow
us to draft protocols that provably, instead
of arguably, meet their design criteria.
So far, protocol design has mostly been considered a
mere programming problem, where skill in design
is only related to programming experience.
(And where experience is, of course, a nice word
for the history of mistakes that a programmer
is not likely to make more than once.)
Missing in that view is a set of objective construction
and measuring tools.
As in any engineering discipline, design skill should
be teachable as experience in the usage of design tools,
rather than mere experience in the avoidance of mistakes.
Before we can solve the protocol design problem in this way,
we will need two things:
It can be remarkably hard to design a good communications protocol,
much harder than it is to write a sequential program.
Unfortunately, when the design of a new protocol is complete,
we usually have little trouble convincing ourselves
that it is trivially correct.
It can be unreasonably hard to prove those facts formally, and
to convince others as well.
Faced with that dilemma, a designer usually decides to trust
his or her instincts and forgo the formal proofs.
The subtle logical flaws in a design thus get a chance to hide,
and inevitably find the worst possible moment in the lifetime
of the protocol to reveal themselves.
Though few will admit it, most people
design protocols by trial and error.
There is a known set of trusted protocol standards,
whose descriptions are faithfully copied in most textbooks, but
there is little understanding of why some designs are correct
and why others are not.
To design and to analyze protocols you need tools.
Until recently the right tools were simply not generally available.
But that has changed.
In this tutorial we introduce a state-of-the-art tool
called SPIN and the specification language PROMELA.
We show how the language and the tool can be used to
design reliable protocols.
The tool itself is available by anonymous ftp from netlib.bell-labs.com, or
by email from the author.
A traditional protocol implementation minimally consists of two distinct parts, a sender and a receiver. Each of these two parts normally runs on a distinct machine, where the implementation is provided by the local expert. At best, the two machines are of the same type and the protocol implementations are provided by the same person. More likely, however, the machines are of different types and the implementations of the two halves of the protocol are provided by different people, working from an often loosely defined protocol specification. It seems almost unavoidable that the two implementations are not quite compatible.
In this paper we consider an alternative technique. With this method, one of the two implementors can design, validate, and implement all the relevant protocol parts, including those parts that are to be executed remotely. Each communication channel is now terminated on the receiving side, by a single standard protocol interface, which can be called a Universal Asynchronous Protocol Interface or UAPI.
A UAPI can be configured to accept and execute any type of protocol.
The initial setup-handshake of the protocols that are used on
these lines simply includes the loading of the relevant controllers.
Once this initialization is complete, the UAPI will be
indistinguishable from a hard-coded implementation of
the protocol that was selected.
There is no restriction on the type of protocol
a UAPI can execute.
It is even possible to change part or all of the protocol logic
dynamically in the middle of file transfers, to exploit changing
requirements or constraints.
Though it is likely that the UAPI is most efficiently
implemented in hardware, it can also run as a software module,
and its usefulness can be verified in this way.
This paper introduces the concept of a UAPI and explains how
a software controller can be constructed.
The problem of enabling a `sleeping' process on a shared-memory multiprocessor is a difficult one, especially if the process is to be awakened by an interrupt-time event. We present here the code for sleep and wakeup primitives that we use in our multiprocessor system. The code has been exercised by months of active use and by a verification system.
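For illustration only, the sketch below shows a lost-wakeup-free sleep/wakeup pair built on a POSIX mutex and condition variable. It is not the code presented in the paper, which has to work in a setting where the waker can run at interrupt time and cannot simply block on a lock.

    /* Illustrative user-level sleep/wakeup, not the primitives from the
     * paper: the mutex makes the test of the condition and the decision
     * to sleep atomic, which is what prevents a lost wakeup. */
    #include <pthread.h>
    #include <stdbool.h>

    struct rendez {
        pthread_mutex_t lock;
        pthread_cond_t  cond;
        bool            event;        /* the condition being waited for */
    };

    void rendez_init(struct rendez *r)
    {
        pthread_mutex_init(&r->lock, NULL);
        pthread_cond_init(&r->cond, NULL);
        r->event = false;
    }

    void sleep_on(struct rendez *r)   /* block until the event has occurred */
    {
        pthread_mutex_lock(&r->lock);
        while (!r->event)             /* re-check: wakeups may be spurious */
            pthread_cond_wait(&r->cond, &r->lock);
        r->event = false;             /* consume the event */
        pthread_mutex_unlock(&r->lock);
    }

    void wakeup(struct rendez *r)     /* record the event and wake a sleeper */
    {
        pthread_mutex_lock(&r->lock);
        r->event = true;
        pthread_cond_signal(&r->cond);
        pthread_mutex_unlock(&r->lock);
    }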
This paper describes a method for validating specifications written in the CCITT language SDL. The method has been implemented as part of an experimental validation system. With the experimental system we have been able to perform exhaustive analyses of systems with over 250 million reachable composite system states. The practicality of the tool for the analysis of substantial portions of AT&T's 5ESS(rg) Switch code is now being studied.
This paper studies the four basic types of algorithm that,
over the last ten years, have been developed
for the automated validation of the logical consistency of
data communication protocols.
The algorithms are compared on memory usage,
CPU time requirements, and the quality, or
coverage, of the search for errors.
It is shown that the best algorithm, according to
the above criteria, can be improved further in a significant way
by avoiding a known performance bottleneck.
The algorithm derived in this manner works
in a fixed size memory arena (it will never run out
of memory), it is up to two orders of magnitude faster
than the previous methods, and it has superior coverage
of the state space when analyzing large protocol systems.
The algorithm is the first for which the search
efficiency (the number of states analyzed per second)
does not depend on the size of the state space: there
is no time penalty for analyzing very large state spaces.
An automated analysis of all reachable states in a distributed
system can be used to trace obscure logical errors that would be
very hard to find manually. This type of validation
is traditionally performed by the symbolic execution of a
finite state machine (FSM) model of the system studied.
The application of this method to systems of a practical size,
though, is complicated by time and space requirements. If a system is larger,
more space is needed to store the state descriptions and more
time is needed to compare and analyze these states.
This paper shows that if the FSM model is abandoned and replaced
by a state vector model, significant
gains in performance are feasible, for the first time making it
possible to perform effective validations of large systems.
Pico is an interactive editor for digitized images. Editing operations are defined in a simple expression language based on C. The editor checks the 'transformation' expressions for syntax, translates, optimizes and then executes them as programs, using a built-in on-the-fly compiler.
The editor's command structure resembles that of conventional multi-file text editors with simple options for opening, reading, changing, displaying and rewriting digitized images.
It is not likely that many traveling salesmen can
be discouraged from their job by a lecture on its
complexity. Not surprisingly,
writers of automated protocol analyzers
are much the same. The problem
of determining whether an arbitrary message passing
system contains deadlocks is PSPACE-complete at best
(for bounded queue lengths) [7-9].
Yet for any given formal analysis model it is
easy to derive state space exploration routines
that can find such errors with certainty - given a
sufficient amount of time and space. In practice, therefore,
one of the main problems is to optimize the speed and memory usage
of an automated validator.
To phrase it differently: it is not hard to validate protocols,
it is hard to do it (sufficiently) fast.
In reachability analyses, the limits of what can be analyzed in practice
can be moved substantially if the traditional finite state machine model
is abandoned. To illustrate this, we introduce a simple symbolic execution
method based on vector addition. It is extended into a full protocol validator,
carefully avoiding known performance bottlenecks.
Compared with previous methods, this validator is roughly
two orders of magnitude faster, and it allows the
validation of protocol systems of up to 10^6 states in
only minutes of CPU time on a medium-size computer.
Argos is a validation language for data communication protocols. To validate a protocol, a model in Argos is constructed consisting of a control flow specification and a formal description of the correctness requirements. This model can be compiled into a minimized lower level description that is based on a formal model of communicating finite state machines. An automated protocol validator, trace, uses these minimized descriptions to perform a partial symbolic execution of the protocol to establish its correctness for the given requirements.
Automated protocol validation tools are by necessity often
based on some form of symbolic execution. The complexity of the analysis
problem, however, imposes restrictions on the scope of these tools.
The paper studies the nature of these restrictions and explicitly
addresses the problem of finding errors in data communication
protocols of which the size precludes analysis by traditional means.
The protocol tracing method described here allows one to locate design
errors in protocols relatively quickly by probing a partial
state space. This scatter searching method was implemented in
a portable program called trace.
Specifications for the tracer are written in a higher level language and are
compiled into a minimized finite state machine model
which is then used to perform either partial or
exhaustive symbolic executions. The user of the tracer can control
the scope of each search. The tracer can be used as a fast debugging
tool but also, depending on the complexity of the protocol being analyzed,
as a slower and rather naive correctness prover.
The specifications define the control flow of the protocol
and may formalize correctness criteria in assertion primitives.
A traditional method to validate protocols by state space exploration is to use forward symbolic execution. One of the main problems of this approach is that to find all undesirable system states one has to generate all reachable states and evaluate all desirable system states as well. The paper discusses an alternative search strategy based on backward symbolic execution. This time we start with a state that we know to be undesirable and execute the protocol backwards, evaluating only undesirable states in an effort to show that they are unreachable.
In a joint project with the Dutch PTT, the Delft University of
Technology has undertaken the development and construction of an
interactive protocol design and analysis system.
This system, named Pandora, provides users with
a controlled environment for protocol development.
It provides (1) protocol synthesis guidance,
(2) tools for the formal analysis of protocols,
and (3) both software and hardware tools for protocol assessment.
The system can assist the user in the documentation
of protocol designs by autonomously extracting and plotting
SDL-diagrams from protocol specifications.
It also has a set of tools for the generation of executable
protocol implementations from abstract specifications.
For assessments a special hardware device, a network simulator,
is being built.
This device can be programmed to imitate the behavior of a wide
range of data networks. Unlike the networks modeled, though,
the network simulator's behavior is completely reproducible.
The simulator is protocol independent; that is, it can be programmed
for any protocol format. It can simulate
deletion, distortion and insertion errors
in a message exchange, and it can perform message mappings.
The error distribution functions are programmable.
Work on the Pandora system was begun in March 1982. The paper will
describe features of the system, and will present examples of its use.
In a relatively short time, computer communication has developed into an exceptionally important new field within computer science. This article attempts, in an informal way, to give the reader insight into the fascinating, and sometimes amusing, problems that arise when one tries to make truly unambiguous information exchange between widely separated computer systems possible. One of the main aims of this article is also to point out the danger of premature standardization of communication protocols and design methods. The importance, but also the complexity, of the development of formal protocol analysis methods is discussed.
The article concludes with a proposal for setting up more serious research into communication protocols.
This paper introduces a simple algebra for the validation of
communication protocols in message passing systems.
The behavior of each process participating in a communication
is first modeled in a finite state machine.
The symbol sequences that can be
accepted by these machines are then expressed in
`protocol expressions,' which are defined as regular
expressions extended with two new operators: division and multiplication.
The interactions of the
machines can be analyzed by combining protocol expressions
via multiplication and algebraically manipulating the terms.
The method allows for an arbitrary number of processes
to participate in an interaction. In many cases an
analysis can be performed manually, in other cases the analysis
can be automated. The method has been applied to a number
of realistic protocols with up to seven interacting processes.
An automated analyzer was written in the language C. The
execution time of the automated analysis is in most cases
limited to a few minutes of CPU time on a PDP 11/70
computer.
A model of a multiprocessing system is introduced that allows us
to design, analyze, and implement coordination schemes in a stepwise
manner. A distinction is made between two principal design phases:
(1) the design of a consistent set of coordination rules, and (2)
the design of a minimal and complete signaling scheme.
The correctness of a design can be established independently for
both phases.
The working of the model is based on the existence of a hypothetical
machine called a guard. The restrictions implied by the
idealized properties of this guard machine are removed in later
design phases. Meanwhile, the same restrictions allow for
straightforward correctness analysis, whose validity
can be shown to be preserved by the later refinements.
The model allows one to derive schemes which are largely
implementation independent. It is illustrated how specific
implementations in Concurrent Pascal can be generated
mechanically by a translator program.
In an appendix, a new set of D-semaphore operations is
introduced that allows for a convenient short-hand description
of coordination schemes.
Hype is an X-based previewer for troff documents. It offers support for hyper-text commands that can either be embedded into the troff sources or issued by independent Unix commands, while the previewer is running.
The embedded commands leave tags in the troff output that are interpreted by hype, but are invisible to other previewers and to devices such as line-printers and typesetters.
Issuing the commands at run-time makes it possible to add hyper-text support to any troff document, without modification of the sources. The information required to build hyper-text support of this type can be extracted from the raw troff output files with a few simple post-processing tools that will be discussed.
Protocol validation by exhaustive reachability analysis is a much tried procedure that has a well known set of flaws and strengths. The major strength is that the entire process can be automated for many types of protocol models. The major flaws are the time and space requirements. Efforts to extend the analytical power of the automated tools are usually of only casual interest since they only help to boost the complexity of the algorithms and render them almost useless in practice.
Although solving the performance problem is hard and
requires sophisticated code, programming the basic symbolic
execution routine itself is almost trivial.
As an example, this paper discusses a tiny, 46-line,
AWK program that performs the task.
The micro tracer is powerful enough to
analyze the standard test-cases that are
normally published, including but not restricted to,
the alternating bit protocol and CCITT recommendation X.21.
The code is available online at: microtrace.
(Manual for the first general on-the-fly protocol verification system based on depth-first search. The verifier became operational, and discovered its first bugs in a protocol, on 21 November 1980. The verifier SPIN has its earliest roots in this system.)
Pan is a program that can analyze the consistency of protocol specification for up to ten interacting processes. The validation method is based on a special algebra for an extended type of regular expressions, named protocol expressions. A protocol specification is written in a CSP-like language that includes concatenation, selection, and looping (but no variables). Pan analyzes an average protocol faster than troff prepares an average paper. Unlike troff though, the execution time of pan is more sensitive to the quality than to the size of its victim.