![]() |
|
Home
> Publications >
Safety Risk of Requirements Incompleteness
|
The Safety Risk of Requirements Incompleteness |
by Jeffrey Howard and Patrick Anderson. Presented at ISSC 2002, Denver,
Aug. 2002.
For Power Point presentation, click here. Abstract
Many
incidents and accidents in real-time software systems have been traced
back to incompleteness in the requirements.
This paper introduces a set of criteria that can be used to
uncover potentially hazardous incompleteness in software requirements.
SpecTRM-RL is introduced as a readable and reviewable language
for software requirements specification that facilitates the checking of
these criteria. Introduction
Many
incidents and accidents have been linked to flaws in real-time embedded
system software. Software-related
errors can be broadly classified into two categories: (1) failure of the
implementation to meet its requirements and (2) failure of the
requirements to include necessary properties.
While the first category, implementation errors, should not be
dismissed idly, it is the second category that historically has been of
greater risk. If the
requirements do not correctly specify the behavior needed to make the
system safe, even “correct” software may still contribute to system
hazards. A
software requirements specification describes the black-box behavior of
the software. Accidents
often result from unhandled or unspecified cases.
Perhaps the most important property of a requirements
specification is completeness. A
requirements specification must provide all the information necessary to
distinguish between desired and undesired behavior.
In this context, the word “complete,” is not used for its
mathematical definition. Completeness
here means that the requirements for safe software behavior are clear
enough that any likely interpretation will lead to implementation of
safe software behavior. Some
risks associated with real-time software requirements can be reduced by
methods that address requirements incompleteness.
Application specific checks of incompleteness can be developed
from the system safety analysis. General
criteria are available as well. Leveson
(ref. 1) has compiled a large number of criteria that can be applied to
detect incompleteness in requirements. Particular attention is given to safety-critical real-time
systems. Furthermore,
Leveson et al. (ref. 2) have devised a software requirements
specification language called SpecTRM-RL that facilitates checking and
enforcing completeness criteria. These
criteria are general enough to be used on any real-time system.
System-specific criteria, such as those generated in hazard
analyses, are not included. The criteria have come from basic engineering principles and
from extensive examination of incident and accident reports.
They span all aspects of real-time software behavior, from timing
to user-interface. The
criteria have been validated on spacecraft projects at JPL (ref. 3) and
used on industrial projects at Safeware Engineering Corporation and at
other sites. The criteria
have been integrated into a requirements specification language and
tools can be used to help the modeler and reviewer ensure that the
specifications are acceptably complete for safety-critical systems.
This paper describes how SpecTRM-RL (SpecTRM Requirements
Language) models assist review of requirements for completeness. Real-time System Specification
and SpecTRM-RL
Problems
are made easier or harder to solve by their representation.
A good format for writing specifications can make checking
completeness criteria easier. Real-time
control systems command actuators to affect the behavior of a controlled
process (figure 1). The
controller selects commands based on inputs from sensors, predications
of future values, and memory of past commands.
Control laws are usually formulated on actual values from the
controlled process. The
software controller must use non-ideal measured values.
The controller builds a model of the state of the controlled
process. Accidents and
hazards happen when the controller’s model is not synchronized with
the actual process state. For
example, if a software controller for an aircraft does not know that the
aircraft is on the ground, it may raise the landing gear when commanded
to do so.
Figure 1 - Real-Time System Block Diagram The
hazard in the landing gear example indicates that the controller’s
model of the controlled process is incomplete.
This incompleteness could be fixed by having the aircraft check
for weight on its wheels before raising the landing gear.
Even with this improvement, the same accident could result if the
controller has an incorrect model of the controlled process, believing
that the aircraft is not on the ground when it is.
Many accidents have been caused by systems that suspend and
resume their processing without taking into account that their
controlled process may have changed state. To
apply the completeness criteria, the control loop representation is
translated to a SpecTRM-RL requirements specification.
SpecTRM-RL models software as a state machine. The controller’s outputs to actuators are triggered by
state changes. State
changes are triggered by changing input values and the passage of time.
Building a model of a system is time-consuming and difficult.
Most projects do not have the resources to develop both a
software specification and a separate model for checking completeness
criteria. Furthermore,
separating the requirements from the model makes it difficult to keep up
with requirements changes. SpecTRM-RL
solves this problem by serving as both the software specification and
the model. Although
SpecTRM-RL allows many criteria to be checked with automated tools, we
believe that human review by domain experts will always be a critical
technique in identifying requirements errors.
SpecTRM-RL therefore has a very readable syntax to encourage
human review. The
control loop representation of the system is reflected in the graphical
portion of the SpecTRM-RL model. The
example shown in figure 2 is the movement and positioning software
(MAPS) for an industrial robot that services tiles on the space shuttle.
This graphical view serves much the same purpose as the control
loop, providing a quick summary of information about the model.
The central box represents the software being modeled: MAPS in
this case. To the left of
the box are entities that provide control inputs to the software.
MAPS is controlled by an operator through a control panel with a
joystick and a planner module that automatically plans a path for the
robot. Above the box are
sensory inputs; MAPS receives input from a scanner used to determine the
position of the robot. To
the right of the central box and below are outputs.
Outputs shown to the right of the software block are commands
given to the industrial robot, such as commands to deploy or retract
stabilizer arms. Other
outputs, such as outputs to operator displays or logs are shown below
the center box.
Figure 2 - SpecTRM-RL Graphical View The central box offers an
overview of the MAPS software itself.
The model is of the requirements.
No extraneous design information is included.
The software is viewed as a black box; only information necessary
to describe the relationship between the inputs and the outputs is
depicted in the model. A
SpecTRM-RL requirements model can be thought of a transfer function for
the software. The top left describes the control modes of the system.
Modes divide software behavior into several broad categories.
For example, the behavior of MAPS while enabled is very different
from when the software is disabled.
The other entries in the box are inferred system state.
Inferred system state is the state of the controlled plant as
inferred by the controller software.
In this case, the state is what MAPS infers to be the state of
the industrial robot. If,
by some incompleteness in the requirements, MAPS’ inferred state is
incorrect or insufficient, such as the wheel motors being on when MAPS
infers that they are off, and accident may result. Specification Completeness
Criteria
As
the software specification is developed in SpecTRM-RL, completeness
criteria can be checked and enforced.
In all, about 60 criteria have been developed.
The criteria cover
For
space constraints, a subset is demonstrated here. Most of the criteria are integrated into the SpecTRM-RL
syntax while the majority of the rest can be checked using automated
tools. This is shown by
describing the language components and how each addresses the
completeness criteria. Outputs:
SpecTRM-RL modeling is often begun from the outputs.
Beginning with outputs makes it possible to work backwards from
the commands necessary to control the plant, determining what inferred
system state must be tracked and thus what inputs are necessary. An output specification (figure 3) begins with a number of
attribute-value pairs providing information about the output.
The triggering condition is a table.
Each expression in the first column is matched to values of true
“T”, false “F”, or don’t care “*”.
If every row matches, the table is true.
The rows represent an AND condition.
If there is more than one column of T/F/* entries, the columns
represent an OR. If all the rows of expressions in the first column match the
T/F/* entries in a subsequent column, the table as a whole is true.
When a triggering condition AND/OR table is true, the output is
sent. The output message is
given values according to the message contents section.
The output specification format lends itself to checking a number
of Leveson’s completeness criteria (ref. 1). Several
attributes near the top of the output specification enforce output value
criteria. Output values
must be checked to ensure they are not hazardous.
First, acceptable values for the output are recorded.
For numeric values, units must be specified to avoid problems at
component interfaces. The
granularity of changes in value must be recorded as well.
Granularity can help to avoid problems at component interfaces.
If control over a valve or other mechanical actuator is not
fine-grained enough, an accident or poor system performance may result. Timing
behavior is also addressed by the output specification.
First, initiation delay and completion deadlines are addressed.
The initiation delay is the length of time the software must
wait, after the triggering conditions for the output become true, before
sending the output. If this
information is not recorded, the software may send mistimed commands to
the plant. Completion deadlines are also important; the control software
cannot rely on an output command having been finished until the
completion deadline is passed.
Figure
3 – SpecTRM-RL Output Specification The
next set of attributes relates to the capacity of the environment around
the software for handling output messages.
Outputs must not be sent frequently enough to overwhelm the load
tolerance of the environment. SpecTRM-RL
outputs describe their load, minimum time between outputs, and maximum
time between outputs. Output
load problems have contributed to accidents.
At one point during the Three Mile Island accident, a printer
that reported problems fell two hours behind the state of the system
(ref. 4). In the case of
the command shown in figure 3, the load entry reads that the software is
required not to send output commands while the robot is considered to be
in motion — the physical characteristics of motion should prevent the
output command from exceeding the load acceptable to the motion
controller. To a reviewer
of the specification, this may immediately raise questions as to whether
the software could enter a state where the controller inferred that the
robot was still while it was actually moving.
Another question would be whether the motor controller’s
message load might be exceeded in the period of time between the
issuance of a movement command and the receipt of feedback indicating
that the robot began movement. These
are exactly the kinds of questions that the completeness criteria are
designed to provoke, and SpecTRM-RL facilitates uncovering both these
questions and their answers. Feedback
loops must be included in the software specification. Accidents occur when the system state inferred by the
software controller does not match the actual state of the system.
If no feedback is provided, the controller software cannot
determine whether the system has changed based on the commands it has
sent. This can lead to
dangerous system behavior. For
example, feedback from the “Move Relative” command is provided by
position readings after each move.
Because physical motion is imprecise, if position feedback were
not provided, a steadily degrading consistency between the
controller’s coordinates for the robot and the robot’s actual
position might result in the robot being commanded into a solid object.
Feedback information also addresses latency.
If feedback inputs arrive before the latency period for a command
has elapsed, it is possible that the input is spurious. Another
completeness criterion suggests that all outputs must be reversible.
With rare exception, any command that can be given to a system by
a software controller should be reversible.
The output specification format of SpecTRM-RL includes an
attribute for describing what output reverses the action taken.
The description of “Move Relative” (figure 3) shows that the
“Stop” command can be used to stop motion.
Using the triggering condition tables and working backward into
the model, it is also possible to use these outputs to verify another of
the criteria: if one output is to reverse another, there must be a path
through the state space between them. The
triggering condition AND/OR table is a useful tool itself for applying
completeness criteria. The
criteria assert that reachable hazardous states should be eliminated
whenever possible. When
they cannot be eliminated, their frequency and duration should be
reduced as much as possible. The
triggering condition is a guard condition on the production of the
output. Because of SpecTRM-RL’s readable syntax, these guard
conditions can be reviewed by domain experts to ensure that an output
cannot be produced under circumstances that make the output behavior
hazardous. Modes:
Modes represent a collection of system behaviors.
For example, if the industrial robot is in operator control mode,
it will behave differently than when it is in computer control mode.
Mode behavior, particularly regarding mode transitions and mode
confusion, has contributed to a number of accidents.
SpecTRM-RL specifies modes as shown in figure 4.
Figure
4 – SpecTRM-RL Mode Specification There
are fewer attributes at the beginning of mode specifications than there
are for outputs. The
references attribute is filled in with a list of all the elements of the
model that are used in the definition of the control mode.
While manual maintenance of these lists might grow tedious, they
can be automatically generated by tools.
The model element names listed in the references section can be
turned into hyperlinks by most editors, allowing easy navigation around
the model. By providing an easy navigation path through related model
elements, the references list can aid in checking any of the
completeness criteria that involve multiple model elements.
For example, one criterion states that all information from
sensors should be used in the SpecTRM-RL model.
An unused input suggests some incompleteness in the requirements
for the software. Automated
tools can verify that every input appears in at least one model
element’s references attribute. Modes
also have a list of the elements in which they appear.
This attribute can be used to navigate through the SpecTRM-RL
model looking for system behaviors that are affected by the mode.
One of the completeness criteria is that states and modes should
not inhibit the production of later required outputs.
For example, if the robot is in computer-controlled mode, it is
conceivable that the operator may wish to have the robot take a
different path. Examining
the list of model elements in the appears in attribute, it becomes
apparent that the robot cannot follow operator movement commands unless
the movement control mode can be changed to operator control.
Examining that transition, it becomes evident that any time the
deadman switch is released, the system returns to operator control.
Thus, in this example and for this case, the criterion is
satisfied. Lastly,
the mode specification includes a definition of possible mode values and
the conditions under which the mode changes.
Figure 4 shows four example modes: startup, operator, computer,
and halted. Each of these
modes is associated with an AND/OR table.
These AND/OR tables follow the same rules as those for output
triggering conditions. If
every row in a T/F/* column matches against the expressions in the first
column, that column is true. If
any column in a table is true, the entire table is true.
If an AND/OR table is true, the mode is changed to the mode
attached to that table. These
AND/OR transition tables are suitable for checking a variety of
completeness criteria. One
criterion checks that the software must always start up in a safe
configuration. The use of a
startup mode to handle the initialization of the software provides an
opportunity to ensure that the software’s model of the controlled
plant can be correctly initialized.
Another criterion is that paths to modes that increase the hazard
of the system should be easy to abort, whereas paths that reduce the
hazard of the system should be plentiful and easy to follow.
In figure 4, many conditions are sufficient to bring the
industrial robot under operator control, but there is only one way to
put the robot under computer control. States:
State elements in SpecTRM-RL represent the model that the
software controller has of the system being controlled.
Inputs to the software change the inferred system state, which
may in turn change the mode of the system or trigger outputs.
An example SpecTRM-RL state specification from the industrial
robot example is included in figure 5.
The references and appears in attributes in the state definition
are analogous to those discussed above for modes.
The same associated criteria can be checked for states.
Figure 5 – SpecTRM-RL State SpecificationSpecTRM-RL
state specifications model the state of the process. If no new information is received from the modeled process
for some time, it is unsafe to rely on the outdated information.
For example, when the robot switches from computer control to
operator control (figure 4), commands sent to the motor controller are
velocity based rather than position based.
In order to operate correctly and safely, the robot’s motor
controller must be set to relative position mode or velocity mode
accordingly. If MAPS has
not received an input from the motor controller in some time, it cannot
be sure what mode the motor controller is really in — particularly if
some suspension of input processing may have taken place.
Unknown gives the engineer writing the requirements a way to
reason about situations in which the controller does not have
information. The
AND/OR transition tables are, in other respects, similar to the mode
transition tables discussed above.
A state element transitions to a new state value when the AND/OR
table associated with that value is true.
The tables themselves are easy to read and review.
Two completeness criteria, the robustness and nondeterminism
criteria, are easy to check with automated tools once the specification
is written in SpecTRM-RL. To
meet the robustness criteria, there must be no conditions under which
all the AND/OR tables are false. For
example, in figure 5, suppose that Power Up is false and the “Motor
Controller Setting” input is not obsolete, uninitialized, relative, or
velocity. Under those conditions, no table is true, and the state
machine cannot make a transition of any kind.
In reality, the only values possible for the input are the four
tested in the AND/OR tables, so the state description is robust.
Software-related accidents are often caused by unhandled cases in
requirements specifications. Robustness ensures that all cases are handled by the state
element. The
nondeterminism criterion states that all SpecTRM-RL state elements must
be deterministic in their behavior.
To be deterministic, there must not be any circumstances under
which more than one AND/OR table can be true at the same time.
The state element described in figure 5 is not deterministic.
Suppose that power up is true, meaning that the software has just been
started, and a motor controller setting input comes in with a value of
relative at the same time. Both
the unknown AND/OR table and the relative AND/OR table are true.
In practice, the behavior of the software will be implementation
dependent. It is very
difficult to analyze nondeterministic systems for properties such as
safety. Therefore, it is
strongly desirable to remove nondeterminism from a system.
Automated tools can help enforce both robustness and determinism
in a SpecTRM-RL specification. Inputs:
Inputs are the last major grouping of specification elements in
SpecTRM-RL. An example
input from the industrial robot example is shown in figure 6.
The general form of the input is very similar to the other
SpecTRM-RL model elements. First,
a series of attributes describes the input.
Second, a series of tables defines the behavior of the input.
Figure 6 – SpecTRM-RL Input SpecificationMany
of the completeness criteria deal with inputs.
First, inputs have the same interface attributes that adorn
outputs. Possible values for
the input, their units, and their granularity are all written in the early
part of the input specification. In
the example of the motor controller setting (figure 6), the units and
granularity are irrelevant, as the possible values are enumerated values.
In addition to their possible values, every input can also take on
the special value of obsolete. Obsolete is used to enforce the completeness criteria related
to data age. Timing
behavior follows the possible value block.
The load attribute describes the maximum load that software is
expected to handle. If inputs
message arrive faster than this, some fallback mechanism is necessary.
The minimum and maximum time between inputs is recorded as well.
Several criteria relate to the timing and capacity of inputs.
A minimum and maximum load description is required for every input
that is not tied to another event. Another
criterion states that a minimum arrival rate is necessary for each
communication path in order to distinguish between normal silence and a
component failure of the communication path.
Normal silence must be differentiated from erroneous silence.
The
completeness criteria also require that some response to a violation of
capacity assumptions must be included in the software specification.
These responses are described in the exception handling attributes.
Many approaches are available, from dropping extraneous inputs to
shedding other functionality to free up resources; the context of the
system will guide engineers in making trade-offs.
SpecTRM-RL and the completeness criteria serve to highlight the
need for these decisions and provide assistance in making good choices.
Additional criteria cover specific possible responses.
For example, if function shedding or some other dynamic
reconfiguration is used to cope with excessive input load, some hysteresis
should be included in the conditions that return the system to normal
operations. This extra margin
prevents the software form jittering or thrashing between modes. The
next block of attributes covers the relationship between inputs and
outputs. One of the
completeness criteria states that input capacity must be exceeded by the
output capacity of any output that the input may trigger.
Less formally, if inputs enter the software more quickly than the
outputs they trigger are permitted to leave, a potentially dangerous
backlog will accrue. Tracking
which outputs an input may influence makes checking this criterion easier.
The latency attribute covers feedback from the input perspective.
Any output for which this input may provide feedback information is
listed in the related outputs field.
If an input arrives within the latency period of an output, some
contingency action may be necessary.
Another criterion states that if an input that should only be
received in response to an output, spontaneous receipt of that input must
be treated as an abnormal condition.
The attributes block makes it easy to check this criterion by
linking inputs to relevant outputs as well as the states and modes that
depend on the input. The
next attributes in the input specification relate to obsolescence.
These attributes are very important, as no data is good forever.
Accidents can be avoided if software does not act upon out of date
information as though it were current and correct.
Additionally, obsolescence of inputs can help with a criterion
applicable to startup: the maximum amount of time that the software waits
for initial inputs must be specified.
Because inputs start out obsolete until they are first received, if
an input remains obsolete for too long, it can be concluded that the
communications medium failed before the software was started.
The AND/OR tables also support the obsolescence criteria,
describing the circumstances under which the input takes on the obsolete
value. The
appears in attribute, which is present in every model element, is
particularly important for inputs. Many
of the completeness criteria for inputs depend on the software responding
to inputs outside their expected range, such as inputs that are too early,
too late, missing altogether, or nonsensical values.
In order to accommodate these conditions and meet the completeness
criterion, state, mode, and output specifications will be involved as well
as the inputs themselves. The
appears in attribute provides a navigational aid to verify that these
criteria have been met. Summary
Most
accidents related to software stem from requirements problems,
particularly unhandled cases, and the completeness criteria described here
were developed using both theory and experience from accidents related to
software over the past 20 years. The
criteria have been successfully used to find many critical requirements
errors on industry projects and are specifically designed for
safety-critical systems. Some
users have reported that although they have found many important
requirements errors using the criteria, they are hard to use as a
checklist. In response to
this problem, the criteria have been integrated into a state-of-the-art
requirements specification language (SpecTRM-RL) and tools for automated
enforcement of the criteria have been designed.
Safeware Engineering Corporation is working on a suite of
requirements development and management tools to support the use of
SpecTRM-RL and the checking of the completeness criteria. References
1.
Leveson, Nancy G., Safeware: System Safety and Computers,
Addison-Wesley Publishing Company, Reading Massachusetts, 1995. 2.
Leveson, Nancy G., Mats P. E. Heimdahl, and Jon Damon Reese,
“Designing Specifications for Process Control Systems: Lessons Learned
and Steps to the Future”, Proceedings of the 7th European Engineering
Conference held jointly with the 7th ACM SIGSOFT Symposium on the
Foundations of Software Engineering, Springer-Verlag, London, UK 1999. 3.
Lutz, Robyn R., “Targeting safety-related errors during the
software requirements analysis”, Proceedings SIGSOFT ’93:
Foundations of Software Engineering, ACM Press, New York, 1993. 4.
Kemeny, John G., “Saving American Democracy: The lessons of Three
Mile Island,” Technology Review, June 1980. Author Information
J.
R. Howard, M. Eng., Systems Engineer, Safeware Engineering Corporation,
1520 Eastlake Ave. E Suite 101, Seattle, WA 98102, USA, telephone –
(206) 328-4880, facsimile – (206) 328-4887 – email howard@safeware-eng.com Mr.
Howard has experience in software safety and software engineering,
primarily as a developer of tools to support the software safety process. P.
W. Anderson, M. Eng., Systems Engineer, Safeware Engineering Corporation,
1520 Eastlake Ave. Suite 101, Seattle, WA 98102, USA, telephone – (206)
328-4880, facsimile – (206) 328-4887 – email anderson@safeware-eng.com Mr.
Anderson works in the fields of software safety and software engineering.
Currently, he develops tools to support the efforts of software
safety engineers.
|
Copyright © 2003 Safeware Engineering Corporation. All rights reserved