Home > Publications > Safety Risk of Requirements Incompleteness                                                                  Word Doc.

The Safety Risk of Requirements Incompleteness

by Jeffrey Howard and Patrick Anderson. Presented at ISSC 2002, Denver, Aug. 2002.  

For Power Point presentation, click here

Abstract

Many incidents and accidents in real-time software systems have been traced back to incompleteness in the requirements.  This paper introduces a set of criteria that can be used to uncover potentially hazardous incompleteness in software requirements.  SpecTRM-RL is introduced as a readable and reviewable language for software requirements specification that facilitates the checking of these criteria.

Introduction

Many incidents and accidents have been linked to flaws in real-time embedded system software.  Software-related errors can be broadly classified into two categories: (1) failure of the implementation to meet its requirements and (2) failure of the requirements to include necessary properties.  While the first category, implementation errors, should not be dismissed idly, it is the second category that historically has been of greater risk.  If the requirements do not correctly specify the behavior needed to make the system safe, even “correct” software may still contribute to system hazards.

A software requirements specification describes the black-box behavior of the software.  Accidents often result from unhandled or unspecified cases.  Perhaps the most important property of a requirements specification is completeness.  A requirements specification must provide all the information necessary to distinguish between desired and undesired behavior.  In this context, the word “complete,” is not used for its mathematical definition.  Completeness here means that the requirements for safe software behavior are clear enough that any likely interpretation will lead to implementation of safe software behavior.

Some risks associated with real-time software requirements can be reduced by methods that address requirements incompleteness.  Application specific checks of incompleteness can be developed from the system safety analysis.  General criteria are available as well.  Leveson (ref. 1) has compiled a large number of criteria that can be applied to detect incompleteness in requirements.  Particular attention is given to safety-critical real-time systems.  Furthermore, Leveson et al. (ref. 2) have devised a software requirements specification language called SpecTRM-RL that facilitates checking and enforcing completeness criteria.

These criteria are general enough to be used on any real-time system.  System-specific criteria, such as those generated in hazard analyses, are not included.  The criteria have come from basic engineering principles and from extensive examination of incident and accident reports.  They span all aspects of real-time software behavior, from timing to user-interface.  The criteria have been validated on spacecraft projects at JPL (ref. 3) and used on industrial projects at Safeware Engineering Corporation and at other sites.  The criteria have been integrated into a requirements specification language and tools can be used to help the modeler and reviewer ensure that the specifications are acceptably complete for safety-critical systems.  This paper describes how SpecTRM-RL (SpecTRM Requirements Language) models assist review of requirements for completeness.

Real-time System Specification and SpecTRM-RL

Problems are made easier or harder to solve by their representation.  A good format for writing specifications can make checking completeness criteria easier.  Real-time control systems command actuators to affect the behavior of a controlled process (figure 1).  The controller selects commands based on inputs from sensors, predications of future values, and memory of past commands.  Control laws are usually formulated on actual values from the controlled process.  The software controller must use non-ideal measured values.  The controller builds a model of the state of the controlled process.  Accidents and hazards happen when the controller’s model is not synchronized with the actual process state.  For example, if a software controller for an aircraft does not know that the aircraft is on the ground, it may raise the landing gear when commanded to do so.

 

 

Figure 1 - Real-Time System Block Diagram

 

The hazard in the landing gear example indicates that the controller’s model of the controlled process is incomplete.  This incompleteness could be fixed by having the aircraft check for weight on its wheels before raising the landing gear.  Even with this improvement, the same accident could result if the controller has an incorrect model of the controlled process, believing that the aircraft is not on the ground when it is.  Many accidents have been caused by systems that suspend and resume their processing without taking into account that their controlled process may have changed state.

To apply the completeness criteria, the control loop representation is translated to a SpecTRM-RL requirements specification.  SpecTRM-RL models software as a state machine.  The controller’s outputs to actuators are triggered by state changes.  State changes are triggered by changing input values and the passage of time.  Building a model of a system is time-consuming and difficult.  Most projects do not have the resources to develop both a software specification and a separate model for checking completeness criteria.  Furthermore, separating the requirements from the model makes it difficult to keep up with requirements changes.  SpecTRM-RL solves this problem by serving as both the software specification and the model.  Although SpecTRM-RL allows many criteria to be checked with automated tools, we believe that human review by domain experts will always be a critical technique in identifying requirements errors.  SpecTRM-RL therefore has a very readable syntax to encourage human review.

The control loop representation of the system is reflected in the graphical portion of the SpecTRM-RL model.  The example shown in figure 2 is the movement and positioning software (MAPS) for an industrial robot that services tiles on the space shuttle.  This graphical view serves much the same purpose as the control loop, providing a quick summary of information about the model.  The central box represents the software being modeled: MAPS in this case.  To the left of the box are entities that provide control inputs to the software.  MAPS is controlled by an operator through a control panel with a joystick and a planner module that automatically plans a path for the robot.  Above the box are sensory inputs; MAPS receives input from a scanner used to determine the position of the robot.  To the right of the central box and below are outputs.  Outputs shown to the right of the software block are commands given to the industrial robot, such as commands to deploy or retract stabilizer arms.  Other outputs, such as outputs to operator displays or logs are shown below the center box.

 

 

Figure 2 - SpecTRM-RL Graphical View

 

The central box offers an overview of the MAPS software itself.  The model is of the requirements.  No extraneous design information is included.  The software is viewed as a black box; only information necessary to describe the relationship between the inputs and the outputs is depicted in the model.  A SpecTRM-RL requirements model can be thought of a transfer function for the software.  The top left describes the control modes of the system.  Modes divide software behavior into several broad categories.  For example, the behavior of MAPS while enabled is very different from when the software is disabled.  The other entries in the box are inferred system state.  Inferred system state is the state of the controlled plant as inferred by the controller software.  In this case, the state is what MAPS infers to be the state of the industrial robot.  If, by some incompleteness in the requirements, MAPS’ inferred state is incorrect or insufficient, such as the wheel motors being on when MAPS infers that they are off, and accident may result.

Specification Completeness Criteria

As the software specification is developed in SpecTRM-RL, completeness criteria can be checked and enforced.  In all, about 60 criteria have been developed.  The criteria cover

bulletStartup and shutdown
bulletMode transitions
bulletInputs and outputs
bulletValue and timing
bulletEnvironment capacity
bulletFailure states and transitions
bulletHuman-computer interface
bulletRobustness
bulletData age
bulletLatency
bulletFeedback
bulletReversibility
bulletPath robustness

For space constraints, a subset is demonstrated here.  Most of the criteria are integrated into the SpecTRM-RL syntax while the majority of the rest can be checked using automated tools.  This is shown by describing the language components and how each addresses the completeness criteria.

Outputs:  SpecTRM-RL modeling is often begun from the outputs.  Beginning with outputs makes it possible to work backwards from the commands necessary to control the plant, determining what inferred system state must be tracked and thus what inputs are necessary.  An output specification (figure 3) begins with a number of attribute-value pairs providing information about the output.  The triggering condition is a table.  Each expression in the first column is matched to values of true “T”, false “F”, or don’t care “*”.  If every row matches, the table is true.  The rows represent an AND condition.  If there is more than one column of T/F/* entries, the columns represent an OR.  If all the rows of expressions in the first column match the T/F/* entries in a subsequent column, the table as a whole is true.  When a triggering condition AND/OR table is true, the output is sent.  The output message is given values according to the message contents section.  The output specification format lends itself to checking a number of Leveson’s completeness criteria (ref. 1).

Several attributes near the top of the output specification enforce output value criteria.  Output values must be checked to ensure they are not hazardous.  First, acceptable values for the output are recorded.  For numeric values, units must be specified to avoid problems at component interfaces.  The granularity of changes in value must be recorded as well.  Granularity can help to avoid problems at component interfaces.  If control over a valve or other mechanical actuator is not fine-grained enough, an accident or poor system performance may result.

Timing behavior is also addressed by the output specification.  First, initiation delay and completion deadlines are addressed.  The initiation delay is the length of time the software must wait, after the triggering conditions for the output become true, before sending the output.  If this information is not recorded, the software may send mistimed commands to the plant.  Completion deadlines are also important; the control software cannot rely on an output command having been finished until the completion deadline is passed.

 

 



Figure 3 – SpecTRM-RL Output Specification

 

 

The next set of attributes relates to the capacity of the environment around the software for handling output messages.  Outputs must not be sent frequently enough to overwhelm the load tolerance of the environment.  SpecTRM-RL outputs describe their load, minimum time between outputs, and maximum time between outputs.  Output load problems have contributed to accidents.  At one point during the Three Mile Island accident, a printer that reported problems fell two hours behind the state of the system (ref. 4).  In the case of the command shown in figure 3, the load entry reads that the software is required not to send output commands while the robot is considered to be in motion — the physical characteristics of motion should prevent the output command from exceeding the load acceptable to the motion controller.  To a reviewer of the specification, this may immediately raise questions as to whether the software could enter a state where the controller inferred that the robot was still while it was actually moving.  Another question would be whether the motor controller’s message load might be exceeded in the period of time between the issuance of a movement command and the receipt of feedback indicating that the robot began movement.  These are exactly the kinds of questions that the completeness criteria are designed to provoke, and SpecTRM-RL facilitates uncovering both these questions and their answers.

Feedback loops must be included in the software specification.  Accidents occur when the system state inferred by the software controller does not match the actual state of the system.  If no feedback is provided, the controller software cannot determine whether the system has changed based on the commands it has sent.  This can lead to dangerous system behavior.  For example, feedback from the “Move Relative” command is provided by position readings after each move.  Because physical motion is imprecise, if position feedback were not provided, a steadily degrading consistency between the controller’s coordinates for the robot and the robot’s actual position might result in the robot being commanded into a solid object.  Feedback information also addresses latency.  If feedback inputs arrive before the latency period for a command has elapsed, it is possible that the input is spurious.

Another completeness criterion suggests that all outputs must be reversible.  With rare exception, any command that can be given to a system by a software controller should be reversible.  The output specification format of SpecTRM-RL includes an attribute for describing what output reverses the action taken.  The description of “Move Relative” (figure 3) shows that the “Stop” command can be used to stop motion.  Using the triggering condition tables and working backward into the model, it is also possible to use these outputs to verify another of the criteria: if one output is to reverse another, there must be a path through the state space between them.

The triggering condition AND/OR table is a useful tool itself for applying completeness criteria.  The criteria assert that reachable hazardous states should be eliminated whenever possible.  When they cannot be eliminated, their frequency and duration should be reduced as much as possible.  The triggering condition is a guard condition on the production of the output.  Because of SpecTRM-RL’s readable syntax, these guard conditions can be reviewed by domain experts to ensure that an output cannot be produced under circumstances that make the output behavior hazardous.

Modes:  Modes represent a collection of system behaviors.   For example, if the industrial robot is in operator control mode, it will behave differently than when it is in computer control mode.  Mode behavior, particularly regarding mode transitions and mode confusion, has contributed to a number of accidents.  SpecTRM-RL specifies modes as shown in figure 4.

 

 


 

 

Figure 4 – SpecTRM-RL Mode Specification

There are fewer attributes at the beginning of mode specifications than there are for outputs.  The references attribute is filled in with a list of all the elements of the model that are used in the definition of the control mode.  While manual maintenance of these lists might grow tedious, they can be automatically generated by tools.  The model element names listed in the references section can be turned into hyperlinks by most editors, allowing easy navigation around the model.  By providing an easy navigation path through related model elements, the references list can aid in checking any of the completeness criteria that involve multiple model elements.  For example, one criterion states that all information from sensors should be used in the SpecTRM-RL model.  An unused input suggests some incompleteness in the requirements for the software.  Automated tools can verify that every input appears in at least one model element’s references attribute.

Modes also have a list of the elements in which they appear.  This attribute can be used to navigate through the SpecTRM-RL model looking for system behaviors that are affected by the mode.  One of the completeness criteria is that states and modes should not inhibit the production of later required outputs.  For example, if the robot is in computer-controlled mode, it is conceivable that the operator may wish to have the robot take a different path.  Examining the list of model elements in the appears in attribute, it becomes apparent that the robot cannot follow operator movement commands unless the movement control mode can be changed to operator control.  Examining that transition, it becomes evident that any time the deadman switch is released, the system returns to operator control.  Thus, in this example and for this case, the criterion is satisfied.

Lastly, the mode specification includes a definition of possible mode values and the conditions under which the mode changes.  Figure 4 shows four example modes: startup, operator, computer, and halted.  Each of these modes is associated with an AND/OR table.  These AND/OR tables follow the same rules as those for output triggering conditions.  If every row in a T/F/* column matches against the expressions in the first column, that column is true.  If any column in a table is true, the entire table is true.  If an AND/OR table is true, the mode is changed to the mode attached to that table.

These AND/OR transition tables are suitable for checking a variety of completeness criteria.  One criterion checks that the software must always start up in a safe configuration.  The use of a startup mode to handle the initialization of the software provides an opportunity to ensure that the software’s model of the controlled plant can be correctly initialized.  Another criterion is that paths to modes that increase the hazard of the system should be easy to abort, whereas paths that reduce the hazard of the system should be plentiful and easy to follow.  In figure 4, many conditions are sufficient to bring the industrial robot under operator control, but there is only one way to put the robot under computer control.

States:  State elements in SpecTRM-RL represent the model that the software controller has of the system being controlled.  Inputs to the software change the inferred system state, which may in turn change the mode of the system or trigger outputs.  An example SpecTRM-RL state specification from the industrial robot example is included in figure 5.  The references and appears in attributes in the state definition are analogous to those discussed above for modes.  The same associated criteria can be checked for states.

 


Figure 5 – SpecTRM-RL State Specification

SpecTRM-RL state specifications model the state of the process.  If no new information is received from the modeled process for some time, it is unsafe to rely on the outdated information.  For example, when the robot switches from computer control to operator control (figure 4), commands sent to the motor controller are velocity based rather than position based.  In order to operate correctly and safely, the robot’s motor controller must be set to relative position mode or velocity mode accordingly.  If MAPS has not received an input from the motor controller in some time, it cannot be sure what mode the motor controller is really in — particularly if some suspension of input processing may have taken place.  Unknown gives the engineer writing the requirements a way to reason about situations in which the controller does not have information. 

The AND/OR transition tables are, in other respects, similar to the mode transition tables discussed above.  A state element transitions to a new state value when the AND/OR table associated with that value is true.  The tables themselves are easy to read and review.  Two completeness criteria, the robustness and nondeterminism criteria, are easy to check with automated tools once the specification is written in SpecTRM-RL.  To meet the robustness criteria, there must be no conditions under which all the AND/OR tables are false.  For example, in figure 5, suppose that Power Up is false and the “Motor Controller Setting” input is not obsolete, uninitialized, relative, or velocity.  Under those conditions, no table is true, and the state machine cannot make a transition of any kind.  In reality, the only values possible for the input are the four tested in the AND/OR tables, so the state description is robust.  Software-related accidents are often caused by unhandled cases in requirements specifications.  Robustness ensures that all cases are handled by the state element.

The nondeterminism criterion states that all SpecTRM-RL state elements must be deterministic in their behavior.  To be deterministic, there must not be any circumstances under which more than one AND/OR table can be true at the same time.  The state element described in figure 5 is not deterministic. Suppose that power up is true, meaning that the software has just been started, and a motor controller setting input comes in with a value of relative at the same time.  Both the unknown AND/OR table and the relative AND/OR table are true.  In practice, the behavior of the software will be implementation dependent.  It is very difficult to analyze nondeterministic systems for properties such as safety.  Therefore, it is strongly desirable to remove nondeterminism from a system.  Automated tools can help enforce both robustness and determinism in a SpecTRM-RL specification.

Inputs:  Inputs are the last major grouping of specification elements in SpecTRM-RL.  An example input from the industrial robot example is shown in figure 6.  The general form of the input is very similar to the other SpecTRM-RL model elements.  First, a series of attributes describes the input.  Second, a series of tables defines the behavior of the input.


Figure 6 – SpecTRM-RL Input Specification

Many of the completeness criteria deal with inputs.  First, inputs have the same interface attributes that adorn outputs.  Possible values for the input, their units, and their granularity are all written in the early part of the input specification.  In the example of the motor controller setting (figure 6), the units and granularity are irrelevant, as the possible values are enumerated values.  In addition to their possible values, every input can also take on the special value of obsolete.  Obsolete is used to enforce the completeness criteria related to data age.

Timing behavior follows the possible value block.  The load attribute describes the maximum load that software is expected to handle.  If inputs message arrive faster than this, some fallback mechanism is necessary.  The minimum and maximum time between inputs is recorded as well.  Several criteria relate to the timing and capacity of inputs.  A minimum and maximum load description is required for every input that is not tied to another event.  Another criterion states that a minimum arrival rate is necessary for each communication path in order to distinguish between normal silence and a component failure of the communication path.  Normal silence must be differentiated from erroneous silence. 

The completeness criteria also require that some response to a violation of capacity assumptions must be included in the software specification.  These responses are described in the exception handling attributes.  Many approaches are available, from dropping extraneous inputs to shedding other functionality to free up resources; the context of the system will guide engineers in making trade-offs.  SpecTRM-RL and the completeness criteria serve to highlight the need for these decisions and provide assistance in making good choices.  Additional criteria cover specific possible responses.  For example, if function shedding or some other dynamic reconfiguration is used to cope with excessive input load, some hysteresis should be included in the conditions that return the system to normal operations.  This extra margin prevents the software form jittering or thrashing between modes.

The next block of attributes covers the relationship between inputs and outputs.  One of the completeness criteria states that input capacity must be exceeded by the output capacity of any output that the input may trigger.  Less formally, if inputs enter the software more quickly than the outputs they trigger are permitted to leave, a potentially dangerous backlog will accrue.  Tracking which outputs an input may influence makes checking this criterion easier.  The latency attribute covers feedback from the input perspective.  Any output for which this input may provide feedback information is listed in the related outputs field.  If an input arrives within the latency period of an output, some contingency action may be necessary.  Another criterion states that if an input that should only be received in response to an output, spontaneous receipt of that input must be treated as an abnormal condition.  The attributes block makes it easy to check this criterion by linking inputs to relevant outputs as well as the states and modes that depend on the input.

The next attributes in the input specification relate to obsolescence.  These attributes are very important, as no data is good forever.  Accidents can be avoided if software does not act upon out of date information as though it were current and correct.  Additionally, obsolescence of inputs can help with a criterion applicable to startup: the maximum amount of time that the software waits for initial inputs must be specified.  Because inputs start out obsolete until they are first received, if an input remains obsolete for too long, it can be concluded that the communications medium failed before the software was started.  The AND/OR tables also support the obsolescence criteria, describing the circumstances under which the input takes on the obsolete value.

The appears in attribute, which is present in every model element, is particularly important for inputs.  Many of the completeness criteria for inputs depend on the software responding to inputs outside their expected range, such as inputs that are too early, too late, missing altogether, or nonsensical values.  In order to accommodate these conditions and meet the completeness criterion, state, mode, and output specifications will be involved as well as the inputs themselves.  The appears in attribute provides a navigational aid to verify that these criteria have been met.

Summary

Most accidents related to software stem from requirements problems, particularly unhandled cases, and the completeness criteria described here were developed using both theory and experience from accidents related to software over the past 20 years.  The criteria have been successfully used to find many critical requirements errors on industry projects and are specifically designed for safety-critical systems.

Some users have reported that although they have found many important requirements errors using the criteria, they are hard to use as a checklist.  In response to this problem, the criteria have been integrated into a state-of-the-art requirements specification language (SpecTRM-RL) and tools for automated enforcement of the criteria have been designed.  Safeware Engineering Corporation is working on a suite of requirements development and management tools to support the use of SpecTRM-RL and the checking of the completeness criteria.

References

1.  Leveson, Nancy G., Safeware: System Safety and Computers, Addison-Wesley Publishing Company, Reading Massachusetts, 1995.

2.  Leveson, Nancy G., Mats P. E. Heimdahl, and Jon Damon Reese, “Designing Specifications for Process Control Systems: Lessons Learned and Steps to the Future”, Proceedings of the 7th European Engineering Conference held jointly with the 7th ACM SIGSOFT Symposium on the Foundations of Software Engineering, Springer-Verlag, London, UK 1999.

3.  Lutz, Robyn R., “Targeting safety-related errors during the software requirements analysis”, Proceedings SIGSOFT ’93: Foundations of Software Engineering, ACM Press, New York, 1993.

4.  Kemeny, John G., “Saving American Democracy: The lessons of Three Mile Island,” Technology Review, June 1980.

Author Information

J. R. Howard, M. Eng., Systems Engineer, Safeware Engineering Corporation, 1520 Eastlake Ave. E Suite 101, Seattle, WA 98102, USA, telephone – (206) 328-4880, facsimile – (206) 328-4887 – email howard@safeware-eng.com

Mr. Howard has experience in software safety and software engineering, primarily as a developer of tools to support the software safety process.

P. W. Anderson, M. Eng., Systems Engineer, Safeware Engineering Corporation, 1520 Eastlake Ave. Suite 101, Seattle, WA 98102, USA, telephone – (206) 328-4880, facsimile – (206) 328-4887 – email anderson@safeware-eng.com

Mr. Anderson works in the fields of software safety and software engineering.  Currently, he develops tools to support the efforts of software safety engineers.

 

Home Products Services Publications White Papers About Us

Copyright © 2003 Safeware Engineering Corporation. All rights reserved