The Risk of Computers
Software is the design of a machine abstracted from its physical realization. A general purpose computer without software performs no function. It is a physical realization, but not of any particular computation. Software provides the design of a special purpose machine. When that software runs on a general purpose computer, then the general purpose computer becomes a special purpose machine for the duration of the program. By changing the software running on a computer, the computer becomes a new special purpose machine. The ease of changing software, and thus the purpose of the machine, is the greatest benefit and the source of risk in software systems.
This abstraction step is very powerful. Machines that were physically impossible or impractical to build are now feasible. Furthermore, the design of the special purpose machine can be changed without retooling or manufacturing. Within very broad limits, changing code changes the function computed by the machine without altering the hardware that runs the code. Lastly, software also allows its creators to concentrate on the steps to be performed without worrying about how those steps will be carried out physically.
The Blessing and Curse of Flexibility
Computers, and thus software, are so powerful and so useful because they have eliminated many of the physical constraints of previous machines. Designers and programmers do not have to worry about the physical realizations of their designs. This power and flexibility are also the curse of software. There are no longer physical laws to limit the complexity of design. Performance constraints of available hardware are the most significant boundaries left, and they do little to control complexity.
Physical laws and limitations are helpful in that they constrain the range of possible solutions to a problem. Fundamental limitations such as electrical properties, structural strengths, melting points, reaction rates, and so on help to keep solutions simple. The more complex the design, the more chance that it will involve the violation of some physical law and have to be discarded. Without physical laws to limit the complexity of software, there has been no incentive to enforce discipline on design, construction, and modification.
In fact, software is so flexible that projects often start working with it before they fully understand what they need to do. Brooks proposed that one plan to throw the first implementation of a system away, because it will have to be thrown away anyway. A whole host of software development methodologies, such as rapid prototyping, advocate writing software in order to explore what functions the software needs to have.
The flexibility of software is a curse not just at the beginning of a project, but at the end as well. Software becomes the resting place of afterthoughts. In many systems, problems in the physical realization of the system are solved by having software compensate. In some cases, errors in the hardware may be worked around in software. In other cases, functions that prove too expensive to implement in hardware are moved to software. These changes led to the observation: "And they looked upon the software and saw that it was good, but they had to add one other feature..."
Modeling, Proofs, and Testing
A number of myths have sprung up around software, chief among them that mathematical modeling, formal proof, or testing can guarantee its correctness.
Mathematical modeling is a problematic approach to dealing with the complexity of software. Any realistic system has an enormous number of potential states, and those states lack physical continuity, which requires discrete rather than continuous mathematics. Specifications and proofs using logic may be attempted, but they are often as large and intricate as the programs they describe. Unfortunately, this means that the logic and proofs are as difficult and error-prone as the code itself. Another approach has been to try to measure the quality of software after its implementation. Unfortunately, good ways to measure the quality of software have not yet been developed.
Black box testing derives test data solely from the specification. No knowledge of the internal structure of the program is used to develop the tests. To guarantee the safety of a program, black box testing would need to test every possible input. Because the system is a black box, the only way to be sure is to try every possible input condition. Valid inputs must be tested up to the maximum size the machine accepts, which is not, in and of itself, astronomical. But all invalid inputs must be tested as well. (For example, an Ada compiler must behave correctly when run on all valid and all invalid programs.) And if the program has memory, then all possible unique valid and invalid sequences of inputs need to be tested. For all but the simplest programs, exhaustive black box testing is impractical.
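The combinatorial growth described above can be sketched numerically. The figures below are hypothetical, chosen only to show the scale for a small stateful program:

```python
def input_sequences(k: int, n: int) -> int:
    """Number of distinct input sequences of length n, where each
    input can take k values (valid or invalid alike)."""
    return k ** n

# A toy device that accepts 100 possible input values and whose
# behavior depends on the last 10 inputs it has seen:
print(input_sequences(100, 10))  # 10^20 sequences -- far beyond testing
```

Even these modest, made-up numbers already put exhaustive black box testing out of reach.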
White box testing derives test data by examining the program's logic. To guarantee safety, the paths of the program would have to be tested exhaustively. There are two flaws with this approach. Consider the control flow diagram below.
Suppose that the loop in this control flow diagram is executed twenty times. There are five possible paths for each iteration. The number of unique paths through the whole loop is astronomical:
5^20 + 5^19 + 5^18 + ... + 5 ≈ 10^14 = 100 trillion
Just for this loop, if one could develop, execute, and verify one test case every five minutes, the testing process would take about 1 billion years. With a magic test processor that could develop, execute, and evaluate one test per millisecond, the process would still take 3170 years.
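The counting argument above (5 paths per iteration, loop executed up to 20 times) can be checked directly:

```python
# Total unique paths: the loop may run 1 to 20 iterations,
# with 5 possible paths on each iteration.
paths = sum(5 ** i for i in range(1, 21))
print(f"{paths:.2e}")  # 1.19e+14 -- roughly the 10^14 quoted above

SECONDS_PER_YEAR = 365 * 24 * 3600
print(paths * 300 / SECONDS_PER_YEAR)    # 5-minute tests: ~1.1 billion years
print(paths * 0.001 / SECONDS_PER_YEAR)  # 1 ms tests: ~3,800 years
# (The 3,170-year figure in the text comes from rounding the sum to 10^14.)
```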
Even if it were possible to test every path, the program might still have errors. White box testing does not guarantee that the program matches its specification; the program could be free of errors but still be the wrong program. Because white box testing detects defects in existing paths in the program, it will not detect the absence of necessary paths in the code.
White box testing may also miss data-dependent errors. For example, a program that compares two numbers for convergence using the statement "if (A - B) < epsilon" is wrong; it should compare "abs(A - B)" to epsilon. However, detection of this error depends on the values used for A and B, and the error would not necessarily be found by executing every path through the program.
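A minimal illustration of that bug, using a hypothetical convergence helper and tolerance:

```python
EPSILON = 1e-6  # hypothetical convergence tolerance

def converged_buggy(a: float, b: float) -> bool:
    # Wrong: any negative difference passes, however far apart a and b are.
    return (a - b) < EPSILON

def converged_fixed(a: float, b: float) -> bool:
    # Correct: compare the magnitude of the difference to the tolerance.
    return abs(a - b) < EPSILON

# Both functions agree whenever a >= b, so test data with a > b
# exercises every path yet never exposes the defect:
print(converged_buggy(1.0, 5.0))   # True  -- falsely reports convergence
print(converged_fixed(1.0, 5.0))   # False
```

Path coverage is satisfied either way; only the choice of data reveals the error.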
The Problem to be Solved
Mechanical systems are at one end of an evolutionary scale of process control systems. Operators have direct sensory perception of the process. Displays are connected directly to the process and thus are physical extensions of it. Design decisions are highly constrained by available space, the physics of the underlying process, and the limited possibility of action at a distance.
Computer-based systems are at the opposite end of the evolutionary scale from mechanical systems. Software allows multiplexing of controls and displays. The constraints on the design of the system are greatly relaxed, which introduces more possibility for error. The physical constraints of a mechanical system often shaped the environment in ways that efficiently transported process information to operators. Less formally, the operator of a mechanical system can directly observe the feedback from the process and get a feel for it. In computerized systems, it is challenging to capture and present these qualities to operators.
The primary safety problem in computer-based systems is the lack of appropriate constraints on design. The job of the system safety engineer is to identify the design constraints necessary to maintain safety and to ensure the system and software design enforces them.
Copyright © 2003 - 2016 Safeware Engineering Corporation. All rights reserved