Accidents can be divided into two broad categories: component failure accidents and system accidents. Component failure accidents result from single or multiple component failures. A bald tire, for example, is a mechanical part worn to the point of failure; in bad road conditions it can contribute to an accident. System accidents arise from the interactions of components. None of the components may have failed, but the accident still occurs. System accidents are caused by interactive complexity and tight coupling. As system components grow more tightly coupled, it becomes more difficult to foresee all the interactions among them. This problem is exacerbated by the introduction of computers and software.
Consider the diagram above. A chemical reactor is fed catalyst. The rate at which the catalyst is added to the reaction is controlled by computer. When the reaction rate must be increased, the computer opens the catalyst valve and then opens the cooling water valve. When the rate is to be reduced, the computer closes the water valve and then the catalyst valve. The operator may, however, suspend operation of the computer. If this happens while the catalyst valve is open but the cooling water valve is closed, an accident will occur. None of the valves has failed, and neither has the computer, yet the components worked together to cause an accident. The interaction between the computer's steady-state regulatory functions and the operator's startup and shutdown commands was complex enough to be overlooked by the designer.
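The hazardous interaction can be made concrete with a small sketch. The class and valve names below are illustrative assumptions, not part of the original example; the point is that every component executes its logic correctly, yet suspending the computer mid-sequence leaves the system in an unsafe state.

```python
# Hypothetical sketch of the reactor scenario: names and structure are
# assumptions for illustration, not the actual control software.

class Reactor:
    def __init__(self):
        self.catalyst_open = False
        self.water_open = False

    def increase_rate(self):
        # The computer opens the catalyst valve, then the cooling water valve.
        self.catalyst_open = True
        self.water_open = True

    def decrease_rate(self):
        # The computer closes the water valve, then the catalyst valve.
        self.water_open = False
        self.catalyst_open = False

    def hazardous(self):
        # Hazard: catalyst feeding the reaction with no cooling water.
        return self.catalyst_open and not self.water_open

r = Reactor()
r.increase_rate()          # normal operation: both valves open
assert not r.hazardous()

# Operator suspends the computer mid-sequence: the catalyst valve has
# been opened, but the water-valve command never executes.
r2 = Reactor()
r2.catalyst_open = True    # first step of increase_rate() completed
# ...computer suspended here; the water valve stays closed...
assert r2.hazardous()      # no component failed, yet the state is unsafe
```

No assertion about any individual component would catch this; the hazard exists only in the combined state, which is why the designer can verify each piece and still miss the accident.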
Complexity is a moving target. What seems simple or complex depends a great deal on the representation of the problem. The underlying factor is intellectual manageability. A "simple" system has a small number of unknowns in its interactions within the system and with its environment.
A system is intellectually unmanageable when the level of interactions reaches the point where they cannot be thoroughly planned, understood, anticipated, and guarded against.
Introducing new technology introduces unknowns and even unknown-unknowns.
One means of coping with complexity is analytic reduction. The system is divided into distinct parts for analysis purposes, and the parts are examined separately. Analytic reduction relies on three important assumptions: (1) dividing the system into parts does not distort the phenomenon being studied; (2) each component behaves the same when examined in isolation as it does when playing its part in the whole; and (3) the principles governing the assembly of the components into the whole are themselves straightforward.
Statistics provides another way to cope with complexity. Statistics assumes that the system can be treated as a structureless mass with interchangeable parts (data points). The Law of Large Numbers is used to describe behavior in terms of averages. Use of statistics assumes that components are sufficiently regular and random in their behavior that they can be studied statistically.
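The statistical view can be sketched in a few lines. This is a minimal illustration, with a made-up failure probability, of the regularity assumption: when component behavior is sufficiently regular and random, the Law of Large Numbers lets averages over many interchangeable trials stand in for the system's behavior.

```python
# Minimal sketch of the statistical approach: components treated as
# interchangeable data points whose behavior averages out (Law of
# Large Numbers). The failure probability is an illustrative value.
import random

random.seed(0)
p_fail = 0.01        # assumed per-demand failure probability
trials = 100_000
failures = sum(random.random() < p_fail for _ in range(trials))
observed = failures / trials

# With many regular, independent trials, the observed rate converges
# toward the underlying probability -- the regularity the text assumes.
assert abs(observed - p_fail) < 0.005
```

The assertion holds here precisely because the simulated components are structureless and independent; the next paragraph explains why software violates that assumption.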
Software is too complex for complete analysis. Separating software into non-interacting subsystems distorts the results. The most important properties of software are emergent; these properties cannot be easily deduced from the decomposed parts. Software is also too organized for statistics: it has too much underlying structure, and that structure distorts the statistical results.
Systems theory was developed for biology (Ludwig von Bertalanffy) and cybernetics (Norbert Wiener). Systems theory is used for systems too complex for complete analysis and too organized for statistical analysis. The theory concentrates on analysis and design of the whole as distinct from the parts. This is accomplished with two pairs of ideas: emergence and hierarchy, and communication and control.
Safety is an emergent system property. It is not a component property. Safety can only be analyzed in the context of the whole system. Component properties alone do not consider the results of interactions between components in the system; these interactions have a large impact on system safety.
In the model of accidents used with systems theory, accidents arise from interactions among humans, machines, and the environment. Accidents are not simply a chain of events or linear causality. Accidents arise from more complex causal connections. Safety is an emergent property that arises when components of the system interact with each other within a larger environment. Safety is enforced by a set of constraints related to the behavior of components in the system. When appropriate constraints are lacking, or when component interactions violate safety constraints, accidents occur. Software, as a controller, embodies or enforces those safety constraints.
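The idea of software enforcing safety constraints can be sketched as a controller that vetoes commands whose resulting state would violate a constraint. The constraint ("never feed catalyst without cooling water") and the command representation below are illustrative assumptions carried over from the earlier reactor example.

```python
# Hedged sketch: a controller enforcing a safety constraint on
# component interactions. Constraint and command names are
# illustrative assumptions, not a real control system's API.

def safe_to_apply(state, command):
    """Return False for any command whose resulting state would
    violate the safety constraint."""
    next_state = dict(state)
    next_state.update(command)
    # Safety constraint: catalyst valve open implies water valve open.
    return not (next_state["catalyst_open"] and not next_state["water_open"])

state = {"catalyst_open": True, "water_open": True}

# A shutdown command that would close cooling before catalyst is vetoed:
assert not safe_to_apply(state, {"water_open": False})

# Closing the catalyst valve first is allowed:
assert safe_to_apply(state, {"catalyst_open": False})
```

The design point is that the constraint is checked over the whole system state, not over any single component; each valve and each command is individually well-behaved, and only the constraint captures the unsafe interaction.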
Software-related accidents are usually caused by flawed requirements. Incomplete or wrong assumptions about the operation of the controlled system can cause software-related accidents, as can incomplete or wrong assumptions about the required operation of the computer. Frequently, omitted requirements leave controlled-system states and environmental conditions unhandled. Merely getting the software "correct" or making it reliable will not make the software any safer under these conditions. Software may be highly reliable and "correct" and still be unsafe.
If the software is unsafe in any of the ways outlined above, an accident may result.
Copyright © 2003 - 2016 Safeware Engineering Corporation. All rights reserved