System evaluation using fault and error injection

Computer Engineering Research Center
The University of Texas at Austin

A major problem in the development of fault-tolerant systems is the accurate determination of the dependability properties of the system. Unlike performance, which can be evaluated with benchmark programs, the degree of fault tolerance and reliability of a system cannot be evaluated in such a manner, since we rarely have the luxury of letting systems run for many years to observe how they behave when faults occur. The generally accepted solution to this problem is to inject the effects of faults into a simulation model or a prototype implementation, and to observe the behavior of the system under the injected faults. Fault injection in a simulation is very flexible but far too time consuming. Injecting accurate (i.e., realistic) faults into a prototype, on the other hand, is much more difficult, but the effects of faults on operational code can be readily observed.

We have developed powerful techniques for emulating hardware faults using software. These were incorporated into a flexible fault and error injection system, FERRARI. We modify the executable image of a program so that, when the modified code is executed, the system behaves as if the internal fault were present. This approach provides the desired flexibility and, at the same time, allows us to execute many experimental runs in a relatively short time. FERRARI is based on the freely available GNU tools, and the first implementation was on SPARCstations.
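To make the idea of corrupting an executable image concrete, the following is a minimal sketch in Python. It is not FERRARI's implementation: the function names `flip_bit` and `stuck_at`, the byte-array representation of the image, and the example bytes are all assumptions chosen for illustration. A transient error is modeled as a single bit flip that is later undone, and a permanent fault as a bit forced to a fixed value.

```python
def flip_bit(image: bytearray, offset: int, bit: int) -> int:
    """Emulate a transient error by inverting one bit of the image.

    Returns the original byte so the caller can restore it afterwards,
    which is what makes the error transient in this toy model.
    """
    original = image[offset]
    image[offset] = original ^ (1 << bit)
    return original


def stuck_at(image: bytearray, offset: int, bit: int, value: int) -> None:
    """Emulate a permanent stuck-at fault by forcing one bit to 0 or 1."""
    mask = 1 << bit
    if value:
        image[offset] |= mask
    else:
        image[offset] &= ~mask & 0xFF


# Hypothetical 4-byte instruction word; the bytes are illustrative only.
image = bytearray([0x90, 0x10, 0x20, 0x01])

saved = flip_bit(image, offset=3, bit=1)   # inject the transient error
# ... the corrupted code would be executed and observed here ...
image[3] = saved                           # remove the error again

stuck_at(image, offset=0, bit=7, value=0)  # permanent stuck-at-0 on one bit
```

A real injector would also have to decide when to inject and how to regain control afterwards; this sketch only shows the corruption step itself.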

Many commercial applications, however, use proprietary compilers and debuggers, for which the source code is not available. We recently developed a new fault injection system, FIESTA, geared to embedded COTS systems. This injection system is implemented under the VxWorks real-time operating system, and the current target system is an embedded Motorola 68040 controller. Faults can be injected into applications compiled with the GNU tools as well as with the Green Hills Ada system.


G. A. Kanawati, N. A. Kanawati and J. A. Abraham, "EMAX: A High Level Error Model Automatic EXtractor," Proceedings AIAA-93, San Diego, CA, October 19-21, 1993, pp. 1297-1305.

Abstract

This paper presents EMAX, a High-Level Error Model Automatic EXtractor. EMAX simulates all user-selected, low-level faults (at the gate and/or the switch level) that may occur inside a processor chip, and generates the error output patterns produced by the faulty circuits. These generated patterns are used to extract high-level error models. When these error models are further analyzed, a sequence of instructions can be derived which, when executed, produces the same error patterns as those obtained when simulating the hardware with low-level faults. When this sequence of instructions is fed to a software fault and error injection tool, it will allow the use of accurate and cost-effective higher-level fault/error injection patterns for validating the dependability properties of a system. Error models extracted for an example processor are presented and analyzed.


G. A. Kanawati, N. A. Kanawati and J. A. Abraham, "FERRARI: A Flexible Software-Based Fault and Error Injection System," IEEE Transactions on Computers, vol. 44, no. 2, February 1995, pp. 248-260.

Abstract

A major step toward the development of fault-tolerant computer systems is the validation of the dependability properties of these systems. Fault/error injection has been recognized as a powerful approach to validate the fault tolerance mechanisms of a system and to obtain statistics on parameters such as coverages and latencies. This paper describes the methodology and guidelines for the design of flexible software-based fault and error injection and presents a tool, FERRARI, that incorporates the techniques. The techniques used to emulate transient errors and permanent faults in software are described in detail. Experimental results are presented for several error detection techniques, and they demonstrate the effectiveness of the software-based error injection tool in evaluating the dependability properties of complex systems.
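As an illustration of the statistics mentioned in this abstract, the sketch below computes an error detection coverage and a mean detection latency from a set of injection runs. It uses the common definitions (coverage as the fraction of injected errors that were detected, latency as the time from injection to detection) and is not taken from the paper; the `Run` record and the function names are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List, Optional


@dataclass
class Run:
    """Outcome of one injection experiment (hypothetical record layout)."""
    injected_at: float            # time the error was injected, in seconds
    detected_at: Optional[float]  # detection time, or None if never detected


def coverage(runs: List[Run]) -> float:
    """Fraction of injected errors that a detection mechanism caught."""
    if not runs:
        raise ValueError("coverage is undefined for zero runs")
    detected = sum(1 for r in runs if r.detected_at is not None)
    return detected / len(runs)


def mean_latency(runs: List[Run]) -> Optional[float]:
    """Mean time from injection to detection, over detected errors only."""
    latencies = [r.detected_at - r.injected_at
                 for r in runs if r.detected_at is not None]
    return mean(latencies) if latencies else None
```

Undetected runs lower the coverage but are excluded from the latency average, since they have no detection time to measure.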


N. A. Kanawati, G. A. Kanawati and J. A. Abraham, "Dependability Evaluation using Hybrid Fault/Error Injection," Proceedings IEEE International Test Conference, Erlangen, Germany, April 24-26, 1995, pp. 224-233.

Abstract

This paper presents a new hybrid fault/error injection technique which overcomes the limitations of both software-based and hardware-based approaches. The logic for the hardware fault injection circuitry is implemented using Field Programmable Gate Arrays, and the software is an extension of FERRARI, the software-based fault injection system. The combination of these techniques allows the incorporation of new capabilities by the use of mechanisms to trigger and synchronize the injection of a fault or error with events in the system. Results of physical fault/error injection experiments on a SPARC1 system are presented. The injection was synchronized to the executing modes and load conditions of the system. These results show that the system behavior is very sensitive to the internal state and load. Therefore, in order to validate the dependability properties of a system, it is imperative to inject faults/errors while the system is in critical conditions and different execution modes.

