Architectural issues in software fault tolerance. This chapter presents a non-homogeneous Poisson process (NHPP) reliability model for N-version programming systems. Software architecture reliability analysis using failure ... The concept of N-version programming was introduced in 1977 by Liming Chen and Algirdas Avizienis. Since correctness and safety are system-level concepts, the need for software fault tolerance, and the degree to which it is used, is directly dependent ...
Software fault tolerance by design diversity (1995). Dependability modeling for fault-tolerant software and systems. As more and more complex systems are designed and built, especially safety-critical systems, software fault tolerance and the next generation of hardware fault tolerance will need to evolve to solve the design fault problem. Reliability-oriented design methods and programming techniques. Software Fault Tolerance was edited by Michael R. Lyu and published by John Wiley and Sons in 1995 (ISBN 0471950688, 425 pages, paperback). In particular, a complex system may be composed of smaller components, each with its own fault detection and fault tolerance capabilities (Lyu, 1995).
Software fault tolerance (Professur für Systems Engineering). With the increasing size and complexity of software in embedded systems, software has become a primary threat to reliability. Fault tolerance techniques achieve design for reliability, fault removal techniques achieve testing for reliability, and ... Several mature conventional reliability engineering techniques exist in the literature, but traditionally they have addressed failures in hardware components and usually assume that a running system is available. Reliability and fault correlation are two main concerns for design diversity, yet empirical data on both are limited. Software reliability engineering involves techniques for the design, testing, and evaluation of software systems, focusing on reliability attributes. This book also focuses on the identification, application, formulation, and evaluation of current software fault tolerance techniques.
Consequently, software reliability can be improved by treating software faults properly, using techniques for fault tolerance, fault removal, and fault prediction. The software fault tolerance techniques presented can be used at different levels of the system. As a research project for my master's degree I am working on a framework for fault-injection testing of distributed systems, and I have been doing some reading around the ... Design for reliability is achieved by fault-tolerance techniques that keep the system working in the presence of software faults. Current methods for software fault tolerance include recovery blocks. Software reliability is closely influenced by the creation, manifestation, and impact of software faults. Software fault tolerance in computer operating systems (NASA-CR-197999, N95-24993, Illinois Univ.). Software fault tolerance is an immature area of research. The complete text of Software Fault Tolerance, edited by Michael R. Lyu ... Optimal fault tolerance strategy selection for web services, by Zibin Zheng and Michael R. Lyu (The Chinese University of Hong Kong, China). VMware vSphere 6 Fault Tolerance is a branded, continuous data availability architecture that exactly replicates a VMware virtual machine on an ...
Reliability evaluation of service-oriented architecture. The adoption of software fault tolerance techniques based on design diversity has been advocated as a means of coping with residual software design faults in operational software (Lee and Anderson, 1990). SOA enables faster integration of existing software components from different parties, makes fault tolerance (FT) feasible, and is one of the fundamentals of cloud computing. His research interests include software reliability engineering, distributed systems, fault-tolerant computing, and machine learning. Software fault tolerance (CMU ECE, Carnegie Mellon University). The following software fault avoidance rules, as suggested by Lyu, should be followed regardless of the type of installed software structure. The essence of this book is the presentation of the software fault tolerance techniques themselves. Software fault tolerance refers to the use of techniques to increase the likelihood that the final design embodiment will produce correct and/or safe outputs. Avizienis, "The methodology of N-version programming," in Software Fault Tolerance, M. R. Lyu, ed. Exception handling and tolerance of software faults.
Handbook of Software Reliability Engineering, Michael R. Lyu. The primary software fault tolerance techniques include recovery blocks and N-version programming (NVP), both covered in detail in Lyu (1995). The book is intended for practitioners and researchers who are concerned with the dependability of software systems.
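The recovery block technique named above can be illustrated with a minimal Python sketch. It is not taken from Lyu (1995). The function names, the sorting task, and the deliberately faulty primary are all hypothetical and chosen only for illustration. The sketch assumes the usual structure: save a checkpoint, run the primary alternate, check its result with an acceptance test, and fall back to the next alternate from the checkpoint if the test fails.

```python
from collections import Counter


def recovery_block(data, alternates, acceptance_test):
    """Try each alternate in order; return the first result that is accepted."""
    checkpoint = list(data)  # saved state that every attempt starts from
    for alternate in alternates:
        try:
            # Each alternate works on a fresh copy, so a failed attempt
            # cannot leave damaged state behind (the "rollback").
            result = alternate(list(checkpoint))
        except Exception:
            continue  # a crash counts as a failed alternate
        if acceptance_test(checkpoint, result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")


def faulty_primary_sort(items):
    # Hypothetical primary with a design fault: it silently drops duplicates.
    return sorted(set(items))


def backup_sort(items):
    # Hypothetical secondary alternate: simpler and assumed more trustworthy.
    items.sort()
    return items


def sort_acceptance_test(original, result):
    # Accept only if the output is ordered and is a permutation of the input.
    ordered = all(a <= b for a, b in zip(result, result[1:]))
    return ordered and Counter(original) == Counter(result)


if __name__ == "__main__":
    alternates = [faulty_primary_sort, backup_sort]
    print(recovery_block([3, 1, 2, 1], alternates, sort_acceptance_test))
```

For inputs without duplicates the primary passes the acceptance test and the backup never runs. The quality of the acceptance test therefore bounds how many faults the scheme can catch.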
Data-diverse software fault tolerance techniques. Alzahrani N. and Petriu D., Modeling fault tolerance tactics with reusable aspects, Proceedings of the 11th International ACM SIGSOFT Conference on Quality of Software Architectures, 43-52. Martin L., Koziolek A. and Reussner R., Quality-oriented decision support for maintaining architectures of fault-tolerant space systems, Proceedings of the 2015 ... Zheng Z. and Lyu M., Personalized reliability prediction of web services, ACM Transactions on Software Engineering and Methodology. An empirical study on testing and fault tolerance for ... Chapter 11 in Software Fault Tolerance, Michael Lyu, ed. Designing fault-tolerant SOA based on design diversity. N-version programming (NVP), also known as multiversion programming or multiple-version dissimilar software, is a method or process in software engineering where multiple functionally equivalent programs are independently generated from the same initial specifications.
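The N-version programming scheme defined above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not a reference implementation. The three "versions" are hypothetical stand-ins for independently developed programs that share one specification (sum of the integers 1..n), one of them carries a deliberate off-by-one fault, and the decision algorithm is assumed to be a simple majority vote over exact outputs.

```python
from collections import Counter


def n_version_execute(versions, x):
    """Run every version on the same input and return the majority output."""
    outputs = []
    for version in versions:
        try:
            outputs.append(version(x))
        except Exception:
            pass  # a crashed version simply casts no vote
    if not outputs:
        raise RuntimeError("every version failed")
    value, votes = Counter(outputs).most_common(1)[0]
    # Require a strict majority of all versions, including crashed ones.
    if 2 * votes <= len(versions):
        raise RuntimeError("no majority among version outputs")
    return value


def version_formula(n):
    # Hypothetical version 1: closed-form sum of 1..n.
    return n * (n + 1) // 2


def version_loop(n):
    # Hypothetical version 2: explicit accumulation.
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def version_faulty(n):
    # Hypothetical version 3 with an off-by-one design fault (sums 0..n-1).
    return sum(range(n))


if __name__ == "__main__":
    versions = [version_formula, version_loop, version_faulty]
    print(n_version_execute(versions, 10))
```

The vote masks the faulty version only because the other two agree with each other. If two versions shared the same fault, the voter would return the wrong value, which is why fault correlation matters for design diversity.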
The study [29] shows that system and application software can potentially detect and correct some or many of these errors by using software fault tolerance approaches such as replication, voting, and masking, with a focus on algorithm-based fault tolerance [7, 31, 32, 33, 34, 35, 37], or by using combined software and hardware approaches. Please cite the book properly in resulting publications. Chapter 3 presents programming practices used in several software fault tolerance techniques, along with common problems and issues faced by various approaches to software fault tolerance. Fault tolerance also resolves potential service interruptions related to software or logic errors. Service-oriented architecture (SOA) provides an elastic and automatic way to discover, publish, and compose individual services.
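Algorithm-based fault tolerance, mentioned above, can be illustrated with the classic checksum scheme for matrix multiplication. The Python sketch below is not drawn from the cited study. The function names are hypothetical, and it assumes integer matrices so that checksums compare exactly. A row of column sums is appended to A and a column of row sums to B. Their product then carries checksums that locate, and here also repair, a single corrupted element.

```python
def matmul(a, b):
    """Plain matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def checksum_product(a, b):
    """Multiply A (plus a column-sum row) by B (plus a row-sum column)."""
    a_checked = a + [[sum(col) for col in zip(*a)]]
    b_checked = [row + [sum(row)] for row in b]
    return matmul(a_checked, b_checked)


def find_mismatches(full):
    """Return data rows and columns whose checksum no longer matches."""
    n_rows, n_cols = len(full) - 1, len(full[0]) - 1
    bad_rows = [i for i in range(n_rows)
                if sum(full[i][:n_cols]) != full[i][n_cols]]
    bad_cols = [j for j in range(n_cols)
                if sum(full[i][j] for i in range(n_rows)) != full[n_rows][j]]
    return bad_rows, bad_cols


def correct_single_error(full):
    """Repair one corrupted data element in place; return True if repaired."""
    bad_rows, bad_cols = find_mismatches(full)
    if len(bad_rows) != 1 or len(bad_cols) != 1:
        return False  # no error, or a pattern this sketch does not handle
    i, j = bad_rows[0], bad_cols[0]
    n_cols = len(full[0]) - 1
    others = sum(full[i][k] for k in range(n_cols) if k != j)
    full[i][j] = full[i][n_cols] - others  # rebuild from the row checksum
    return True


if __name__ == "__main__":
    full = checksum_product([[1, 2], [3, 4]], [[5, 6], [7, 8]])
    full[0][1] += 1  # simulate a transient error in one result element
    print(find_mismatches(full), correct_single_error(full), full[0][1])
```

With floating-point data the equality checks would need a tolerance, and errors in the checksum entries themselves are left unhandled in this sketch.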
An incremental recovery cache supporting software fault tolerance, in Reliable Software Technologies, Ada-Europe '99, Santander, Spain, June 7-11, 1999, Lecture Notes in Computer Science 1622, pp. ... Single-version techniques focus on improving the fault tolerance of a ... As software fault tolerance is often measured in terms of system availability ... How are mission-critical systems designed to handle system ... International Journal of Web Services Research (IJWSR) 7(4). However, the unpredictable nature of SOA systems introduces new ... Rogers P. and Wellings A., The application of compile-time reflection to software fault tolerance using Ada 95, Proceedings of the 10th Ada-Europe International Conference on Reliable Software Technologies, 236 ...
Software fault tolerance is the ability of computer software to continue its normal operation despite the presence of system or hardware faults. An initial specification of the intended functionality of the software is developed. In the field of software fault tolerance we also offer a seminar that allows students to research current topics and a computer lab to gain hands-on experience with the mechanisms presented in the lecture. A commonly used fault tolerance requirement is expressed as the minimum inter-arrival time Tf between two successive faults, or as the reliability goal of ti. This value for the inter-arrival time between faults is either derived from past system fault data or assumed to be the worst-case value the system can cope with. The purpose is to prevent catastrophic failure that could result from a single point of failure. All requirements should be specified and analyzed with ... Textbook: no textbook. Useful references: Software Fault Tolerance Techniques and Implementation, Laura Pullum, Artech House Publishers, 2001, ISBN 1 5805377; Software Reliability Engineering, Michael R. Lyu.
Software testing and software fault tolerance are two major techniques for developing reliable software systems, yet limited empirical data are available in the literature to evaluate their effectiveness. According to Lyu (1995), software is a systematic representation and processing of human knowledge. Software Fault Tolerance and the Handbook of Software Reliability Engineering. Checkpoint placement for fault-tolerant real-time systems. An assumption of software fault tolerance techniques is that the probability of having the same fault in multiple variant components is lower, meaning that a fault present in one component should be detected and tolerated based on the behaviour of the other variants (Lyu). Design-fault tolerance by means of design diversity is a concept that traces back to the very early age of informatics. We conducted a major experiment in which 34 programming teams independently developed multiple software versions of an industry-scale critical flight application, and collected faults ... A fault-tolerance approach to reliability of software operation, Digest of Eighth Annual Intl. Conf. ... The author next briefly describes these techniques and examines how they deal with failures and related faults. This chapter concentrates on software fault tolerance based on design diversity.
Fault-tolerant software has the ability to satisfy requirements despite failures. Testing for reliability is achieved by fault-removal techniques that detect and correct software faults. Fault-tolerant software systems using software configurations. He is a fellow of the ACM, the IEEE, and the AAAS, and a Croucher Senior Research Fellow, for his contributions to software reliability engineering and software fault tolerance. According to software reliability engineering, the main approaches to building reliable software systems are (1) fault forecasting [6, 7], (2) fault prevention, (3) fault removal, and (4) fault tolerance. Design, testing, and evaluation techniques for software ...
Software architecture analysis methods aim to analyze the quality of a software-intensive system early, at the software architecture design level, before the system is implemented. Abstract: Fault tolerance is the survival attribute of a system or component to continue operating as required despite the manifestation of hardware or software faults. The need for software fault-tolerance provisions in the application layer is supported by studies showing that the majority of failures experienced by today's computer systems are due ... Fault prevention and fault tolerance techniques are leveraged in the development of large, reliable, complex software systems.
He is now a professor at the Chinese University of Hong Kong in Shatin, Hong Kong. Fault tolerance patterns and anti-patterns; Chaos Monkey and other Netflix tools. A tutorial (2000 NASA report, available online). The ideas of masking redundancy, standby redundancy, and self-checking design have been shown to be applicable to software, leading to various types of fault-tolerant software ("flaw tolerance" is a better term). The approaches to reliable software systems include fault prevention ... Iyer and Inhwan Lee, University of Illinois at Urbana-Champaign. In previous work, we conducted a software project with a real-world application to investigate software testing and fault tolerance for design diversity. We separate all faults within NVP systems into independent faults and common faults, and model each type of failure as an NHPP.
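To make the NHPP modeling of independent and common faults more concrete, here is a small Python sketch. It does not reproduce the model from the cited work. It assumes, purely for illustration, that each fault class follows a Goel-Okumoto mean value function m(t) = a(1 - exp(-bt)), that the two classes are statistically independent, and that the parameter values shown are made up. For an NHPP the number of failures in (t, t+x] is Poisson with mean m(t+x) - m(t), so the probability of no failure in that interval is exp(-(m(t+x) - m(t))).

```python
import math


def mean_failures(t, a, b):
    """Goel-Okumoto mean value function (an assumed choice of m(t))."""
    return a * (1.0 - math.exp(-b * t))


def expected_failures(t, x, a, b):
    """Expected number of failures in the interval (t, t + x]."""
    return mean_failures(t + x, a, b) - mean_failures(t, a, b)


def prob_no_failure(t, x, fault_classes):
    """Probability that no class produces a failure in (t, t + x].

    fault_classes is a list of (a, b) pairs, one per fault class. Assuming
    the classes are independent NHPPs, their mean value functions add.
    """
    total = sum(expected_failures(t, x, a, b) for a, b in fault_classes)
    return math.exp(-total)


if __name__ == "__main__":
    independent_faults = (20.0, 0.05)  # hypothetical (a, b) parameters
    common_faults = (3.0, 0.02)        # hypothetical (a, b) parameters
    classes = [independent_faults, common_faults]
    for t in (0.0, 50.0, 200.0):
        print(t, prob_no_failure(t, 10.0, classes))
```

Under this assumed mean value function the failure intensity decays over time, so the probability of surviving a fixed-length interval rises as t grows. How independent faults in single versions translate into system-level failures depends on the voting scheme and is deliberately left out of the sketch.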
Michael is well known to the software engineering community as the editor of two classic book volumes in software reliability engineering. Design pattern representation for safety-critical embedded systems. Software fault tolerance techniques involve error detection, exception handling, monitoring mechanisms and ... (Lyu, 1995). In the fast-growing field of service computing, systematic and comprehensive studies on software fault tolerance techniques for transactional web services are still ... In this paper, a distributed fault tolerance strategy evaluation and selection framework is proposed based on versatile fault tolerance techniques. Software Fault Tolerance is edited by Michael R. Lyu. Design-diverse software fault tolerance techniques. In this paper, the authors apply software fault tolerance techniques to web services, where component failures are handled by fault tolerance strategies. Design diversity is the provision of software components ... Experimental evaluation of hardware/software fault tolerance.