His research interests include software reliability engineering, distributed systems, fault-tolerant computing, and machine learning. Alzahrani N. and Petriu D., Modeling fault tolerance tactics with reusable aspects, Proceedings of the 11th International ACM SIGSOFT Conference on Quality of Software Architectures, 43-52. Martin L., Koziolek A. and Reussner R., Quality-oriented decision support for maintaining architectures of fault-tolerant space systems, Proceedings of the 2015. Software fault tolerance by design diversity (1995), CiteSeerX. Checkpoint placement for fault-tolerant real-time systems. Software fault tolerance refers to the use of techniques to increase the likelihood that the final design embodiment will produce correct and/or safe outputs. With the increasing size and complexity of software in embedded systems, software has become a primary threat to reliability. Optimal fault tolerance strategy selection for web services, Zibin Zheng (The Chinese University of Hong Kong, China) and Michael R. Lyu. Chapter 3 presents programming practices used in several software fault tolerance techniques, along with common problems and issues faced by various approaches to software fault tolerance. This value for the interarrival time between faults is either derived from past system fault data or assumed to be the worst-case value the system can cope with. In the field of software fault tolerance we also offer a seminar that allows students to research current topics and a computer lab to get hands-on experience with the mechanisms presented in the lecture. Consequently, software reliability can be improved by treating software faults properly, using techniques of fault tolerance, fault removal, and fault prediction. An assumption of software fault tolerance techniques is that the probability of having the same fault in multiple variant components is lower, meaning that a fault present in one component should be detected and tolerated based on the behaviour of the other variants (Lyu).
Exception handling and tolerance of software faults, F. This important book also focuses on the identification, application, formulation, and evaluation of current software fault tolerance techniques.
Reliability-oriented design methods and programming techniques. Software reliability engineering involves techniques for the design, testing, and evaluation of software systems, focusing on reliability attributes. Experimental evaluation of hardware/software fault tolerance. The author next briefly describes these techniques and examines how they deal with failures and related faults. A commonly used fault tolerance requirement is expressed as the minimum interarrival time tf between two successive faults, or as the reliability goal of ti. Fault tolerance techniques achieve design for reliability, and fault removal techniques achieve testing for reliability. Service-oriented architecture (SOA) provides an elastic and automatic way to discover, publish, and compose individual services. Apr 20, 2012: the complete text of Software Fault Tolerance, edited by Michael R. Lyu. Abstract: Fault tolerance is the survival attribute of a system or component to continue operating as required despite the manifestation of hardware or software faults.
Software fault tolerance is often measured in terms of system availability. Since correctness and safety are really system-level concepts, the need and degree to use software fault tolerance is directly dependent. Software architecture analysis methods aim to analyze the quality of software-intensive systems early, at the software architecture design level, before a system is implemented. Avizienis, The methodology of N-version programming, in Software Fault Tolerance. Testing for reliability is achieved by fault-removal techniques that detect and correct software faults.
Several mature conventional reliability engineering techniques exist in the literature, but traditionally these have primarily addressed failures in hardware components and usually assume the availability of a running system. In particular, a complex system may be composed of smaller components, each comprising some fault detection and fault tolerance capabilities (Lyu, 1995). A fault-tolerance approach to reliability of software operation, Digest of Eighth Annual Intl. Conf. This chapter presents a nonhomogeneous Poisson process (NHPP) reliability model for N-version programming systems. Software fault tolerance is the ability of computer software to continue its normal operation despite the presence of system or hardware faults. The purpose is to prevent catastrophic failure that could result from a single point of failure. An incremental recovery cache supporting software fault tolerance, in Reliable Software Technologies, Ada-Europe '99, Santander, Spain, June 7-11, 1999, Lecture Notes in Computer Science 1622. All requirements should be specified and analyzed with.
We separate all faults within NVP systems into independent faults and common faults, and model each type of failure as an NHPP. VMware vSphere 6 Fault Tolerance is a branded, continuous data availability architecture that exactly replicates a VMware virtual machine. Michael is well known to the software engineering community as the editor of two classic book volumes in software reliability engineering. Dependability modeling for fault-tolerant software and systems, J. Reliability evaluation of service-oriented architecture. Research, open access: Designing fault-tolerant SOA based on design diversity. Fault tolerance patterns and antipatterns; Chaos Monkey and other Netflix tools.
However, the unpredictable nature of SOA systems introduces new. Software reliability is closely influenced by the creation, manifestation, and impact of software faults. Design-fault tolerance by means of design diversity is a concept that traces back to the very early age of informatics. In this paper, the authors apply software fault tolerance techniques to web services, where component failures are handled by fault tolerance strategies. The presented software fault tolerance techniques can be used at different levels of the system. Handbook of Software Reliability Engineering, Michael R. Lyu (The Chinese University of Hong Kong, China). The concept of N-version programming was introduced in 1977 by Liming Chen and Algirdas Avizienis. Software Fault Tolerance and the Handbook of Software Reliability Engineering.
Software testing and software fault tolerance are two major techniques for developing reliable software systems, yet limited empirical data are available in the literature to evaluate their effectiveness. Chapter 11 in Software Fault Tolerance, Michael Lyu, ed. Textbook: no textbook. Useful references: Software Fault Tolerance Techniques and Implementation, Laura Pullum, Artech House Publishers, 2001, ISBN 1 5805377; Software Reliability Engineering, Michael R. Lyu. Software fault tolerance techniques involve error detection, exception handling, monitoring mechanisms and. The need for software fault-tolerance provisions located in the application layer is supported by studies showing that the majority of failures experienced by today's computer systems are. As a research project for my master's degree I am working on a framework for fault injection testing of distributed systems, and I have been doing some reading around the s. Design pattern representation for safety-critical embedded systems. A Tutorial, 2000 NASA report, available online. The ideas of masking redundancy, standby redundancy, and self-checking design have been shown to be applicable to software, leading to various types of fault-tolerant software ("flaw tolerance" is a better term). Fault tolerance also resolves potential service interruptions related to software or logic errors. SOA enables faster integration of existing software components from different parties, makes fault tolerance (FT) feasible, and is also one of the fundamentals of cloud computing. Proceedings of the 23rd International Conference on Machine. The essence of this book is the presentation of the software fault tolerance techniques themselves. Software fault tolerance, Guide books, ACM Digital Library.
How are mission-critical systems designed to handle system. The book is intended for practitioners and researchers who are concerned with the dependability of software systems. This chapter concentrates on software fault tolerance based on design diversity. The author uses the scientific method to deduce specific behavior and to target, analyze, extract, and modify specific operations of a program for interoperability purposes. The adoption of software fault tolerance techniques based on design diversity has been advocated as a means of coping with residual software design faults in operational software (Lee and Anderson, 1990). Design for reliability is achieved by fault-tolerance techniques that keep the system working in the presence of software faults.
An initial specification of the intended functionality of the software is developed. International Journal of Web Services Research (IJWSR) 7(4). Design diversity is the provision of software components. We conducted a major experiment in which 34 programming teams independently developed multiple software versions for an industry-scale critical flight application, and collected faults. Architectural issues in software fault tolerance, J. Design-diverse software fault tolerance techniques. An empirical study on testing and fault tolerance for. Software fault tolerance is an immature area of research. Software Fault Tolerance is edited by Michael R. Lyu.
Introduction to Reverse Engineering Software, by Mike Perry and Nasko Oskov (UIUC): an introduction to reverse engineering software under both Linux and Windows. He is a fellow of the ACM, the IEEE, and the AAAS, and a Croucher Senior Research Fellow, for his contributions to software reliability engineering and software fault tolerance. He is now a professor at The Chinese University of Hong Kong in Shatin, Hong Kong. The book was published by John Wiley and Sons (1995, ISBN 0471950688, 425 pages, paperback). As more and more complex systems are designed and built, especially safety-critical systems, software fault tolerance and the next generation of hardware fault tolerance will need to evolve to be able to solve the design fault problem. A unified view on learning with labeled and unlabeled data. Fault prevention and fault tolerance techniques are leveraged in the development of large, reliable, complex software systems.
Zheng Z. and Lyu M. (20), Personalized reliability prediction of web services, ACM Transactions on Software Engineering and Methodology. Design, testing, and evaluation techniques for software. Iyer and Inhwan Lee, University of Illinois at Urbana-Champaign, abstract. Rogers P. and Wellings A., The application of compile-time reflection to software fault tolerance using Ada 95, Proceedings of the 10th Ada-Europe International Conference on Reliable Software Technologies, 236.
The study [29] shows that system and application software can potentially detect and correct some or many of these errors by using different software fault tolerance approaches such as replication, voting, and masking, with a focus on algorithm-based fault tolerance [7, 31, 32, 33, 34, 35, 37], or by using combined software and hardware approaches. The approaches to reliable software systems include fault prevention. Fault-tolerant software systems using software configurations. Reliability and fault correlation are two main concerns for design diversity, yet empirical data are limited in investigating these two. In this paper, a distributed fault tolerance strategy evaluation and selection framework is proposed, based on versatile fault tolerance techniques. (Lyu, 1995). In the fast-growing field of service computing, systematic and comprehensive studies of software fault tolerance techniques for transactional web services are still. Designing fault-tolerant SOA based on design diversity. Single-version techniques focus on improving the fault tolerance of a.
Software Engineering for Internet Applications, by Eve Andersson, Philip Greenspun, and Andrew Grumet (The MIT Press): after completing this course on server-based Internet applications software, students who start with only the knowledge of how to write and debug a computer program will have learned how to build web-based applications on the scale of. Fault-tolerant software has the ability to satisfy requirements despite failures. In previous work, we conducted a software project with a real-world application to investigate software testing and fault tolerance for design diversity. Software fault tolerance in computer operating systems. Data-diverse software fault tolerance techniques. Current methods for software fault tolerance include recovery blocks.
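Recovery blocks are only named here, not described, so the following is a minimal Python sketch of the general idea under stated assumptions: each alternate runs on a fresh copy of the input (a checkpoint), an acceptance test judges its result, and the next alternate is tried if the test fails. The names `recovery_block`, `primary_sort`, and `backup_sort` are hypothetical and chosen for illustration; they do not come from any of the works cited on this page.

```python
import copy


def recovery_block(state, alternates, acceptance_test):
    """Try each alternate on a fresh copy of `state`; return the first accepted result."""
    for alternate in alternates:
        # Checkpoint: the alternate works on a copy, so a failed attempt
        # cannot corrupt the state seen by the next alternate.
        working_copy = copy.deepcopy(state)
        try:
            result = alternate(working_copy)
        except Exception:
            continue  # a crash counts as a failed attempt
        if acceptance_test(result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")


def primary_sort(data):
    """Hypothetical primary routine with a deliberate bug: it loses an element."""
    data.pop()
    data.sort()
    return data


def backup_sort(data):
    """Hypothetical, simpler alternate that is assumed to be correct."""
    return sorted(data)


items = [3, 1, 2]


def is_acceptable(out):
    """Acceptance test: same length as the input and in non-decreasing order."""
    in_order = all(out[i] <= out[i + 1] for i in range(len(out) - 1))
    return len(out) == len(items) and in_order


result = recovery_block(items, [primary_sort, backup_sort], is_acceptable)
print(result)
```

The acceptance test here is intentionally cheap and partial; in practice its coverage determines which faults the scheme can actually catch.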
The following software fault avoidance rules, as suggested by Lyu, should be followed regardless of the type of installed software structure. N-version programming (NVP), also known as multiversion programming or multiple-version dissimilar software, is a method or process in software engineering where multiple functionally equivalent programs are independently generated from the same initial specifications.
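To make the N-version idea concrete, here is a small Python sketch, assuming a strict-majority voter as the decision mechanism (the text above does not specify one). Three hypothetical "versions" of an integer square root stand in for independently developed programs; one of them is deliberately faulty. All function names are illustrative.

```python
import math
from collections import Counter


def n_version_execute(versions, x):
    """Run every version on the same input and return the strict-majority output."""
    outputs = []
    for version in versions:
        try:
            outputs.append(version(x))
        except Exception:
            continue  # a crashed version simply casts no vote
    if not outputs:
        raise RuntimeError("every version failed")
    value, votes = Counter(outputs).most_common(1)[0]
    # Require agreement from more than half of ALL versions, not just survivors.
    if votes * 2 <= len(versions):
        raise RuntimeError("no majority among versions")
    return value


def isqrt_library(n):
    """Version 1: standard-library integer square root."""
    return math.isqrt(n)


def isqrt_newton(n):
    """Version 2: integer Newton iteration."""
    if n < 2:
        return n
    r = n
    while r * r > n:
        r = (r + n // r) // 2
    return r


def isqrt_faulty(n):
    """Version 3: deliberately wrong, to stand in for a residual design fault."""
    return n // 2


answer = n_version_execute([isqrt_library, isqrt_newton, isqrt_faulty], 16)
print(answer)
```

The voter masks the faulty version only because the other two agree; if two versions shared the same fault, the wrong value would win the vote, which is why fault correlation between versions matters.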
Software fault tolerance, Professur für Systems Engineering. According to software reliability engineering, the main approaches to building reliable software systems are (1) fault forecasting [6, 7], (2) fault prevention, (3) fault removal, and (4) fault tolerance. According to [Lyu 95], software is a systematic representation and processing of human knowledge.