This is achieved by creating fault-tolerant composite services that leverage functionally equivalent services. When a fault occurs, these techniques provide mechanisms to. Therefore, fault tolerance is achieved by using diversity in the data space. The study of software fault tolerance is relatively new compared with the study of fault-tolerant hardware. Multiple computations are implemented by N-fold (N ≥ 2) replications in three domains. This survey includes classical work on design and data diversity for fault tolerance, as well as the cybersecurity literature that investigates randomization at different system levels. In this paper, a distributed fault tolerance strategy evaluation and selection framework is proposed based on versatile fault tolerance techniques.
Index terms: data diversity, design diversity, N-copy programming, N-version programming, recovery blocks, retry blocks, software faults, software fault tolerance. Fault tolerance through automated diversity in the management of distributed systems, Jorg Prei. Data-diverse software fault tolerance techniques complement design diversity by compensating for design diversity's limitations. They involve obtaining a related set of points in the program data space, executing the same software on those points, and then using a decision algorithm to determine the resulting output. The intended audience is one of designers, assessors, and project managers with only. These design solutions provide an implementation framework to incorporate and validate the proposed ROC-based checks. Software fault tolerance by design diversity (CiteSeerX). It offers you a thorough understanding of the operation of critical software fault tolerance techniques and guides you through their design, operation and performance. Optimal fault tolerance strategy selection for web services. Design-fault tolerance by means of design diversity is a concept that traces back to the very early age of informatics. The N-version approach to fault-tolerant software (IEEE). The root cause of software design errors is the complexity of the systems.
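The data-diversity idea described above (re-express the input, run the same program on the related points, then decide on an output) can be sketched as a retry block. This is a minimal sketch under stated assumptions, not code from any of the works quoted here: the names `fragile_mean`, `rotations`, `in_range`, and `retry_block` are hypothetical, and the fault in `fragile_mean` is contrived purely for illustration.

```python
from typing import Callable, Iterable, List


def fragile_mean(values: List[float]) -> float:
    """Hypothetical program with a contrived fault: it divides by the first element."""
    pivot = values[0]  # raises ZeroDivisionError when the first element is 0
    return pivot * sum(v / pivot for v in values) / len(values)


def rotations(values: List[float]) -> Iterable[List[float]]:
    """Re-expressions of the input; the mean does not depend on element order."""
    for k in range(len(values)):
        yield values[k:] + values[:k]  # k == 0 yields the original input


def in_range(values: List[float], result: float) -> bool:
    """Acceptance test: a mean must lie between the smallest and largest value."""
    return min(values) <= result <= max(values)


def retry_block(
    program: Callable[[List[float]], float],
    data: List[float],
    re_express: Callable[[List[float]], Iterable[List[float]]],
    acceptance_test: Callable[[List[float], float], bool],
) -> float:
    """Run the same program on re-expressed inputs until one result is accepted."""
    for candidate in re_express(data):
        try:
            result = program(candidate)
        except Exception:
            continue  # this point in the data space triggered the fault
        if acceptance_test(data, result):
            return result
    raise RuntimeError("every re-expression failed the acceptance test")


print(retry_block(fragile_mean, [0.0, 2.0, 4.0], rotations, in_range))
```

The original input fails because its first element is zero, while a rotated copy of the same data avoids the fault. An N-copy programming variant would instead run all re-expressions and vote on the results.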
Software fault tolerance techniques are employed during the procurement, or development, of the software. The design process should allocate requirements to software elements, identify available system resources with their limitations, and select strategies for managing them. Hardware-implemented fault tolerance design reduces operating system size, minimises systems software and increases processing speed, offering the end user the safest and simplest design. They will gain a thorough understanding of fault-tolerant computers, including both the theory of how to design and evaluate them and the practical knowledge of achieving fault tolerance in electronic, communication and software systems. The design diversity experiments testbed DEDIX thus has two aspects. In order to make measurements in a multi-version software experiment, a testbed was needed. To handle faults gracefully, some computer systems have two or more. This chapter concentrates on software fault tolerance based on design diversity. Software fault tolerance: DEDIX as an experimentation tool. Program interface: in multiple-version software, the versions of an application program are all written according to the same functional specification.
Sc High Integrity System, University of Applied Sciences, Frankfurt am Main. For example, this would mean adding a different system as a backup. Software reliability is the probability of failure-free software operation for a specified period of time in a specified environment. In previous work, we conducted a software project with a real-world application to investigate software testing and fault tolerance for design diversity. The term essentially refers to a system's ability to allow for failures or malfunctions, and this ability may be provided by software, hardware or a combination of both. The system can continue its operations at a reduced level rather than failing completely. Software fault tolerance (Carnegie Mellon University). It broadens this standard scope of diversity to include the study and exploitation of natural diversity and the management of diverse software products. System structure for software fault tolerance (Semantic Scholar). Such systems focus strongly on design faults, where the term. The traditional method of software fault tolerance based on design diversity is expensive and hence is rarely used.
Novell doesn't say whether SFT is an abbreviation for something. Design diversity is a solution to software fault tolerance only insofar as it is possible to create diverse and equivalent specifications, so that programmers can create software whose designs are different enough that they don't share similar failure modes. It would be very difficult to sum it up in one article, since there are multiple ways to achieve fault tolerance in software. Software fault tolerance by design diversity (CUHK CSE). Review of software design diversity: 1 Introduction, 2 N. Fault-tolerant software assures system reliability by using protective redundancy at the software level. Both schemes are based on software redundancy, assuming that coincident software failures are rare.
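The assumption that coincident failures are rare can be made concrete with a small calculation. The sketch below is an illustration under idealized assumptions that the text above does not state in this form: every version succeeds independently with the same probability `r`, the voter itself never fails, and the system is correct whenever a strict majority of versions is correct. The function name `majority_vote_reliability` is hypothetical.

```python
from math import comb


def majority_vote_reliability(n: int, r: float) -> float:
    """Probability that a strict majority of n independent versions is correct.

    Assumes each version is correct with probability r, independently of the
    others, and that the voter is perfect. These are idealizations.
    """
    if n < 1 or n % 2 == 0:
        raise ValueError("n must be a positive odd number")
    majority = n // 2 + 1
    # Sum the binomial probabilities of k correct versions for k >= majority.
    return sum(comb(n, k) * r**k * (1 - r) ** (n - k) for k in range(majority, n + 1))


for n in (1, 3, 5):
    print(n, majority_vote_reliability(n, 0.9))
```

Under these assumptions, redundancy only helps when each version is more likely to be right than wrong; if failures are correlated, the real benefit is smaller than this model predicts.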
Software fault tolerance is the ability of software to detect and recover from a fault that is happening or has already happened, in either the software or the hardware of the system in which the software is running, in order to provide service in accordance with the specification. Chen, "On the implementation of N-version programming for software fault-tolerance during program execution," Proceedings COMPSAC 77, Chicago, IL, pp. A heuristic to improve robustness of self-adaptive cloud architectures. These principles deal with desktop and server applications and/or SOA. Software fault tolerance using data diversity. Coverage includes fault tolerance techniques through hardware, software, information and time redundancy. Study a specific software fault tolerance scheme, middleware, or application using software fault tolerance, e. Architecture and software fault-tolerant technology. Roberts, "Software fault-tolerance in the Pluribus," AFIPS Conference Proceedings, 1978 NCC 47, Anaheim, CA, pp. However, the motivation behind this solution was to manage service.
Byzantine fault tolerance here is having all loyal generals attack. It differs from hardware reliability in that it reflects the design. To complement design diversity in the quest for fault-tolerant software, there exist several data diversity techniques that are similar to those mentioned above for the design diversity approach. Fault tolerance is the ability to continue operating. Mutants were generated by injecting one single real fault, recorded in the software development phase, into the final versions. Professor Lorenzo Strigini, Professor of Systems Engineering, is an academic. Approaches to software fault tolerance depend on software diversity, where it is assumed that different implementations of the same software specification will fail in different ways. Each channel is designed to provide the same function, and a method is provided to identify if one channel deviates unacceptably from the others.
They include the recovery block scheme (RBS), consensus recovery block programming, N-version programming (NVP), N self-checking programming (NSCP) and data diversity. Professor Lorenzo Strigini, City, University of London. Over recent years, software developers have been evaluating the benefits of both service-oriented architecture (SOA) and software fault tolerance techniques based on design diversity. SFT III allows two servers to mirror each other so that one server is always available in case the other one fails. Software fault tolerance is the ability of computer software to continue its normal operation despite the presence of system or hardware faults. Fault tolerance is a concept used in many fields, but it is particularly important to data storage and information technology infrastructure. Fault tolerance is a feature that enables a system to continue its operations even when one part of the system fails. Software Diversity in Computerized Control Systems, Springer-Verlag, Wien, 1988. One approach of software fault tolerance, also known as design diversity, is to employ functionally. With this approach, not only is the software run multiple times, but each copy is also written by a different engineering team. Fault tolerance is the property that enables a system to continue operating properly in the event of the failure of, or one or more faults within, some of its components.
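Of the techniques listed above, N-version programming is the easiest to show in a few lines: independently written versions of the same function run on the same input and a voter picks the majority result. The following is a minimal sketch, assuming results are hashable and compared for exact equality; the three absolute-value versions and the name `n_version_execute` are hypothetical, and the third version carries a deliberately injected fault.

```python
from collections import Counter
from math import isqrt
from typing import Callable, Hashable, Sequence


def abs_v1(x: int) -> int:
    """Version 1: branch on the sign."""
    return x if x >= 0 else -x


def abs_v2(x: int) -> int:
    """Version 2: integer square root of the square."""
    return isqrt(x * x)


def abs_v3_faulty(x: int) -> int:
    """Version 3: contains a design fault, wrong for negative inputs."""
    return x


def n_version_execute(versions: Sequence[Callable[[int], Hashable]], x: int) -> Hashable:
    """Run every version on the same input and return the strict-majority result."""
    results = []
    for version in versions:
        try:
            results.append(version(x))
        except Exception:
            continue  # a crashed version simply casts no vote
    if not results:
        raise RuntimeError("all versions failed")
    value, votes = Counter(results).most_common(1)[0]
    if votes * 2 <= len(versions):
        raise RuntimeError("no majority among versions")
    return value


print(n_version_execute([abs_v1, abs_v2, abs_v3_faulty], -4))
```

Requiring a strict majority of all versions, rather than of the versions that returned, is a conservative design choice: a single surviving version cannot win the vote on its own.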
Design diversity was not a concept applied to the solutions to hardware fault tolerance, and to this end, N-way redundant systems solved many single errors by. The versions are used as alternatives with a separate means of. Therefore, it is reasonable to deal with the remaining software faults (bugs) during runtime to increase the overall reliability. Design diversity has been used for many years now as a means of. It is assumed that implementations are (a) independent and (b). Principal requirements for the implementation of N-version software are summarized, and the DEDIX distributed supervisor and testbed for the execution of N-version software is described. In this paper, the authors apply software fault tolerance techniques to web services, where component failures are handled by fault tolerance strategies. SFT III is a feature providing fault tolerance in Intel-based PC network servers running Novell's NetWare operating system. Several software fault tolerance schemes, as proposed in [46,47,48,49,50], are based on software design diversity in order to tolerate software design bugs. Designing fault-tolerant SOA based on design diversity.
We aim to support the software architect in the design of fault-tolerant compositions. Software fault tolerance techniques and implementation. Definition and analysis of hardware and software fault. In contrast to hardware faults, no concepts or mechanisms for fault tolerance of general software faults became widely accepted. This is certainly more true of software systems than of almost any other phenomenon. Not all software changes in the same way, so software fault tolerance methods are designed to overcome execution errors by modifying variable values to create an acceptable program state. In this context, fault tolerance refers to the ability of a computer system or storage subsystem to suffer failures in component hardware or software parts yet continue to function without a service interruption and without losing data or. Index terms: design diversity, fault tolerance, multiple computa. Such techniques use data diversity to tolerate residual faults.
Ammann, The Software Productivity Consortium, 2214 Rock Hill Road, Herndon, Virginia 22070, USA. Abstract: Typical software fault tolerance techniques are modeled on successful hardware fault tolerance techniques. Structuring redundancy for software fault tolerance: robust software. Fault-tolerant software architecture (Stack Overflow). A basic requirement was to simulate the environments in which design diversity should be used. Design diversity is the provision of software components called. An approach called design diversity combines hardware and software fault tolerance by implementing a fault-tolerant computer system using different hardware and software in redundant channels. Software fault tolerance: during the development of software, it is infeasible to find all its bugs, which can reach as far back as the design phase.
Software fault tolerance via environmental diversity. Software reliability is also an important factor affecting system reliability. Fault tolerance is the way in which an operating system (OS) responds to a hardware or software failure. In this paper we present a new concept for the design of system management that enables the tolerance of software faults in the executed applications. Look to this innovative resource for the most comprehensive coverage of software fault tolerance techniques available in a single volume.
What is the importance of implementing a fault tolerance. Despite more and more improvements in fault prevention techniques, it is a fact that faults remain in every complex software system. Data diversity can also be applied to software testing and greatly facilitates the automation of testing. Software fault tolerance (Professur für Systems Engineering). The multiple computation approach and its extension to design diversity: multiple computation is a fundamental method employed to attain fault tolerance.
Compounding the problems in building correct software is the difficulty in assessing the correctness of software for highly complex systems. Before discussing the use of multiversion software in a fault-tolerant system.
Fault-tolerant system dependability: explicit modeling of hardware and software component interactions. Primary module, alternate module, acceptance test; design fault, software fault.
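The primary module, alternate module, and acceptance test named above are the building blocks of a recovery block. The sketch below is one minimal way to arrange them, under stated assumptions: the sorting task, the injected fault in `faulty_sort`, and all function names are hypothetical, and checkpointing is approximated by handing each module a deep copy of the input state.

```python
import copy
from collections import Counter
from typing import Callable, List, Sequence


def faulty_sort(data: List[int]) -> List[int]:
    """Primary module with an injected design fault: it drops the largest element."""
    return sorted(data)[:-1]


def simple_sort(data: List[int]) -> List[int]:
    """Alternate module: a simpler implementation of the same specification."""
    return sorted(data)


def acceptance_test(original: List[int], result: List[int]) -> bool:
    """Accept only an ordered result that contains exactly the original elements."""
    ordered = all(a <= b for a, b in zip(result, result[1:]))
    return ordered and Counter(result) == Counter(original)


def recovery_block(
    modules: Sequence[Callable[[List[int]], List[int]]],
    test: Callable[[List[int], List[int]], bool],
    state: List[int],
) -> List[int]:
    """Try the primary, then each alternate, restoring the checkpoint in between."""
    for module in modules:
        working_copy = copy.deepcopy(state)  # checkpoint: each module sees clean state
        try:
            result = module(working_copy)
        except Exception:
            continue  # treat a crash like a failed acceptance test
        if test(state, result):
            return result
    raise RuntimeError("primary and all alternates failed the acceptance test")


data = [3, 1, 2]
print(recovery_block([faulty_sort, simple_sort], acceptance_test, data))
```

The acceptance test here checks properties of the result rather than recomputing it, which keeps the test independent of any single module's design.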
Software Fault Tolerance, edited by Lyu, (c) 1995 John. Data diversity relies on a different form of redundancy from existing approaches to software fault tolerance and is substantially less expensive to implement. Lorenzo Strigini's teaching includes, among other courses, the module on risk in sociotechnical systems for the professional MSc in Information Security and Risk, a one-week course on introduction to fault-tolerant computing for industry, and an undergraduate module on software reliability and measurement. If its operating quality decreases at all, the decrease is proportional to the severity of the failure, as compared to a naively designed system, in which even a small failure can cause total breakdown. Service-oriented systems are usually composed of heterogeneous web services, which are distributed across the internet and provided by organizations.
Recent developments in year 2000 and beyond. Benoit Baudry (Inria, France) and Martin Monperrus (University of Lille, France). Abstract: Early experiments with software diversity in the mid 1970s investigated N-version programming and. Software engineering: software fault tolerance (Javatpoint). Provide mechanisms for fault tolerance, like managing replicated executions and checking that tasks executed successfully. As more and more complex systems get designed and built, especially safety-critical systems, software fault tolerance and the next generation of hardware fault tolerance will need to evolve to be able to solve the design fault problem. Abstract: Nowadays the reliability of software is often the main goal in the software development process. Uses simpler hardware or software to reconcile outputs. A fundamental way of improving the reliability of software systems depends on the principle of design diversity, where different versions of the functions are implemented. Schedule tasks among candidate processors which can handle timing requirements. Software fault tolerance is an immature area of research. There are two basic techniques for obtaining fault-tolerant software. Assume we are working with duplicated processing modules like IMA. There are also multiple methodologies, a few of which we already follow without knowing. Basic fault-tolerant software techniques (GeeksforGeeks).