Future Directions in Flight Software Assurance
Aerospace is helping mission planners understand and manage the risks associated with an increased reliance on sophisticated onboard software.
Engineers examine a satellite architecture using the Aerospace AADL-based reliability prediction tool.
In 1999, the Mars Climate Orbiter was lost as a result of a software error in its navigation system. A few months later, the Mars Polar Lander crashed because software prematurely shut down its descent engines. In 2005, the Huygens Probe lost half of its image data because of a defect in a single line of code.
Software accounts for a relatively small fraction of total satellite cost, but as these instances show, it can have a disproportionate effect on mission success. In fact, an Aerospace study of operational failures showed that, in the period from 1998 to 2000, nearly half of all observed space vehicle anomalies were related to software. A rigorous and consistent software assurance approach is clearly needed—but with current systems employing up to a million lines of code, the prospect of verifying and validating system software in a timely fashion poses a significant challenge. As part of its overall focus on mission assurance, Aerospace has embarked upon numerous research efforts to develop new software analysis techniques and to apply the latest developments to critical space programs.
Software assurance is different from hardware assurance. Traditional hardware assurance focuses on when a component will wear out or break. Software assurance considers the entire system life cycle, from concept definition (which determines how much software is needed and how complex it will be) through testing (which ensures that system-level interactions—as implemented in software—will function as designed). Similarly, software assurance can span all levels of system abstraction. For example, hardware redundancy can generate a need for software-driven failure detection and component switchover, and many hardware components also include firmware (software logic fabricated into the component) or other programmable electronics. Flight software must interface with that hardware correctly, and interactions among software components also have to work. Software design usually includes several layers of abstraction, with corresponding interface definitions.
Various factors influence how well software systems can be assured. Data-bus architectures, for example, are a significant contributor to system assurability. Although the data bus is a low-level component when viewed from a hardware perspective, it carries and synchronizes data from all major spacecraft subsystems. Most spacecraft use the MIL-STD-1553 data bus, but this has a relatively low throughput. Aerospace is analyzing newer techniques, such as time-triggered bus architectures and even Ethernet variants, for assurability and practicality.
The choice of programming language also has a significant effect on software assurance. Different languages have characteristics, such as type safety, that can make it easier or harder to introduce and catch errors in coding. Some languages allow programmers to specify the preconditions and required outcomes of a function. Automated checkers can then determine whether the function will produce the required outcome based on the given preconditions. SPARK (based on Ada) and Java (combined with the Java Modeling Language) are examples of languages that allow this feature. These languages have not yet seen widespread use in the space industry, however, so Aerospace is investigating their applicability.
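As a rough illustration (in Python, which is not one of the languages named above), runtime assertions can mimic the contract style that SPARK and JML check statically; the `contract` decorator and the square-root routine here are invented for the example, and a real static checker would verify such contracts before the code ever runs:

```python
# Illustrative sketch only: runtime assertions approximating the
# precondition/postcondition contracts that SPARK and JML can check
# statically. The decorator and the function below are invented.

def contract(pre, post):
    """Hypothetical decorator: check a precondition before the call and a
    postcondition (over the arguments and the result) after it."""
    def wrap(fn):
        def inner(*args):
            assert pre(*args), "precondition violated"
            result = fn(*args)
            assert post(*args, result), "postcondition violated"
            return result
        return inner
    return wrap

@contract(pre=lambda x: x >= 0,
          post=lambda x, r: abs(r * r - x) < 1e-6)
def newton_sqrt(x):
    """Newton's-method square root; the contract pins down its behavior."""
    guess = x or 1.0
    for _ in range(60):
        guess = 0.5 * (guess + x / guess)
    return guess
```

A static checker can prove such a contract holds for all inputs; the runtime version shown here only catches violations when they actually occur.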
Space-rated processors lag well behind commercial processors in computational performance. Determining the worst-case execution time of a software process—and thus whether it can be guaranteed to complete within its allotted time—is extremely difficult. A large part of the problem is uncertainty in the timing characteristics of the processor on which the software runs. Memory consistency is also a difficult problem; most processors execute multiple software tasks, and in critical systems it is imperative that tasks such as fault management be protected from potential failures in other tasks.
Determining execution time used to be a simple calculation: multiply the number of instructions by the cycles each instruction takes, then divide by the processor clock rate. Commercial processor architectures long ago abandoned this simple execution model, and recent space-rated processor designs, such as the RHPPC 603e and the RAD-750 PowerPC, have incorporated some of their more advanced techniques. These include out-of-order execution of instructions to work around data dependencies; caching of instructions and data read from memory; and branch prediction, in which the outcome of a control-flow branch is predicted and instructions are issued speculatively. Calculation of flight software timing characteristics must now factor in differing memory speeds (e.g., off-chip memory vs. on-chip cache memory). These processor enhancements also introduce new flight software assurance issues, such as ensuring consistency between off-chip and on-chip data.
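A back-of-the-envelope sketch, using invented processor numbers (the clock rate, cycles per instruction, and miss penalty below are not from any real device), shows why cache behavior dominates such an estimate:

```python
# Illustrative only: how cache-miss behavior stretches worst-case
# execution time. All constants are assumed values for the sketch.

CLOCK_HZ       = 132e6    # assumed processor clock rate
CPI_PIPELINE   = 1.5      # assumed cycles per instruction when cache hits
CACHE_MISS_PEN = 30       # assumed extra cycles per off-chip memory access
INSTRUCTIONS   = 200_000  # instructions in one control cycle

def exec_time(miss_rate):
    """Estimated seconds for one pass at a given cache-miss rate."""
    cycles = INSTRUCTIONS * (CPI_PIPELINE + miss_rate * CACHE_MISS_PEN)
    return cycles / CLOCK_HZ

best  = exec_time(0.00)   # everything found in on-chip cache
worst = exec_time(0.10)   # 10% of accesses go to off-chip memory
# With these numbers, the identical code runs three times longer in the
# worst case, which is what a guaranteed-deadline analysis must assume.
```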
Simulation and Modeling
To evaluate software execution parameters, functional simulation is generally preferred because it gives quick answers; however, it does not capture the complex dynamics of the flight software as it executes. Cycle-accurate simulation—which simulates the exact internal behavior of a microprocessor, cycle for cycle—is used for detailed studies requiring high fidelity; however, this comes at a significant performance cost. Aerospace is working to develop a cycle-accurate simulation system within a realistic vehicle environment. Researchers are creating an enhanced tool suite based on a unified simulation environment to support assessment and diagnosis of real-time flight software execution on modern space processors. The tool would also serve as a flexible concept design environment for new space-based applications and systems. The effort also includes parallelization of the simulation algorithms to avoid the performance problems of cycle-accurate simulation, making it competitive with functional simulation as a timely analysis technique.
Testing and simulation can give strong assurance that some particular case with some particular set of inputs will execute correctly—but these techniques can cover only a small fraction of the software’s vast range of possible inputs. Modeling is therefore a necessary complement to testing and simulation. A model is an abstraction of the software system created by means of a discrete modeling language. It can be used to check whether the critical properties of the software can be expected to hold true. Formal models have defined semantics that enable this analysis, which is usually done by a software program.
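One common form of such analysis is exhaustive exploration of a model's reachable state space. As a toy sketch (the two-task, two-lock model below is invented, not drawn from any Aerospace tool), the following code enumerates every reachable state of two tasks that acquire shared locks in opposite order and flags any state where neither task can proceed, i.e., a deadlock:

```python
# Toy state-space exploration: find deadlocks in a two-task, two-lock
# model. The model and all names are invented for illustration.

# Each task is a sequence of (action, lock) steps.
TASK_A = [("acquire", "bus"), ("acquire", "mem"),
          ("release", "mem"), ("release", "bus")]
TASK_B = [("acquire", "mem"), ("acquire", "bus"),
          ("release", "bus"), ("release", "mem")]

def successors(state):
    """Yield states reachable in one step. State = (pcA, pcB, lock owners)."""
    pcs, locks = state[:2], state[2]
    for i, task in enumerate((TASK_A, TASK_B)):
        pc = pcs[i]
        if pc >= len(task):
            continue                       # this task has finished
        action, lock = task[pc]
        owners = dict(locks)
        if action == "acquire":
            if owners.get(lock) is not None:
                continue                   # blocked: lock held by the other task
            owners[lock] = i
        else:
            owners[lock] = None
        new_pcs = list(pcs)
        new_pcs[i] = pc + 1
        yield (new_pcs[0], new_pcs[1], tuple(sorted(owners.items())))

def find_deadlocks():
    """Exhaustively search reachable states; return the stuck ones."""
    start = (0, 0, (("bus", None), ("mem", None)))
    seen, frontier, deadlocks = {start}, [start], []
    while frontier:
        state = frontier.pop()
        succ = list(successors(state))
        done = state[0] >= len(TASK_A) and state[1] >= len(TASK_B)
        if not succ and not done:
            deadlocks.append(state)        # no moves, but not finished
        for s in succ:
            if s not in seen:
                seen.add(s)
                frontier.append(s)
    return deadlocks
```

The search finds the classic deadlock in which each task holds one lock and waits forever for the other—a state that interleaving-dependent testing could easily fail to exercise.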
Because it works on an abstraction of the software, a model checker would not, for example, identify memory conflicts between different processes on the same processor (as a cycle-accurate simulation would). It can, however, verify high-level properties—such as the absence of potential for deadlock between processes—that would be difficult to prove through testing. Aerospace has a technology transfer effort underway to use temporal-logic model checking to model and analyze the reachable state space of a program (or set of interacting programs). The goal is to find the most effective way to model spacecraft fault-management software, to capture the critical characteristics in the abstraction, and to define the right properties for software assurance.

Aerospace is also working to complete a flight-software reference architecture. Reference architectures are used to define the characteristics of a software product line and to show how different elements of that product line can vary. Some spacecraft are actually built as product lines, but even when they are not, they still exhibit a high degree of commonality, so a reference architecture can support spacecraft analysis across programs. This is particularly true at a high level, where many of the same system requirements are used over and over.
The spacecraft reference architecture is used to help capture and analyze requirements in the early phases of development. The architecture itself is represented as a set of generic requirements imposed on a spacecraft. These can be at different levels of abstraction. For example, a requirement for stationkeeping might have refined—but generic—counterparts for low Earth orbit, geosynchronous orbit, and highly elliptical orbit. Relationships among these requirements can be captured as generic, analyzable properties.
The generic requirements from the reference architecture can be tailored for inclusion in a request for proposal for spacecraft procurement. Doing so will help ensure that the included requirements are complete, make such requests more consistent across programs, and make it easier to exploit Aerospace's formal analysis capabilities. Furthermore, when the contractor requirements are being evaluated, Aerospace can assess whether the contractor's instantiation of the generic requirements still satisfies all of the relationships that the reference architecture defines among those requirements.
When a spacecraft hardware component fails, the failure is generally permanent. With software, however, a failure need not be permanent or unfixable. Simply retrying a function, or restarting a process, can be used to recover from a failure if system conditions (e.g., specific sensor inputs or interprocess timing) change between attempts.
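A minimal sketch of this retry-based recovery, using an invented sensor whose transient fault clears between attempts (the sensor class, reading value, and attempt count are all hypothetical):

```python
# Illustrative sketch: retrying a failed operation can recover when the
# underlying condition is transient. FlakySensor is invented for the example.

class FlakySensor:
    """Stand-in for a device whose fault clears after a transient upset."""
    def __init__(self, transient_failures):
        self.failures_left = transient_failures

    def read(self):
        if self.failures_left > 0:
            self.failures_left -= 1
            raise IOError("transient read fault")
        return 101.3                       # nominal reading

def read_with_retry(sensor, attempts=3):
    """Retry a failed read; a transient fault may clear between attempts."""
    last_error = None
    for _ in range(attempts):
        try:
            return sensor.read()
        except IOError as err:
            last_error = err               # conditions may change; try again
    raise last_error                       # persistent fault: escalate
```

Only when every retry fails does the error escalate to a higher-level mechanism such as fault management or safing.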
Complex software applications actually have an expected, quantifiable failure rate, but spacecraft have built-in capabilities—such as fault management, anomaly detection and resolution, and safing—to deal with these problems. Traditional reliability analysis does not account for software failures, which means that predictions of spacecraft reliability are overly optimistic.
Accounting for software failure and recovery in system reliability is significantly more difficult than addressing hardware alone. For example, in a hardware-only analysis of a system consisting of two redundant channels, the reliability is determined by the reliabilities of the two channels and the switchover mechanism. When software failure and recovery is accounted for, the analysis must also include the software failure rate, the probability of software recovery, and the time necessary to execute the recovery. The mathematics behind the assessment are more complex and generally require the use of analytical tools. In addition, the models used to properly represent such redundancy require significant time and effort to create and verify.
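The flavor of this calculation can be sketched with invented numbers. The sketch below deliberately simplifies the software term (recovery time is ignored, and every unrecovered software failure is treated as a system loss), whereas the real analysis requires the Markov-style models described in the text:

```python
# Illustrative calculation with assumed (made-up) rates: reliability of a
# redundant pair, first hardware-only, then with a crude software term.

import math

T        = 8760.0   # mission time of interest: one year, in hours
LAM_HW   = 2e-5     # assumed hardware failure rate per channel, per hour
R_SWITCH = 0.98     # assumed probability that switchover works on demand
LAM_SW   = 1e-4     # assumed software failure rate, per hour
P_REC    = 0.995    # assumed probability a software failure is recovered

r_chan = math.exp(-LAM_HW * T)             # one channel survives the year

# Hardware-only view: the primary survives, or it fails and switchover
# to a surviving backup succeeds.
r_hw_only = r_chan + (1.0 - r_chan) * R_SWITCH * r_chan

# Crude software adjustment: only unrecovered software failures count,
# and each one is treated as a loss (recovery time neglected).
r_sw = math.exp(-LAM_SW * (1.0 - P_REC) * T)
r_with_sw = r_hw_only * r_sw
```

Even with a high recovery probability, the software term measurably lowers the predicted reliability—the effect that hardware-only analyses silently omit.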
Aerospace is working to develop tools to more quickly and accurately create models of combined hardware/software systems by integrating the reliability assessment with system architectural modeling. This effort draws upon the Architecture Analysis and Design Language (AADL), a system-modeling language that covers both hardware components and software functions as well as their system-level interactions.
AADL includes an error annex, so that failure behavior of architectural components can be specified within the same language (and tool) as the architecture, significantly reducing the difficulty in producing valid failure and repair models. Aerospace has implemented an extension that enables the error annexes to be linked to a tool that can solve the underlying mathematics. This will enable programs to build and analyze reliability properties of software systems more quickly and accurately than in the past, paving the way to incorporation of software considerations into system reliability analyses.
Another challenge in flight software assurance is testing the software in a realistic environment. Differences between test and operational environments can mean that critical defects are missed during testing. Aerospace encourages flight software testing to be as realistic as possible, as early as possible. Methods used by Aerospace include software-in-the-loop testing (where the actual flight software is used in a test setup with simulated hardware) and hardware-in-the-loop testing (where actual flight hardware is used). Each has limitations. Software-in-the-loop testing cannot verify performance requirements and does not enable testing of complex timing characteristics of flight software. A hardware-in-the-loop testbed costs as much to construct as the operational unit (which can easily exceed a million dollars).
Aerospace has introduced a third method, called software-emulated hardware-in-the-loop, that offers most of the benefit of hardware-in-the-loop testing but without the prohibitive cost. In this method, all satellite systems are emulated through software that mimics the functionality, interfaces, and performance of flight hardware, thereby providing a test system that is functionally equivalent to a hardware testbed. Once a software-emulated testbed has been developed, it can be replicated on demand at effectively no additional cost. Emulating hardware also addresses the obsolescence issues faced by many programs, because the testing can be performed even if flight hardware is no longer available.
Aerospace has developed a software-emulated hardware-in-the-loop framework for I/O (input/output) components and data buses. The core of the system was implemented as a flexible discrete-event simulation server with a modular interface to back-end processes. The system has been demonstrated using the real flight software from a government space program, with 21 emulated I/O components and four data buses. Aerospace is using the emulated system to conduct independent testing for the program.
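The core event loop of such a discrete-event simulation server can be sketched as follows; the bus handler, its 20-millisecond message cycle, and all names are invented for illustration and are not taken from the Aerospace framework:

```python
# Minimal discrete-event simulation core: a time-ordered event queue
# dispatching to registered handlers. All names here are illustrative.

import heapq

class EventSim:
    def __init__(self):
        self.now = 0.0
        self._queue = []      # heap of (time, seq, handler, payload)
        self._seq = 0         # tie-breaker keeps same-time ordering stable

    def schedule(self, delay, handler, payload):
        """Queue a handler to fire `delay` simulated seconds from now."""
        heapq.heappush(self._queue,
                       (self.now + delay, self._seq, handler, payload))
        self._seq += 1

    def run(self, until):
        """Pop events in time order, advancing the simulated clock."""
        while self._queue and self._queue[0][0] <= until:
            self.now, _, handler, payload = heapq.heappop(self._queue)
            handler(self, payload)

log = []

def bus_transfer(sim, word):
    """Invented 1553-style bus model: deliver a word, then schedule the
    next message one 20 ms cycle later."""
    log.append((sim.now, word))
    if word < 3:
        sim.schedule(0.020, bus_transfer, word + 1)

sim = EventSim()
sim.schedule(0.0, bus_transfer, 0)
sim.run(until=1.0)
# log now holds each delivery paired with its simulated timestamp
```

Emulated I/O components plug into such a core as handlers, so the server controls all timing—which is what makes the emulated testbed's behavior reproducible in a way that real hardware is not.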
Future efforts will involve emulation of the actual flight processor. In the near term, this effort will aim for a higher level of abstraction than the cycle-accurate simulation effort, so the more widespread (and faster) functional simulation will be used. Aerospace hopes to combine software-emulated hardware-in-the-loop testing with cycle-accurate simulators for a flight-hardware emulation of even higher fidelity.
Functional emulation in an effort this broad still poses difficulties. The compiled flight code must be linked with a board support package (providing libraries specific to the processor being emulated) and compiler executive (a software component similar to an operating system) and distilled into a binary image for emulation. The startup software for the processor, which resides in a separate memory location, must then configure the interrupt vectors of the interfaces, initialize the hardware, and load the binary image into the initialized processor memory. All of these steps are necessary before emulation of the flight software can even begin. Aerospace is working to develop this capability in a flexible way so that it can be used efficiently across programs for comprehensive independent testing.
Elements adapted from the Goal Structuring Notation for a software assurance case. Arguments or goals are represented by rectangles. Black arrows show the argument decomposition; red arrows show context. This version explicitly defines the strategy used for decomposition. Strategies aren't required if the decomposition is obvious.
While spacecraft designs have traditionally achieved high reliability through hardware redundancy, the desire to reduce power consumption, mass, and cost while increasing spacecraft capabilities has shifted more of the burden of ensuring reliable operations onto the flight software. Modern spacecraft designs implement the bulk of the control system’s functionality in software, so flight software engineers have to understand the spacecraft’s overall requirements and design to ensure that the software will meet mission needs.
This is especially true for functions that must be highly dependable, such as the safing system. Most spacecraft have such a system, or at the least a mode of operation, to keep the vehicle powered and thermally safe in the event of an anomaly. Safing gives ground operators time to understand a problem and develop a solution without having to worry about vehicle loss. Safing systems are usually complex, and might use some of the same hardware and software used in normal operation—so it is important to understand why something that causes the primary system to fail will not also cause the safing system to fail. This means that the interactions among different hardware and software components used in safing need to be understood, and the evidence that the components and their interactions have been verified needs to be documented to provide confidence that all the risks to the vehicle have been identified and mitigated.
Assurance cases are increasingly being used for performing this kind of analysis. “Case” in this sense is like a case in the legal world: a documented argument that a system will meet its mission assurance requirements. In the past, Aerospace work on assurance cases focused on their ability to represent an assurance argument clearly and concisely, exposing deficiencies that might reveal risks and enabling the arguments to be reviewed independently. Graphical notations were used for diagramming the arguments and formalizing their syntax. More recent work has looked at creating patterns that can be used across different arguments, to help in making stronger, reusable arguments. Existing patterns focus on general software arguments and do not take advantage of the similarity of requirements and architectures across different spacecraft—the same similarity that is being captured in the reference architecture.
Aerospace has produced a set of patterns that are specific to safe-mode arguments, with an emphasis on functionality that is implemented in flight software. The patterns capture, for instance, the need to show that safe-mode attitude control can be maintained even if normal attitude control has failed. Creating patterns for assurance also enables lessons learned to be disseminated across different programs—for example, by requiring the arguments to show how critical lessons have been addressed.
Aerospace’s assurance case work is complementary to its reference architecture work. The requirements in the reference architecture capture what properties need to hold, whereas the assurance case captures why they need to hold and why specific ones were included or excluded. Assurance cases are not a formal analysis technique, so argument assertions cannot be checked mechanically; but those assertions are simple to write, and the case is easily reviewed by nonexperts. Each technique makes its own contribution to overall flight software assurance.
Synthesizing Different Analysis Capabilities
A high-level software assurance case pattern for system safe mode. The top-level goal, GTopLevel, is systematically decomposed into subgoals until specific evidence for those subgoals can be provided. The argument decomposition is indicated using black arrows, and context is included using red arrows. Arrows in the bottom rectangles indicate that the argument is continued in another diagram.
Bringing together all the different analyses needed to support flight software assurance is a challenge in itself. Some of this synthesis can be done using the assurance case. The individual analyses show that the claims in the argument (e.g., safe-mode software will control the satellite so that the solar arrays are illuminated) are supported by evidence (e.g., successful software-emulated hardware-in-the-loop testing). However, the intricacies of interactions among software and hardware, among software components, and among varying levels of abstraction, mean that it is difficult to argue that the individual analyses cover the problem space without synthesizing the analyses themselves.
The disadvantage of synthesizing two (or more) analyses is that it forces the analyses to be more general, so they might not be as well-suited to their individual tasks. Thus, sometimes it makes sense for analyses to be tightly coupled (as with AADL modeling and quantitative reliability assessment); other times, a rough correspondence is better (as with the reference architecture and assurance cases).
Aerospace has been working to integrate formal modeling of software through several lifecycle stages by means of a project known as the Software Lifecycle Analysis Environment. Flight software contractors use different commercial tools for their requirements, architecture, and design artifacts, and that makes Aerospace’s job a challenge: the contractors have to be free to use the tools that work best for them, but Aerospace has to be able to assess their artifacts regardless of the tools they use. The Software Lifecycle Analysis Environment toolset was designed and developed with vendor-independent open-source components. Aerospace can use it to analyze the links between the contractors’ different requirements, design, and test representations to gauge their completeness and consistency with respect to each other. The Software Lifecycle Analysis Environment is currently being upgraded using a new metamodel as the underlying representation. This will enable entire models to be captured and critical properties to be defined and analyzed using standard queries that exist in commercial and open-source tools.
Aerospace’s Modeling and Analysis for Real-Time Embedded Software (MARTES) center integrates different tools that are specifically focused on embedded software analysis and prototyping. This includes off-the-shelf real-time computing hardware and operating systems as well as software modeling, analysis, and development tools. One of the key areas supported by the center is the capability to perform early precode analysis of real-time embedded software architectures, including performance analysis and examination of concurrent software behavior. The MARTES center also provides facilities to explore and evaluate new software architecture, design, and development technologies for real-time embedded systems with a particular focus on spaceflight software.
Software assurance is a difficult problem because software absorbs the increasing complexity demanded of modern space systems. Abstract modeling becomes important for dealing with that complexity, but precise analysis remains necessary to ensure that abstractions do not hide subtle flaws that could lead to vehicle loss. Both approaches require continual improvement, because the size of the software to be assured grows along with its complexity.
Aerospace is focused on developing capabilities that are reusable across programs to enable strong assurance while reducing time and cost. Ideally, this will lead to analyses that consolidate existing capabilities rather than employ isolated, individual efforts. A continuing push for automated formal analysis will raise the level of confidence in the types of analyses already being conducted. The spectrum of analysis techniques will be tied together by the assurance case, helping to focus limited resources to achieve the greatest benefit. These strategies will together enable Aerospace to assure the software that will implement the needs of twenty-first century space systems.
The authors thank Sheri Benator, S.-L. Chow, Nehal Desai, Alex Ellis, William Greenwell, Scott Hendrickson, Robert Klungle, Alex Lam, Robert Luter, Tzvetan Metodi, Phil Schmidt, and David Wangerin for their contributions to this article.
Back to the Spring 2011 Table of Contents