Lions (1996): Ariane 5 Flight 501 Failure
On 4 June 1996, the inaugural flight of Ariane 5 ended 39 seconds after liftoff when the rocket veered off course and broke apart under aerodynamic loads. The self-destruct system activated at H0+39.4 seconds. The payload, four Cluster plasma physics satellites, was lost. The ESA inquiry board, chaired by mathematician Prof. J.-L. Lions, delivered its report on 19 July 1996, about six weeks later. That report became one of the most cited documents in software engineering failure analysis.
The proximate cause was a 64-bit floating-point to 16-bit signed integer conversion overflow in the Inertial Reference System (SRI). The deeper cause was that tested, flight-proven software from Ariane 4 was reused in Ariane 5 without revalidating the assumptions that made it safe.
The Inertial Reference System
The SRI (Système de Référence Inertielle) provides attitude and navigation data to the On-Board Computer (OBC), which commands the engine nozzle actuators. Ariane 5 carried two SRI units in an active/standby redundancy configuration. Both units ran identical software.
The SRI software included a function that computed a variable called BH — the Horizontal Bias — derived from platform alignment data and measured horizontal velocity. This function served a specific purpose: during the countdown, if the launch was aborted after the alignment sequence, BH allowed the alignment to be restarted quickly without repeating the full procedure.
After liftoff, BH served no purpose. The alignment function would never be needed again. But the software specification required that the function continue running for 40 seconds into flight, as a “convenience” in case of a very late hold and countdown recycle.
The Overflow
During the first 30 seconds of flight, Ariane 5’s trajectory diverges significantly from Ariane 4’s. Ariane 5 develops substantially higher horizontal velocity earlier in the flight. The BH computation, which took horizontal velocity as an input, produced values that grew well beyond what Ariane 4 would ever have generated.
At approximately H0+36.7 seconds, the BH value exceeded 32,767, the maximum for a signed 16-bit integer. The Ada runtime detected the overflow during the 64-bit to 16-bit conversion and raised an Operand Error exception. The exception handler, following its specification, declared the processor failed and shut down the SRI.
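The limit and the failing conversion can be illustrated in a few lines. The flight software was written in Ada; this Python sketch only imitates the behaviour of a checked conversion. The names `to_int16` and `OperandError` and the sample values are assumptions made for illustration and are not taken from the SRI source.

```python
INT16_MIN = -(2 ** 15)      # -32768
INT16_MAX = 2 ** 15 - 1     #  32767


class OperandError(Exception):
    """Illustrative stand-in for the exception raised by a checked conversion."""


def to_int16(value: float) -> int:
    """Convert a float to a signed 16-bit integer, raising on overflow."""
    result = int(value)  # truncates toward zero
    if not INT16_MIN <= result <= INT16_MAX:
        raise OperandError(f"{value!r} does not fit in a signed 16-bit integer")
    return result


# A value inside the range converts cleanly (hypothetical magnitude).
print(to_int16(12_000.0))

# A value just past the limit raises instead of silently wrapping around.
try:
    to_int16(32_768.0)
except OperandError as exc:
    print("conversion failed:", exc)
```

Detecting the overflow is the safe half of the behaviour. What the handler does next decides whether the fault stays local.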
The chain from overflow to destruction took roughly 3 seconds:
| Time | Event |
|---|---|
| H0+36.7s | BH overflows on the backup SRI; processor shuts down |
| H0+36.7s + 72ms | BH overflows on the active SRI; processor shuts down |
| H0+37s | Active SRI transmits diagnostic data (its last computed values) to the OBC |
| H0+37s | OBC interprets the diagnostic bit pattern as valid flight attitude data |
| H0+37s | OBC commands full nozzle deflection to “correct” the perceived attitude error |
| H0+39s | Aerodynamic loads exceed structural limits; vehicle breaks up |
| H0+39.4s | Self-destruct system activates (range safety) |
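The first two rows of the timeline show a common-mode failure: two units running identical code on the same input fail in the same way. A minimal sketch of that effect follows. The class `InertialUnit`, the exception `ProcessorShutdown`, and the input values are hypothetical.

```python
class ProcessorShutdown(Exception):
    """Hypothetical signal that a unit has declared itself failed."""


class InertialUnit:
    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True

    def step(self, bh: float) -> int:
        """Run one cycle; an out-of-range BH shuts the unit down."""
        if not -32768 <= bh <= 32767:
            self.alive = False
            raise ProcessorShutdown(self.name)
        return int(bh)


def run_cycle(units, bh):
    """Feed the same input to every live unit; return the names of survivors."""
    for unit in units:
        if unit.alive:
            try:
                unit.step(bh)
            except ProcessorShutdown:
                pass  # the unit has already marked itself as failed
    return [u.name for u in units if u.alive]


units = [InertialUnit("backup"), InertialUnit("active")]
print(run_cycle(units, 20_000.0))  # in range: both units survive
print(run_cycle(units, 40_000.0))  # out of range: no unit survives
```

Hardware redundancy of this kind covers independent faults, such as a failed component in one unit. It offers no cover when the fault is in the shared software.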
The Seven Variables
The inquiry board found that the SRI software contained seven variables that performed conversions from 64-bit floating-point to 16-bit signed integer. During Ariane 4 development, engineers had analyzed which of these variables could potentially exceed the 16-bit range:
- 4 variables were protected with explicit range checks, because analysis showed they could overflow under Ariane 4’s flight conditions
- 3 variables were left unprotected, because analysis concluded their values would remain within range for Ariane 4
The BH variable was one of the three unprotected conversions. The analysis that justified omitting the protection was correct — for Ariane 4. No one repeated the analysis for Ariane 5’s different trajectory.
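The difference between the two groups of variables can be sketched as two conversion styles. The report does not say what form the protection took in the SRI, so the clamping shown here is an assumption, and both function names are hypothetical.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def convert_unprotected(value: float) -> int:
    """Relies on an offline analysis that the value never leaves the range.

    If that analysis no longer holds, the conversion raises.
    """
    result = int(value)
    if not INT16_MIN <= result <= INT16_MAX:
        raise OverflowError("16-bit conversion overflow")
    return result


def convert_protected(value: float) -> tuple[int, bool]:
    """Clamp to the 16-bit range and report whether clamping occurred."""
    result = int(value)
    clamped = min(max(result, INT16_MIN), INT16_MAX)
    return clamped, clamped != result


print(convert_protected(100.0))     # (100, False)
print(convert_protected(40_000.0))  # (32767, True)
```

The unprotected version is not wrong in itself. Its safety lives in an analysis document rather than in the code, so the analysis has to travel with the code when it is reused.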
The Data Path Failure
When the active SRI shut down, it transmitted its last-computed values to the OBC as diagnostic data. The interface between SRI and OBC had no mechanism to distinguish diagnostic output from valid navigation data. The OBC received a bit pattern that, interpreted as attitude data, indicated the rocket was rotating rapidly. It commanded maximum nozzle deflection to compensate.
The real attitude was nominal. The nozzle deflection induced rapid pitch and yaw. The boosters and main engine could not maintain structural integrity under the resulting aerodynamic loads.
This is a separate failure from the overflow itself. Even after the SRI failed, the mission could have survived if:
- The OBC had performed a reasonableness check on the incoming data (the indicated rotation rate was physically implausible)
- The SRI had flagged its output as diagnostic rather than operational
- The OBC had checked the status of both SRIs before acting (the backup had already shut down, yet the OBC used the active unit’s output without question)
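The first of these checks can be sketched on the consumer side. The rate limit, the `AttitudeSample` type, and the hold-last-good policy are all assumptions chosen for illustration; none of them describes the real OBC.

```python
from dataclasses import dataclass

MAX_PLAUSIBLE_RATE_DEG_S = 30.0  # hypothetical limit, not a real Ariane figure


@dataclass(frozen=True)
class AttitudeSample:
    pitch_rate_deg_s: float
    yaw_rate_deg_s: float


def is_plausible(sample: AttitudeSample) -> bool:
    """Reject rates that the vehicle could not physically produce."""
    return (abs(sample.pitch_rate_deg_s) <= MAX_PLAUSIBLE_RATE_DEG_S
            and abs(sample.yaw_rate_deg_s) <= MAX_PLAUSIBLE_RATE_DEG_S)


def select_attitude(new: AttitudeSample, last_good: AttitudeSample) -> AttitudeSample:
    """Use the new sample only if it passes the rate check; otherwise hold the last good one."""
    return new if is_plausible(new) else last_good


last_good = AttitudeSample(0.5, -0.2)
garbage = AttitudeSample(2000.0, -1500.0)   # a diagnostic pattern misread as rates
print(select_attitude(garbage, last_good))  # falls back to last_good
```

Holding the last good value is only one possible fallback. A real system would also need a rule for how long stale data may be used.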
Why It Was Not Caught
The report identifies specific process failures, not individual errors:
Testing was component-level, not system-level. The SRI had been tested exhaustively — with Ariane 4’s trajectory profile. A trajectory simulation for Ariane 5 existed at the system level but had never been connected to the SRI test bench. The SRI was tested as a validated component, not as a component in a new environment.
Reuse was treated as equivalence. The SRI software had flown successfully on Ariane 4. This track record was taken as evidence of correctness for Ariane 5, without examining whether the operational assumptions still held. The report states: “the assumption that the SRI software was correct because it had flown on Ariane 4 was a major factor in the failure.”
The alignment function was not needed. The BH computation that destroyed the rocket served no operational purpose during flight. It was kept running as a convenience feature for countdown recycle scenarios. No risk analysis was performed on the cost of running unnecessary code in a safety-critical path.
Exception handling treated all errors as hardware failures. The SRI’s exception handler had one response to any software exception: shut down. This made sense for hardware faults (a corrupted memory cell, a timing failure) where continued operation could produce dangerously wrong output. It did not make sense for a software exception in a non-critical function, where the correct response would have been to disable the failed computation and continue providing attitude data.
The Recommendations
The inquiry board issued 14 recommendations. The most significant, generalized beyond Ariane:
On Software Reuse
Do not reuse software unless the new operational environment has been fully analyzed and all assumptions underlying the original design have been revalidated.
The board recommended that ALL Ariane 4 software reused in Ariane 5 be re-examined with respect to the actual Ariane 5 flight environment. This is not a recommendation to avoid reuse — it is a recommendation to treat reuse as a design activity, not a logistics activity.
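Treating reuse as a design activity can be made concrete by recording each range assumption as data and checking it against the new vehicle's predicted envelope. The sketch below assumes a hypothetical `RangeAssumption` record, and all numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeAssumption:
    variable: str
    limit: float          # largest magnitude the code can represent
    validated_max: float  # largest magnitude covered by the original analysis


def revalidate(assumptions, new_envelope):
    """Return a finding for every assumption the new environment puts in doubt."""
    findings = {}
    for a in assumptions:
        worst = new_envelope.get(a.variable)
        if worst is None:
            findings[a.variable] = "no data for new environment"
        elif abs(worst) > a.limit:
            findings[a.variable] = "exceeds representable range"
        elif abs(worst) > a.validated_max:
            findings[a.variable] = "outside previously validated range"
    return findings


# All numbers below are made up for illustration.
assumptions = [RangeAssumption("BH", limit=32767, validated_max=20_000)]
print(revalidate(assumptions, {"BH": 20_000.0}))  # no findings for the original envelope
print(revalidate(assumptions, {"BH": 45_000.0}))  # BH flagged for the new envelope
```

A missing entry is reported as a finding rather than ignored. The absence of data for the new environment is itself the condition the board warned about.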
On Exception Handling
Do not shut down a processor on a software exception unless it can be demonstrated that continued operation is more dangerous than shutdown.
The BH overflow was in a function that could have been disabled without affecting navigation. The correct response was to stop computing BH and continue providing attitude data. The blanket “shut down on any exception” policy made sense for hardware failures but converted a non-critical software error into a mission-critical failure.
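A containment policy of this kind might look like the following sketch. The class `Sri`, the function `compute_bh`, and its scaling factor are hypothetical stand-ins, not the real alignment computation.

```python
def compute_bh(horizontal_velocity: float) -> int:
    """Hypothetical stand-in for the alignment computation."""
    value = int(horizontal_velocity * 10)  # made-up scaling
    if not -32768 <= value <= 32767:
        raise OverflowError("BH out of 16-bit range")
    return value


class Sri:
    def __init__(self) -> None:
        self.alignment_enabled = True

    def cycle(self, attitude: tuple, horizontal_velocity: float) -> dict:
        """Run one cycle; a fault in the non-critical task disables only that task."""
        if self.alignment_enabled:
            try:
                compute_bh(horizontal_velocity)
            except OverflowError:
                self.alignment_enabled = False  # contain the fault, keep running
        return {"attitude": attitude, "alignment_enabled": self.alignment_enabled}


sri = Sri()
print(sri.cycle((0.0, 0.0, 0.0), 100.0))    # alignment still running
print(sri.cycle((0.1, 0.0, 0.0), 5_000.0))  # alignment disabled, attitude still delivered
```

The design choice is that the handler's scope matches the function's criticality. Only a fault in the attitude path itself would justify declaring the unit failed.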
On Unnecessary Functions
Do not run unnecessary software in flight-critical systems.
The alignment function served no purpose after liftoff. It was running because no one had decided to stop it. The board’s recommendation is straightforward: if a function is not needed for the current mission phase, disable it.
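One way to express this rule is a task table keyed by mission phase, so that a function runs only where it is explicitly scheduled. The phase names, task names, and `run_phase_cycle` helper below are assumptions for illustration.

```python
from enum import Enum, auto


class Phase(Enum):
    COUNTDOWN = auto()
    FLIGHT = auto()


# Hypothetical task table: which tasks may run in which phase.
TASKS_BY_PHASE = {
    Phase.COUNTDOWN: {"alignment", "attitude"},
    Phase.FLIGHT: {"attitude"},
}


def run_phase_cycle(phase: Phase, tasks: dict) -> list:
    """Call only the tasks scheduled for this phase; return their names in call order."""
    ran = []
    for name, fn in tasks.items():
        if name in TASKS_BY_PHASE[phase]:
            fn()
            ran.append(name)
    return ran


tasks = {"alignment": lambda: None, "attitude": lambda: None}
print(run_phase_cycle(Phase.COUNTDOWN, tasks))  # both tasks run
print(run_phase_cycle(Phase.FLIGHT, tasks))     # only the attitude task runs
```

With an explicit table, keeping a function alive past its useful phase becomes a visible decision that can be reviewed, not a default.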
On Interface Design
All data exchanged between units must include validity flags and reasonableness checks.
The OBC accepted SRI diagnostic data as flight data because the interface did not distinguish between them. A single validity bit, or a range check on the received attitude rate, would have prevented the OBC from acting on nonsense.
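On the producer side, the validity flag amounts to tagging every frame with its kind. The sketch below uses a hypothetical `Frame` layout. The actual SRI to OBC bus format is not described in this text.

```python
from dataclasses import dataclass
from enum import Enum


class FrameKind(Enum):
    NAVIGATION = 1
    DIAGNOSTIC = 2


@dataclass(frozen=True)
class Frame:
    kind: FrameKind
    payload: bytes


def accept_for_guidance(frame: Frame) -> bool:
    """The consumer acts only on frames explicitly marked as navigation data."""
    return frame.kind is FrameKind.NAVIGATION


nav = Frame(FrameKind.NAVIGATION, b"\x00\x01")
dump = Frame(FrameKind.DIAGNOSTIC, b"\xff\xfe")
print(accept_for_guidance(nav))   # True
print(accept_for_guidance(dump))  # False
```

The tag costs little on the wire and removes the need for the consumer to guess what a bit pattern means.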
What This Report Teaches
The Ariane 5 failure is not a story about a programming bug. The code worked as specified. The Ada runtime caught the overflow. The exception handler followed its design. Every component performed correctly in isolation.
The failure is a story about assumptions. The assumption that BH would fit in 16 bits. The assumption that Ariane 4 validation covered Ariane 5. The assumption that redundancy protects against software faults. The assumption that any exception means hardware failure. Each assumption was reasonable in its original context. None were re-examined when the context changed.
For any system that reuses validated software in a new environment — including satellite tracking systems that reuse orbital mechanics libraries, coordinate frame transformations, or hardware interface code — the Lions report provides a checklist: What are the implicit assumptions? Under what conditions were they validated? Do those conditions still hold?