Continuous Risk Management
Revision r1.4 - 03 May 2004 - 14:33 GMT - SomikRaha
Description
The NASA rover Spirit developed problems soon after landing on Mars. It seemed to be draining itself of power and rebooting continually. It turned out that the system was designed to reboot and start fresh whenever it detected a serious anomaly. In this case, however, the serious anomaly was that the onboard flash memory was full, and that sent the rover into continuous reboots.
Luckily for this project, the team had done some risk mitigation. They had built in a window of time (almost an hour) before the reboot so that they would have time to communicate with the rover. This allowed them to tell the rover to stop using the flash memory and use the onboard RAM temporarily, which let it stay up longer and receive further commands. The rover was then sent commands to clean up the flash memory.
As we now know, the rover is a success story. Yet it came dangerously close to failure, and what saved it were the two risk mitigation steps that had been taken on this project:
- Allowing a window of time before rebooting
- Having RAM to fall back on if the flash memory stopped working for some reason
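The two steps above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not the rover's actual flight software: the `Rover` class, the `boot_cycle` function, the command names (`USE_RAM`, `CLEAN_FLASH`, `USE_FLASH`) and the idea of measuring the window in number of commands are all hypothetical, chosen to show how a pre-reboot window and a RAM fallback work together.

```python
from dataclasses import dataclass


@dataclass
class Rover:
    """Toy model of a system that reboots on a serious anomaly (hypothetical)."""
    flash_used: int = 100      # flash starts out full
    flash_capacity: int = 100
    use_flash: bool = True     # False means "run from RAM only"
    reboots: int = 0

    def has_anomaly(self) -> bool:
        # A full flash only matters while the system depends on flash.
        return self.use_flash and self.flash_used >= self.flash_capacity

    def handle(self, command: str) -> None:
        # Hypothetical ground commands.
        if command == "USE_RAM":
            self.use_flash = False
        elif command == "CLEAN_FLASH":
            self.flash_used = 0
        elif command == "USE_FLASH":
            self.use_flash = True


def boot_cycle(rover: Rover, commands: list, window: int) -> bool:
    """Run one boot cycle and return True if the rover stays up.

    While the anomaly persists, only `window` commands are accepted
    before the reboot. Once the anomaly is cleared, the rover stays up
    and keeps accepting commands.
    """
    accepted = 0
    while commands:
        if rover.has_anomaly() and accepted >= window:
            break  # window closed while the anomaly is still present
        rover.handle(commands.pop(0))
        accepted += 1
    if rover.has_anomaly():
        rover.reboots += 1
        return False
    return True


if __name__ == "__main__":
    rover = Rover()
    stayed_up = boot_cycle(rover, ["USE_RAM", "CLEAN_FLASH", "USE_FLASH"], window=1)
    print(stayed_up, rover.reboots)
```

In this sketch a window of a single command is enough, because the first command switches to RAM and removes the cause of the reboot. With `window=0` no command gets through and the model reboots on every cycle, which is the loop the rover was stuck in.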
To see the rovers in action, visit http://marsrovers.jpl.nasa.gov/home/
-- SomikRaha - 26 Apr 2004