Disaster recovery involves establishing and maintaining plans and procedures that help an enterprise restore IT functionality after a loss of data caused by a natural disaster or, more commonly, an extensive local power failure. Fault tolerance, while sometimes linked with disaster recovery, is a distinct concept that enterprises need to keep in mind as they construct IT support and service plans.
What is Fault Tolerance?
Fault tolerance refers to an enterprise's ability to continue IT workflow without disruption even when key hardware components fail. The key to adequate fault tolerance is for IT support personnel to keep sufficient replacement parts on hand, so that issues such as faulty network interface cards or Ethernet cables disrupt employees as little as possible.
"Doubling Up" in System Design
In some cases, this duplication of components can be built into the system from the start, so that if a component fails, the system automatically switches operations to the parallel component. Users may not even be aware of the switch, since from their point of view work continues without interruption. Components that can be "doubled up" in this way include Ethernet switches, host bus adapters, and sometimes even network interface cards.
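The automatic switch to a parallel component can be illustrated with a small simulation. This is only a sketch under stated assumptions: the class names (Component, RedundantPair), the component labels "nic-a" and "nic-b", and the retry-once policy are all hypothetical and chosen for illustration. Real hardware failover is handled by drivers, firmware, or the operating system rather than by application code like this.

```python
class ComponentFailure(Exception):
    """Raised when a simulated component cannot do its work."""


class Component:
    """Hypothetical stand-in for a piece of hardware, such as a network card."""

    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def send(self, payload):
        if not self.healthy:
            raise ComponentFailure(f"{self.name} is down")
        return f"{self.name} delivered {payload}"


class RedundantPair:
    """Two interchangeable components: one active, one on standby."""

    def __init__(self, primary, standby):
        self.active = primary
        self.standby = standby
        self.failovers = 0

    def send(self, payload):
        try:
            return self.active.send(payload)
        except ComponentFailure:
            # Swap roles and retry once on the former standby.
            # If that component is also down, the error reaches the caller.
            self.active, self.standby = self.standby, self.active
            self.failovers += 1
            return self.active.send(payload)


pair = RedundantPair(Component("nic-a"), Component("nic-b"))
first = pair.send("packet-1")    # handled by the primary
pair.active.healthy = False      # simulate a hardware fault on the primary
second = pair.send("packet-2")   # transparently handled by the standby
```

The caller uses the same send call before and after the fault, which mirrors the user's experience described above: work continues, and only the failovers counter records that a switch took place.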
Systems with little or no built-in fault tolerance are vulnerable to disruption, and their return on IT investment may fall compared with systems that support workflow seamlessly. One way to improve your enterprise's IT fault tolerance is to contract with an IT services provider who can handle the task as project work.