In reliability engineering, when a single line replaceable unit (LRU) or component fails frequently, we don’t say: let us remove it completely and stop using it in the design.
Rather, we apply what is called redundancy. With double redundancy, if the first component or LRU fails, a second one installed as a standby kicks in. The standby is kept powered and ready in a kind of waiting mode, so that when the active component fails it can continue the service without interruption. This arrangement is sometimes termed ‘hot redundancy’ or hot standby.
Some systems go as far as triple or quadruple redundancy. The same approach is used in communication networks.
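To put rough numbers on double, triple and quadruple redundancy, here is a minimal sketch in Python. It assumes identical units that are all active in parallel, fail independently of one another, and switch over perfectly, which real systems rarely achieve. The function name and the 0.90 unit reliability are illustrative assumptions, not values from any standard.

```python
def parallel_reliability(unit_reliability: float, units: int) -> float:
    """Reliability of `units` identical, independent units in active parallel.

    The group fails only if every unit fails, so R = 1 - (1 - r) ** n.
    Assumes independent failures and perfect switchover (a simplification).
    """
    if not 0.0 <= unit_reliability <= 1.0:
        raise ValueError("unit_reliability must be between 0 and 1")
    if units < 1:
        raise ValueError("units must be at least 1")
    return 1.0 - (1.0 - unit_reliability) ** units


if __name__ == "__main__":
    r = 0.90  # hypothetical reliability of one unit over the mission time
    for n in range(1, 5):
        print(f"{n} unit(s): {parallel_reliability(r, n):.4f}")
```

With these assumed numbers, one unit gives 0.90, two give 0.99, three give 0.999 and four give 0.9999, so each added unit buys a smaller improvement than the one before it.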
While redundancy is important in meeting the customer’s RAMS (Reliability, Availability, Maintainability and Safety) requirements, systems and design engineers must guard against overdoing it. The level of redundancy should be appropriate, so that the cost of engineering the system does not become unaffordable.
But even in a redundant system, we must always look out for what are referred to as single points of failure (SPOF). We must ask what could occur that would lead to the malfunction or complete failure of the whole system, and then find a solution for it. The options include further redundancy, preventive maintenance, predictive maintenance such as online monitoring, and/or scheduled maintenance, where components are replaced after a specified period of service, before they fail completely. Such scheduled replacement is preventive rather than corrective, since corrective maintenance takes place only after a failure.
When only the failures that affect the service are counted, the measure is not simply MTBF (Mean Time Between Failures) but can be referred to as MTBSAF (Mean Time Between Service Affecting Failures).
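The difference between the two measures can be sketched with a small Python example. It assumes the simplest definition, total operating hours divided by the number of failures counted. The `Failure` record, the function names and the failure log are all hypothetical and used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Failure:
    component: str
    service_affecting: bool  # True if the failure interrupted or degraded service


def mean_time_between(operating_hours: float, failure_count: int) -> float:
    """Operating time divided by number of failures; infinite if none occurred."""
    if failure_count == 0:
        return float("inf")
    return operating_hours / failure_count


def mtbf_and_mtbsaf(operating_hours: float, failures: list[Failure]) -> tuple[float, float]:
    """Return (MTBF, MTBSAF): all failures versus service affecting ones only."""
    service_affecting = sum(1 for f in failures if f.service_affecting)
    return (
        mean_time_between(operating_hours, len(failures)),
        mean_time_between(operating_hours, service_affecting),
    )


if __name__ == "__main__":
    # Hypothetical log: three failures in 6,000 operating hours.
    log = [
        Failure("power supply A", False),  # standby unit took over
        Failure("cooling fan", False),     # no effect on service
        Failure("data bus", True),         # no backup, service interrupted
    ]
    mtbf, mtbsaf = mtbf_and_mtbsaf(6000.0, log)
    print(f"MTBF: {mtbf:.0f} h, MTBSAF: {mtbsaf:.0f} h")
```

In this made-up log the MTBF is 2,000 hours while the MTBSAF is 6,000 hours, because redundancy kept two of the three failures from reaching the service.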
SPOFs are normally tracked in a RAM log document, with the remedy for each failure stated, in order to ensure that the RAM requirements of the system under consideration are met. This tracking of service affecting failures must continue until the customer’s original reliability, availability and maintainability (RAM) requirements are fully met.
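As a very rough illustration of how candidate SPOFs might be listed before they go into such a log, the Python sketch below treats the system as a set of functions that are all required, each served by some number of redundant units. This model, the function name and the example system are assumptions made for illustration. A real analysis (for example an FMEA or a fault tree) would also consider common-cause failures that can defeat redundancy.

```python
def find_spofs(functions: dict[str, int]) -> list[str]:
    """Return the functions that have no redundant backup.

    `functions` maps each function the system needs (all of them required)
    to the number of units able to perform it. A function served by one
    unit or fewer is flagged as a candidate single point of failure.
    """
    return [name for name, units in functions.items() if units <= 1]


if __name__ == "__main__":
    # Hypothetical system: every function below is needed for service.
    system = {
        "power supply": 2,  # double redundancy
        "controller": 3,    # triple redundancy
        "data bus": 1,      # no backup
    }
    print(find_spofs(system))
```

Here only the data bus is flagged, since it is the one function whose single failure would take down the whole system.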
While this is a mechanical, reliability and general engineering principle used mostly in the space, aircraft and railway industries, it can also be applied to other disciplines such as human resources management, project management and more.
I am a Mechanical Engineer and the Architect of the African Railway Triangle Network Master Plan (ARTNMP), a project we are working with the African Continental Free Trade Area (AfCFTA) Secretariat and the African Union to actualise.