Reliability Programs:
Successes or Failures?


Why do some reliability programs succeed?  Why do other reliability programs fail?  How do you make reliability programs successful?  Key elements characterize each case.  


Attributes of Successful Reliability Programs:

Successful reliability programs begin as top-down, management-driven programs.  The programs improve operations and reduce costs (and the risk of costs from potential failures, such as safety incidents).


Successful reliability programs change the business culture to abhor failures, which cause long-term chaos and extra costs.  Failure-free business programs share similar statements:
            Safety: We will operate in an accident-free environment,
            Quality: We will ship defect-free products,
            Environmental: We will operate without spills or releases to the environment for failure-free results, and
            Reliability: We will design and build an economical and failure-free manufacturing process that will operate for 5 years between planned turnarounds.

Failure-free cultures must be created from the top down, so that the organization accepts failure-free programs as the right thing to do and as a way of life that enhances long-term profits.  Management communicates its desires in each of these areas with a policy statement for clarity and effectiveness.


None of these failure-free programs is an altruistic, “do-good” program.  Nor are the programs eyewash for looking good while accepting the chaos of failures.  Each program requires sophistication, education, training, and discipline in the management ranks to drive it as an ethical statement of how we plan to do business.  Discipline is used in the spirit of: to train, direct, and mold.  Discipline is not used in the spirit of: to beat up, intimidate, berate, belittle, or degrade.


The ethical, failure-free statements require insight and motivation to achieve excellent (failure-free) results: the entire organization must be motivated to eliminate failures, and the risk of failures, so as to operate at the lowest long-term cost of ownership.  Recognize that every failure has a cost consequence and that there are alternatives for dealing with it.  Cost consequences apply internally to the organization and externally to the customers who purchase and use the products produced.


The top-level management team must indoctrinate the lower levels of management in the organization to secure their personal participation.  Indoctrination cannot be off-loaded or delegated to others in the organization if the failure-free message is to be transmitted effectively.
            Safety Departments do not, on their own, control safety results.
            Quality Departments do not, on their own, control product quality.
            Environmental Departments do not, on their own, control environmental results.
            Reliability Departments do not, on their own, produce a failure-free process.

The system depends upon management teamwork to achieve failure-free results.  Management is the coach of the team that accomplishes the desired failure-free results.


Each of the above departments controls results in the same manner as the weatherman controls the weather!  Each department provides knowledge, scorekeeping, and motivation for the organization to accomplish the collective results through individual participation and teamwork.  Favorable results are obtained by the team preventing failures at the formative stages.  Favorable results are not achieved by quickly cleaning up the blood and guts of failures.  Failure-free environments accomplish favorable results by doing the job right the first time at the lowest tradeoff cost.


Management values the results the organization achieves as a team in creating failure-free environments.  Management recognizes that discrepancies (small failures) will occur at every step of the process, and skillful organizations expect helping hands to take the initiative willingly, without individual direction, for productive failure-free results rather than waiting for specific management instructions.  This means individuals must be empowered and enabled to take corrective action as part of a team effort in which everyone works for the lowest long-term cost of ownership by abhorring failures.  Please note:
            Empowered means management authorizes individual initiative and experience to be used continuously in an effective and timely manner.  Management must invest individuals in the organization with authority to take action.

            Enabled means trained and drilled for proficiency using best practices that are continuously improved through feedback within the working teams.  Management must turn on the organization for action rather than disabling it and denying positive action.


Empowered and enabled does not mean contributing a minimal effort to reach the lowest acceptable proficiency level!  To gain a historical and documented sense of empowerment and enablement, read the book To Rule The Waves: How the British Navy Shaped the Modern World by Arthur Herman, ISBN: 0-06-053424-9.  This book relates why England ruled the seas for hundreds of years: during battle, sailors and officers alike silently hauled lines and loaded weapons as well-drilled, knowledgeable teams, without frequent verbal instructions.


The English Navy, when in harm’s way, fired two to three times as often as its adversaries, and with higher precision.  Why?  The English Navy drilled continuously, every day, so sailors and officers knew the details of many jobs.  England’s opponents had subordinate sailors waiting for detailed instructions from superior officers before taking action in a deadly battle.  The opponents’ sailors were not given the individual initiative that comes with empowerment and enablement.


Lack of individual initiative, training, and teamwork among England’s opponents resulted in the deaths of thousands of sailors and officers in battle, as the officers’ commands were slow to come (and often inaudible in battle).  Lack of empowerment and enablement resulted in death, destruction, and loss of ships: clearly a failure when under the gun.


Captured French, Spanish, and Dutch officers, when brought aboard English ships, were amazed at how silently the English ships of the line operated during battle, worked by teams of empowered and enabled sailors and officers!  The opponents’ ships had pandemonium while attempting the same work at a slower pace (which is why the English captured their opponents: they seized the ships and put the defeated officers directly under the gaze of their English conquerors).


Bottom line:  English Navy management (officers) gave up many prerogatives of directing the work force (sailors) and achieved superior results by being on the firing line during action.  The English naval officers helped the team (no Us vs. Them situations existed in battle) to achieve superior results.  This meant officers were required to know and perform mundane details with great proficiency (i.e., show me, don’t tell me!).  This provides a strong message for management today.  Managers cannot learn detailed jobs well enough to guide the organization helpfully amid the continual chaos of rotations and turnovers, with changes of command every 12 months to 2 years.  You need seasoned managers, not slightly seared managers, to gain constancy of purpose and to build a failure-free organization that achieves good results the first time and every time.


Management gets what management wants.  This means management-driven programs must be properly configured, with roles and responsibilities clearly defined and expectations stated in writing, so that the organization is empowered and enabled to deliver failure-free results through the many invisible hands of the workforce taking positive corrective action.



Attributes of Failed Reliability Programs:

Unsuccessful reliability programs begin as bottom-up engineering programs for improving the technical details of maintenance, with the hope of also making improvements for operations.  The push is for better maintenance technology without the financial justification needed to carry the project forward.


Bottom-up reliability improvement programs frequently receive only a lukewarm endorsement from management.  Management gives a wave of the hand and a sly grin, followed by an aside to other managers as if to say, “Here we go again with another gear-head approach to maintenance that will require us to spend more money for no results.”  In other asides, management says, “I wonder what new book or magazine these guys have been reading, or which new consultant they have met?”  At this point, the ship has been torpedoed but not yet sunk, because management is not leading; it is subversively following.


Management says, “Yes, we believe in reliability programs but just get this equipment repaired faster!”  Management fails to notice the dangerous cross-communication: its words and actions do not match.  The organization quickly reads this mixed message and concludes that the bottom-up reliability program will go the way of other short-lived improvement programs that proved ineffective.  The organization says, “Just wait a while and this new silliness will soon disappear (along with the initiators of the reliability program).  Then we can go back to doing business as usual without all of this difficult technical detail.”  In short, the organization has just endorsed a silent, subversive attack against advocates for changing the status quo.


Moving reliability programs upward as a technical method of improving maintenance is as effective as pushing a wet rope.  Perhaps 10% to 20% of bottom-up approaches succeed.  Improvements are difficult to quantify to the satisfaction of skeptics, and management is suspicious of the claimed results because changes to the status quo are difficult to accept.  The organization says failures of equipment and processes are expected to occur, and besides, the organization rewards fast repairs over preventing failures.  And by the way, how would you show you’ve prevented failures?


The kiss of death for some reliability programs begins with the acquisition of a newer and more complex computerized maintenance management system (CMMS).  The new CMMS requires a large capital expenditure and extensive training for everyone in operations.  The bet is that the new CMMS system will save our tail feathers.  Data from the old, “inferior” CMMS system is not converted to the new, improved system because the data is “no good” and unworthy of the conversion costs.  This is a big mistake.  The previous failure history is judged worthless because the reliability team, never adequately trained, has not used the data to solve problems.  A red dot has now been painted on the head of the CMMS team, and in time the organization will peck the team to death.


Don’t get me wrong.  I am for the use of data from CMMS systems.  I just don’t want to wait “forever” before acquiring perfect data.  We need to make the best of the existing information.  “Continuous improvement is better than postponed perfection”!  If you’ve got the best CMMS system in the world and no one knows how to analyze the data, you’re lost; software won’t save you.  It’s the training and skills of the people that will pull your chestnuts out of the fire!  Having Excel on computers will not be productive if you have not trained your people to use it, and it is most effectively used by people with both mathematical and engineering skills.


Add to this chaos management’s lack of knowledge of new tools for preventing failures.  Why prevent failures?  It’s simple: failures cost money, and we need to look for the most cost-effective approach (sometimes this may simply mean repairing the failure)!  The management problem gets worse when managers reject training initiatives that would increase their knowledge of improvements underway in similar organizations, and when they fail to read and study new, more cost-effective strategies.  The chaos gets worse with every management change, because what was right before the switch is now described as wrong since we’ve got to “change” from the old, failed course (sing this refrain every 1 to 2 years in many organizations with revolving-door managers).


How to Make Non-Management Driven Reliability Programs Successful:

How do you create non-management-driven reliability programs that replicate the successes of management-driven reliability programs?  Here are some points for the reliability strike force to consider when selling successful programs:

1.      Make the reliability program money driven (not technology driven).  Use reliability tools/technology to get to the money.  Sell the reliability program as all about money and time (two favorite subjects of management)!  Remember that safety issues are also about money: the probability of failure multiplied by the cost consequence gives the money at risk.

2.      Build a $Pareto distribution of losses (including the risks of potential losses).  Each loss is the sum of maintenance costs (usually small) and lost margin from process failures (usually large in a sold-out process, or zero in a process that is not sold out, where make-up time for failures is available).

3.      Forecast failures for the next 3 to 5 years based on the past 5 to 10 years of experience (gather data only on the top ten items), using Crow-AMSAA reliability growth plots as a “show me, don’t just tell me” picture of the situation.  Convert the forecast failures into a forecast of the money that will be lost.

4.      Build the top ten work list based on money.  Discuss the details with management.  Gain concurrence that these key problems must be solved first, before any love affairs with particular equipment or processes are considered; then monetize the love affairs so they can be prioritized along with the other economic issues.

5.      Based on the top ten work list, make a hypothesis about when cusps (changes in slope) can be put on the Crow-AMSAA plots and how much gain can be achieved; state the results in time and money.  Project the savings an active reliability program will achieve by reducing or eliminating failures in the process and equipment.  Express the time and money as payback periods.  Get management to accept that these key issues need to be resolved and that resources must be provided to produce the expected results.

6.      Show progress quickly—months, not years.  Sum the financial results to justify the reliability program in time/money.  If the reliability program is not paying its way, kill it and disband the reliability organization for lack of effectiveness.  If the program is succeeding, advertise the results to management and press on for more savings very quickly.

7.      For the top items in the Pareto distribution, use top-down fault tree analysis to work on recurring problems, and in the critical legs of the fault tree perform failure mode and effects analysis, using data from the CMMS system, to ferret out the root causes and prevent failures from occurring.  If the CMMS is not adequate, argue why better data is needed for technical solutions, along with mandatory high-fidelity conversion of old data into any new CMMS system to preserve historical failure details.

8.      Based on the financial successes, sell management on why a formal reliability program (driven down from management) is to their advantage for changing the plant culture from failure acceptance to failure prevention.  Also, sell management on why they should issue a reliability policy that communicates the goal of a failure-free process to the organization.

9.      Enlarge the reliability program to include process reliability, so that high-ticket process cost issues are included in the Pareto distributions.

10.  Encourage separation of maintenance engineers (tactical resources) from reliability engineers (strategic resources), without increasing head counts, for more cost-effective use of human assets.  Expecting engineers to do both jobs results only in adrenaline-driven maintenance engineering work.

11.  In new projects, introduce reliability models for calculating availability, reliability, and the expected number of failures, with design goals for each, based on failure data from operations, particularly from a Weibull database of failures and repairs.  Work for the lowest long-term cost of ownership.

12.  Continue to justify and sell the reliability program as part of both the maintenance work process and the design work process, to achieve first-quartile operation with a small cost of unreliability compared to peer producers.

13.  Recognize the reliability program is about selling effective money driven improvement programs, not simply telling about technology.  Successful reliability programs are about saving money and improving operations (making favorable economic things happen)—they are not about ponderous bureaucratic efforts (sitting and delaying rather than doing).
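Items 1, 2, and 4 come down to simple arithmetic: each line item's loss is maintenance cost plus lost margin (lost margin counted only in a sold-out process), plus any money at risk (probability of failure times cost consequence), ranked from largest to smallest. The Python sketch below shows one way to build the $Pareto list. Annualizing by a failures-per-year count is an assumption of this sketch, and the class and function names (LossItem, risk_dollars, pareto) and every number used with them are hypothetical, not data from any plant.

```python
from dataclasses import dataclass


@dataclass
class LossItem:
    """One hypothetical line item for the $Pareto list."""
    name: str
    failures_per_year: float   # expected failures per year (history or forecast)
    maintenance_cost: float    # repair cost per failure, $
    downtime_hours: float      # lost production hours per failure
    margin_per_hour: float     # lost margin per hour of downtime, $
    sold_out: bool             # True if lost production cannot be made up later

    def annual_loss(self) -> float:
        # Lost margin counts only in a sold-out process; otherwise make-up time recovers it.
        lost_margin = self.downtime_hours * self.margin_per_hour if self.sold_out else 0.0
        return self.failures_per_year * (self.maintenance_cost + lost_margin)


def risk_dollars(probability_of_failure: float, consequence_cost: float) -> float:
    """Money at risk = probability of failure x cost consequence."""
    return probability_of_failure * consequence_cost


def pareto(items, risks=None, top_n=10):
    """Rank items by annual $ loss plus any $ at risk.

    Returns (name, dollars, cumulative % of the grand total) for the top_n items.
    """
    risks = risks or {}
    totals = {item.name: item.annual_loss() for item in items}
    for name, dollars in risks.items():
        totals[name] = totals.get(name, 0.0) + dollars
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    grand_total = sum(totals.values())
    result, running = [], 0.0
    for name, dollars in ranked[:top_n]:
        running += dollars
        cumulative_pct = 100.0 * running / grand_total if grand_total else 0.0
        result.append((name, dollars, cumulative_pct))
    return result
```

For example, a hypothetical item with 4 seal failures per year at $5,000 per repair plus 12 hours down at $2,000/hour of lost margin costs 4 × ($5,000 + $24,000) = $116,000 per year in a sold-out process, but only the $20,000 of repairs in a process that is not sold out.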
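The forecast in item 3 can be sketched with the Crow-AMSAA model, in which the cumulative number of failures follows N(t) = λ·t^β, a straight line on log-log paper. The Python sketch below uses the maximum-likelihood estimates for time-terminated data, β = n / Σ ln(T/t_i) and λ = n / T^β, which is one of several accepted fitting methods (regression on the log-log plot is another). The function names, the failure history, and the $29,000 cost per failure are hypothetical.

```python
import math


def crow_amsaa_mle(failure_times, total_time):
    """Fit N(t) = lam * t**beta by maximum likelihood (time-terminated data).

    failure_times: cumulative operating time at each failure, measured from time zero.
    total_time: end of the observation period T, same units (T >= last failure).
    Returns (lam, beta).
    """
    n = len(failure_times)
    if n == 0 or any(t <= 0 or t > total_time for t in failure_times):
        raise ValueError("need at least one failure time in (0, total_time]")
    log_sum = sum(math.log(total_time / t) for t in failure_times)
    if log_sum == 0:
        raise ValueError("all failures occur at total_time; beta is undefined")
    beta = n / log_sum
    lam = n / total_time ** beta
    return lam, beta


def forecast_failures(lam, beta, start, horizon):
    """Expected failures between start and start + horizon: N(start + horizon) - N(start)."""
    return lam * ((start + horizon) ** beta - start ** beta)


# Hypothetical history: day numbers at which a pump failed during 5 years (1825 days).
failure_days = [200, 450, 700, 900, 1100, 1250, 1400, 1520, 1640, 1750]
lam, beta = crow_amsaa_mle(failure_days, 1825)
next_3_years = forecast_failures(lam, beta, 1825, 3 * 365)
cost_per_failure = 29_000  # hypothetical $ per failure, e.g., from the $Pareto work
print(f"beta = {beta:.2f}, forecast failures = {next_3_years:.1f}, "
      f"forecast loss = ${next_3_years * cost_per_failure:,.0f}")
```

A β above 1, as in this made-up history, says failures are arriving faster as time goes on; the forecast count times the cost per failure is the money figure to put in front of management. After improvements take hold, a cusp would show up as a change in slope, and the post-cusp data would be fitted as a separate segment; this simple function only fits one segment that starts at time zero.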
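The reliability models in item 11 can start very simply. The Python sketch below assumes each component's failures follow a two-parameter Weibull distribution (shape β, scale η) taken from a failure database, along with a mean time to repair. It computes mission reliability R(t) = exp(-(t/η)^β), mean life η·Γ(1 + 1/β), inherent availability MTBF/(MTBF + MTTR), and expected failures over a period. The series-system roll-up rests on stated simplifications (independent components, any component failure stops the process, repairs as good as new, expected failures approximated by period / mean life, availability approximated by the product of component availabilities), and the function names and parameter values are hypothetical.

```python
import math


def weibull_reliability(t, beta, eta):
    """Probability a new component survives to time t (two-parameter Weibull)."""
    return math.exp(-((t / eta) ** beta))


def weibull_mean_life(beta, eta):
    """Mean time to failure of a Weibull: eta * Gamma(1 + 1/beta)."""
    return eta * math.gamma(1.0 + 1.0 / beta)


def inherent_availability(mtbf, mttr):
    """Fraction of time up, counting only failure repair time."""
    return mtbf / (mtbf + mttr)


def series_system(components, mission_time, period):
    """Roll up a series system of (beta, eta, mttr) tuples.

    Simplifying assumptions: components are independent, any failure stops the
    system, repairs restore components to as-good-as-new, expected failures are
    approximated by the long-run renewal rate (period / mean life), and system
    availability is approximated by the product of component availabilities.
    Returns (mission reliability, availability, expected failures in period).
    """
    reliability, availability, failures = 1.0, 1.0, 0.0
    for beta, eta, mttr in components:
        mean_life = weibull_mean_life(beta, eta)
        reliability *= weibull_reliability(mission_time, beta, eta)
        availability *= inherent_availability(mean_life, mttr)
        failures += period / mean_life
    return reliability, availability, failures


# Hypothetical design: pump (beta 1.8, eta 20,000 h, 24 h repair) in series with
# a gearbox (beta 2.5, eta 60,000 h, 72 h repair); 1-year mission = 8760 h.
r, a, n = series_system([(1.8, 20_000.0, 24.0), (2.5, 60_000.0, 72.0)], 8760.0, 5 * 8760.0)
print(f"1-year reliability {r:.3f}, availability {a:.4f}, failures in 5 years {n:.1f}")
```

Swapping in Weibull parameters from your own failure and repair database, and comparing the results with the project's design goals, shows which component drives any shortfall before the plant is built.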


Refer to the caveats on the Problem of the Month page about the limitations of this solution. Maybe you have a better idea on how to solve the problem. Maybe you find where I've screwed up the solution and you can point out my errors as you check my calculations. E-mail your comments, criticism, and corrections to Paul Barringer.



Last revised March 8, 2010
© Barringer & Associates, Inc. 2007-2008