Little r and Big R
(Where r and R = Reliability)

 

Reliability is in the vocabulary of almost everyone.  Ask an ordinary person to define reliability.  You’ll get a wide variety of non-responses ending in “I know what it is, I just can’t define it right now”. 

 

The Encarta® Dictionary from Microsoft defines reliability as:

1)    Dependable: able to be trusted to do what is expected or has been promised, and

2)    Likely to be accurate: able to be trusted to be accurate or correct or to provide a correct result. 

 

Here is my preferred definition of reliability, one particularly appropriate for the chemical and petroleum industries:

 

Reliability is the probability that a device, system, or process will perform its prescribed duty without failure for a given time when operated correctly in a specified environment.

 

Notice the emphasis on the word process, which is king in most production facilities and operations, and on the key words without failure. The sweet side of the coin is reliability. The sour side of the coin is a failure that terminates reliability.

 

Happiness is operating without failures. You don’t get reliability by wishing, hoping, or waiting for miracles. You cannot repair your way to happiness. You achieve reliability by:

1)    Planning for reliability,

2)    Controlling for reliability, and

3)    Improving reliability.

No miracles in this effort, just hard work using the correct tools for reliability and using the tools correctly. A chain saw is an example of using a tool correctly: it is a wonderful device for cutting wood, but only if you use it with the engine running. Otherwise it’s a pain!

 

Dr. Joe Juran, a modern founder of the quality movement, died on Feb 28, 2008 at age 103 after working up to the day before his death. Juran promoted planning and prevention as a complement to control. Juran asked us to think about quality with a large scope. Juran emphasized the difference between “little q” and “big Q” for quality (and parallels exist for reliability issues).

1)    Little q has to do with problems of production and the tactical tools that lead to control and improvement of quality

2)    Big Q relates to quality management issues because they are more comprehensive and system wide as pertains to strategic issues.

 

Of course both little q and big Q are complementary. Every engineer should own and study a copy of Juran’s Quality Handbook, 5th edition. On page 2.4, consider Juran’s Table 2.1, where big Q and little q for quality are contrasted:

Table 2.1: Contrast, Big Q and Little q (where Q and q refer to quality)

Topic | Content of little q | Content of big Q
Products | Manufactured goods | All products, goods, and services, whether for sale or not
Processes | Processes directly related to manufacture of goods | All process manufacturing support: business, etc.
Industries | Manufacturing | All industries, manufacturing, service, government, etc., whether for profit or not
Quality viewed as: | A technological problem | A business problem
Customer | Clients who buy the products | All who are affected, external and internal
How to think about quality | Based on culture of functional departments | Based on the universal trilogy [Juran’s trilogy involves planning, control, and improvement]
Quality goals are included: | Among factory goals | In company business plan
Cost of poor quality | Costs associated with deficient manufactured goods | All costs that would disappear if everything were perfect
Evaluation of quality is based mainly on: | Conformance to factory specifications, procedures, standards | Responsiveness to customer needs
Improvement is directed at: | Departmental performance | Company performance
Training in managing for quality is: | Concentrated in the quality department | Company wide
Coordination is by: | The quality managers | A quality council of upper managers

Source: Juran’s Quality Handbook, 5th edition, McGraw-Hill, New York, 1998, ISBN 0-07-034003-X

 


Parallels exist between quality and reliability.

          Quality is static. 

          Reliability is dynamic. 

Reliability’s dynamic of time makes it a more difficult subject to study, calculate, and explain. 

1)     Little r has to do with problems of production and the tactical tools that lead to control and improvement of reliability

2)    Big R relates to reliability management issues because they are more comprehensive and system wide as pertains to strategic issues.

Nevertheless, parallels exist between little q and big Q on one side and Little r and Big R on the other, as described in the following table.

 

Barringer’s Little r and Big R Contrast (Where r, R = Reliability)

Topic | Content of Little r | Content of Big R
Little r and Big R means | Little r refers to a narrow view of reliability involving lower level events/actions associated with things | Big R refers to a very broad view of reliability involving higher level and broader concepts
Distinction | Little r relates to product problems and tactical tools leading to reliability control and improvements | Big R is concerned with comprehensive and system wide concepts which are strategic in nature for reliability issues
Products | Generally considered for manufactured goods but also applies to the production process | All products, goods, and services, whether for sale or not
Processes | Processes directly related to manufacture of goods | All processes for manufacturing support: business, etc.
Industries | Usually considered a manufacturing issue but includes design, construction, installation, etc. | All industries, manufacturing, service, government, banking, etc., whether for profit or not
Reliability is viewed as | A technological problem involved with failures | A business problem with cost and services
Customer is viewed as | Clients who buy or receive the products | All who are affected including external and internal customers
How to think about reliability | Based on culture of functional departments | Based on the universal trilogy for reliability [planning, control, improvement]
Reliability goals are included | Among factory goals and engineering specifications | Stated in company business plans and advertisements
Cost of poor reliability | Costs associated with deficient manufactured goods or processes | All costs that would disappear if everything were perfect as summarized by the cost of unreliability (COUR)
Evaluation of reliability is based mainly on | Conformance to factory specifications, procedures, and standards | Responsiveness to customer needs, expectations, and advertised statements
Improvement is directed at | Departmental performance for reducing failures/costs | Company performance
Training in managing for reliability is | Concentrated in the reliability department | Company wide driven by a policy statement for reliability
Coordination is by | The reliability manager | A reliability council of upper managers
Program thrust | Tactical: Many small issues drive procedures and numerous rules | Strategic: Few large issues based on a reliability policy of intent for the organization
Audit thrust | Numerous checklists and tools employed for control and improvement of reliability | A functional audit rather than a procedural one: is management performing against the reliability objectives, and is the program succeeding both financially and with customers?

Inspired from: Juran’s Quality Handbook, 5th edition, McGraw-Hill, New York, 1998, ISBN 0-07-034003-X

 

How do we approach the program? Consider Juran’s universal trilogy which applies to Accounting, Banking, Engineering, Manufacturing, Quality, Reliability, etc.:

  1. Planning [see Juran’s section 3],
  2. Control [see Juran’s section 4], and
  3. Improvement [see Juran’s section 5].

Details for each element are provided below for reliability issues.

 

Planning-

Reliability planning is a structured process for developing products and processes that ensures customer needs and process needs are met by a final result that is devoid of failures.

 

Planning is required because we have gaps between what the customer/owner needs or wants for reliability and what the product/process delivers for reliability. The plan must also consider the price that fits the lowest long term cost of ownership. Cheapest first cost is often a misleading criterion, as sustaining cost is usually 2-20 times greater than the acquisition cost. Many sustaining costs are incurred as unintended consequences of costly events never detailed upfront as part of the plan, because owners/designers never play all the cards face-up for clarity and understanding.
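As a rough sketch of why cheapest first cost can mislead, the comparison below adds sustaining cost to acquisition cost using the 2-20 times range quoted above. The function name, the two bids, and their multiples are hypothetical illustrations, and the sketch ignores discounting of future costs.

```python
def long_term_cost(acquisition_cost, sustaining_multiple):
    """Total ownership cost = acquisition cost + sustaining cost.

    sustaining_multiple is the ratio of sustaining cost to acquisition
    cost (the text quotes a usual range of 2 to 20). No discounting of
    future costs is applied in this simplified sketch.
    """
    if acquisition_cost < 0 or sustaining_multiple < 0:
        raise ValueError("cost and multiple must be non-negative")
    return acquisition_cost * (1 + sustaining_multiple)


# Hypothetical bids (illustrative numbers only, not from the source).
cheap_first_cost = long_term_cost(100_000, 12)    # low price, high upkeep
higher_first_cost = long_term_cost(150_000, 4)    # higher price, low upkeep

print(f"Cheapest first cost, long term:  {cheap_first_cost:,}")
print(f"Higher first cost, long term:    {higher_first_cost:,}")
```

With these assumed numbers the bid that is 50% more expensive to buy costs far less to own, which is the point of planning around long term cost of ownership.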

 

The end-user/owner of the product/process must perceive that the product/process gives them the beneficial results they expect: the value must be perceived in the eye of the end-user/owner and not in the eye of the designer/manufacturer.

 

Key metrics must be spelled out clearly.  A good guide for planning is SAE’s Reliability and Maintainability Guideline for Manufacturing Machinery and Equipment, second edition, publication M-110.2, ISBN 0-7680-0473-X, 1999.  For example, spell out requirements upfront for:
          Availability (probability the system can perform when called upon),

          Reliability (probability of failure free operation in the mission interval),

          Maintainability (probability of being repaired in the allowed interval),

          Failure definitions (define critical, non-critical, and benign failures),

          Environmental usage (used in what environments and conditions),

          Lessons learned from previous projects (avoid past errors), and

          numerous other details and figures of merit.
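To make the first three metrics concrete, here is a minimal sketch of how they are often estimated. It assumes exponentially distributed times to failure and times to repair, and it uses the steady-state ratio MTBF / (MTBF + MTTR) as the availability estimate. These are simplifying assumptions for illustration, not the only valid models, and all numbers are hypothetical.

```python
import math


def reliability(mission_time, mtbf):
    """Probability of failure-free operation over the mission interval,
    assuming a constant failure rate (exponential model)."""
    return math.exp(-mission_time / mtbf)


def maintainability(allowed_time, mttr):
    """Probability a repair is completed within the allowed interval,
    assuming exponentially distributed repair times."""
    return 1.0 - math.exp(-allowed_time / mttr)


def availability(mtbf, mttr):
    """Steady-state fraction of time the system is able to perform."""
    return mtbf / (mtbf + mttr)


# Hypothetical figures of merit for one machine (hours).
mtbf_hours = 8760.0   # mean time between failures
mttr_hours = 24.0     # mean time to repair

print(f"Reliability for a 1-year mission: {reliability(8760.0, mtbf_hours):.3f}")
print(f"Maintainability within 48 hours:  {maintainability(48.0, mttr_hours):.3f}")
print(f"Availability:                     {availability(mtbf_hours, mttr_hours):.4f}")
```

Note that a machine can show high availability while its one-year mission reliability is modest, which is why the requirements should state each metric separately.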

 

Reliability goals often shift because of market conditions, what competitors are doing, new technology, sales prices, etc. This requires knowing the benchmark and having the flexibility to adapt to shifting targets for new goals. Measurement systems have these universal characteristics:

          Specific,

          Measurable,

          Agreed upon by the teams,

          Realistic and feasible, and

          Time specific for meeting project goals.

 

Experience shows that three major categories destroy the inherent reliability of a system (listed from greatest to least, with comparisons for good and bad plants):

                                                  Good Plants            Bad Plants

          People -                                ~38%                 up to ~80%

          Processes/Procedures -        ~34%                 up to ~70%

          Equipment -                          ~28%                 up to ~40%

Planning for error proofing is important for these critical factors, which submarine the reliability of systems. Critical factors are details that represent danger to human life, health, or the environment, or risk of loss of people, money, reputation, etc.; they are often defined with a risk matrix. Too often engineers think the major reliability problems are with the equipment. However, experience says humans are often the weakest link because of technique errors, errors made worse by lack of timely feedback (think of the Three Mile Island and Chernobyl nuclear reactor catastrophes), and errors that occur because humans cannot sustain a high state of emergency attention indefinitely. Here are some typical methods of error proofing:

          Eliminate the error prone operations,

          Replace the human with nonhuman operations,

          Facilitate by assisting the human operator with simple tools or training,

          Detect the error at the earliest stage, such as with automation, and

          Mitigate the serious damage by physical means.

Also the book Hostages To Each Other:  The Transformation Of Nuclear Safety Since Three Mile Island shows how the nuclear industry has attacked the human error problems in America’s nuclear reactor plants for generating electricity.

 

Audit the planning efforts to validate that the plans are effective and implemented.  Where deficiencies are discovered, shore up the weak areas and where practical, implement new goals and bring the correct skill sets into play to prevent future weaknesses.

 

Control-

Reliability control is a universal managerial process for conducting the operation so as to provide stability while preventing adverse changes and maintaining the status quo for failure free products and processes.

 

Control actions are required to achieve stable reliability at an expensive level or stable reliability at a less expensive level driven by the cost of unreliability. Figure 1 shows the Juran trilogy diagram modified for reliability.

Figure 1  The Juran Trilogy Diagram Modified For Reliability

 

We plan for reliability improvements.  We implement improvements to move from the high cost of unreliability to a lower zone of unreliability.  We accomplish better control of unreliability costs by implementing lessons learned and by use of failure data, failure modes, and the tools of reliability.  We enable the workforce to make numerous improvements by empowering the workforce to take individual actions for reliability improvements to reduce the high costs of unreliability as explained in “Reliability Programs:  Successful or Failures?”.
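The move from a high cost of unreliability to a lower zone can be put in numbers. The sketch below is one simplified way to tally an annual cost of unreliability; it assumes the cost is lost margin from downtime plus direct repair cost. That breakdown, the function name, and every figure are assumptions made for illustration, not a formula taken from this page.

```python
def cost_of_unreliability(failures_per_year, downtime_hours_per_failure,
                          lost_margin_per_hour, repair_cost_per_failure):
    """Annual cost of unreliability under a simplified two-part model:
    lost margin during downtime plus direct repair cost."""
    lost_hours = failures_per_year * downtime_hours_per_failure
    lost_margin = lost_hours * lost_margin_per_hour
    repair_cost = failures_per_year * repair_cost_per_failure
    return lost_margin + repair_cost


# Hypothetical plant data before and after an improvement program.
before = cost_of_unreliability(12, 8, 5_000, 20_000)
after = cost_of_unreliability(4, 8, 5_000, 20_000)

print(f"Cost of unreliability before: {before:,} per year")
print(f"Cost of unreliability after:  {after:,} per year")
print(f"Annual saving:                {before - after:,}")
```

Tracking this one figure year over year gives a money measure of whether control actions are actually holding the gains.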

 

Reducing the cost of unreliability is better achieved by designing for reliability from the beginning of the project. Unfortunately, we have too many systems and operating plants that were designed without thought for reliability, which results in the need to make the reliability improvements described in Figure 1.

 

The least cost, highest return component is to improve our people. By the actions of our people we avoid the inadvertent high cost of the politically incorrect term MTBSE. Every engineer and every manager knows MTBSE but does not use the politically incorrect term. Improving our people to let them control the processes is vitally important for controlling and reducing errors which crash our processes and plants.

 

The second most fertile area for improving control is by our processes and procedures which our people must follow rigorously to achieve control.  These are usually low cost improvements to aid our most fragile component—our people.

 

Of course most engineers do not have interest in the people component or the process/procedure component, and they zero in on the equipment. So we cross-wire the improvement effort from the start. Concentrating primarily on the equipment usually guarantees high capital costs and delays in implementation, whereas concentrating on the people and the processes/procedures is both more cost conscious and more effective for achieving control which eliminates failures.

 

We control to move to new levels of performance by improvements. 

 

Improvement-

Reliability improvement means creating an organization dedicated to beneficial change for the purpose of achieving unprecedented levels of product and process performance often with emphasis on achieving a breakthrough to be devoid of failures to achieve the lowest long term cost of ownership.

 

Beneficial change is a control feature applicable to two major reliability improvements by:

          Product/process improvements to better satisfy customers/owners

          Freedom from deficiencies which generate failures and cost money

These techniques result in better reliability, improve satisfaction from customers/owners, and reduce costs from restoring the product/process to operable condition such as occurs with elimination of the early infant mortality issues. 

 

Beneficial change requires discovery of the causes of failure. Then remedies must be applied to prevent the failures and the costly waste of failures. In short: make improvements by better control.

 

Lessons learned libraries are important sources of details that must be controlled to prevent failures; see the NASA lessons learned library for an example. Use of lessons learned libraries empowers and enables the work force to make improvements. See how the British Navy ruled the waves by use of:

          Empowered teams: management authorizes individual initiative and experience to be used continuously in an effective and timely manner. In short, individuals in the organization have authority to take corrective action, and

          Enabled teams: employees are trained and drilled for proficiency using best practices that are continuously improved by feedback from the working teams. In short, management must turn on the organization for improvement action rather than disabling and denying positive action by individuals.

 

The rate of improvement must be measured with Crow-AMSAA reliability growth plots to ensure improvements are demonstrated, and demonstrated quickly, to achieve competitive advantages in the market. These are show-me, don’t-tell-me plots of our progress that summarize the results of our improvements. Many examples of reliability growth plots are shown in the technical paper “Predict Future Failures From Your Maintenance Records”.
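For readers who have not built one, a Crow-AMSAA plot puts cumulative failures against cumulative time on log-log scales, where the model N(t) = lambda * t^beta becomes a straight line with slope beta. A slope below 1 means failures are arriving more slowly (improvement); a slope above 1 means deterioration. The sketch below fits the line by ordinary least squares on the logged data, which is one common plotting-based approach (maximum likelihood estimates are an alternative). The function names and the failure times are hypothetical.

```python
import math


def fit_crow_amsaa(cumulative_failure_times):
    """Fit N(t) = lam * t**beta by least squares on ln(N) versus ln(t).

    cumulative_failure_times: increasing cumulative times at which the
    1st, 2nd, ... nth failures occurred. Returns (lam, beta).
    """
    n = len(cumulative_failure_times)
    if n < 2:
        raise ValueError("need at least two failures to fit a line")
    xs = [math.log(t) for t in cumulative_failure_times]
    ys = [math.log(i) for i in range(1, n + 1)]
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    beta = sxy / sxx                           # slope of the log-log line
    lam = math.exp(y_mean - beta * x_mean)     # intercept at t = 1
    return lam, beta


def expected_failures(lam, beta, t):
    """Cumulative failures the fitted line forecasts by cumulative time t."""
    return lam * t ** beta


# Hypothetical maintenance record: cumulative hours at each failure.
failure_hours = [30, 95, 210, 400, 700, 1150]
lam, beta = fit_crow_amsaa(failure_hours)
trend = "improving" if beta < 1 else "not improving"
print(f"beta = {beta:.2f} ({trend})")
print(f"Forecast cumulative failures at 2000 h: "
      f"{expected_failures(lam, beta, 2000):.1f}")
```

Extending the fitted line forward is what allows future failures to be forecast from existing maintenance records.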

 

Reliability Policy-

The motivation for a reliability policy statement is to reduce the high cost of unreliability.  This simple statement of intent for the organization is rarely made by management.  It provides the guiding light for galvanizing the organization toward improvement actions. 

 

The reliability policy (usually made in two short paragraphs) can be reduced to one simple concept as shown in this example:

 

We will build an economical and failure-free process that will operate for 5 years between planned turnarounds.
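A policy target like this can be translated into a rough engineering requirement. The sketch below assumes a constant failure rate (exponential model) and asks what mean time between failures the process would need in order to run the full interval failure-free with a chosen probability. The model choice, the function name, and the 90% target are illustrative assumptions, not part of the policy statement.

```python
import math


def required_mtbf(mission_years, target_reliability):
    """MTBF (in years) needed so that exp(-mission/MTBF) equals the target
    probability of failure-free operation over the mission interval."""
    if not 0.0 < target_reliability < 1.0:
        raise ValueError("target_reliability must be between 0 and 1")
    return -mission_years / math.log(target_reliability)


# Hypothetical target: 90% chance of running 5 years with no failure.
mtbf_years = required_mtbf(5.0, 0.90)
print(f"Required MTBF: {mtbf_years:.1f} years")
```

Under these assumptions the required MTBF is many times longer than the turnaround interval itself, which shows how demanding a short policy statement can be.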

Management fails to make a reliability policy statement and then wonders why they incur the high cost of unreliability!  Say what you want and want what you say so the organization knows what to do.  Start with big-R, implement with little-r and reduce the cost of unreliability.


Revised September 2, 2013
© Barringer & Associates, Inc., 2008