Two important decisions in life and business are:
Knowing when to accept the risk.
Knowing when to reject the risk.
The math for risk is easy to express: $Risk = (probability of failure)*($Consequence).
The hard part of risk is:
1. Understanding how much risk ($Risk) you can tolerate.
2. Deciding the probability of failure.
3. Evaluating the $Consequence.
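As a minimal sketch of the arithmetic above, the $Risk relationship can also be turned around to ask what probability of failure is tolerable for a given consequence. The function names and dollar figures are illustrative assumptions, not values from this article's tables.

```python
def risk_dollars(probability_of_failure, consequence_dollars):
    """$Risk = (probability of failure) * ($Consequence)."""
    if not 0.0 <= probability_of_failure <= 1.0:
        raise ValueError("probability of failure must be between 0 and 1")
    return probability_of_failure * consequence_dollars


def tolerable_probability(tolerable_risk_dollars, consequence_dollars):
    """Largest probability of failure that keeps $Risk within the tolerable limit."""
    if consequence_dollars <= 0:
        raise ValueError("consequence must be positive")
    return min(1.0, tolerable_risk_dollars / consequence_dollars)


# Hypothetical numbers, chosen only to illustrate the arithmetic.
print(risk_dollars(0.001, 100_000))         # money exposure for one event
print(tolerable_probability(100, 100_000))  # tolerable probability of failure
```

The second function shows why the hard part is the inputs: the division is trivial once the tolerable $Risk and the $Consequence have been agreed.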
These same elements are included in risk-based inspection (RBI) decisions described by publications from both API and ASME.
Insurance Issue To Illustrate Some Ideas:
In our private lives, we each set the limit on how much risk ($Risk) we can tolerate. We do this when selecting the deductible on automobile insurance. For some people the amount they are willing to pay out of pocket for an accident is US$100, while others with more financial ability may accept US$5000 as the limit, as in the little example in Table 1.
The financial consequence for automobile insurance is determined by the calamity of an accident. If the consequences involve the loss of human life the price, in the litigious Cost for the loss of multiple lives accelerates at a rapid rate, which in turn implies the allowed probability for loss of multiple lives must be substantially less to control the $Risk, and society sets limits for some probability of failure. Societal risks are explained in tables/figures in Dr. Ernest J. Henley’s book, Probabilistic Risk Assessment and Management for Engineers and Scientists, 2nd edition along with guidance in
The automobile insurance example says we can only tolerate the probability of failure in the range of 0.000003 to 0.001 as shown in the table below. However, don’t be enamored with the math. Common sense must prevail because society sets limits on what is an acceptable probability of failure where human life is involved (i.e., the probability of failure cannot exceed 0.0001). Furthermore, practical limits apply: the accident rates rest on the experience of a large number of drivers stored in the history files of insurance companies.
From this little example, you can see some people/businesses will be risk averse and others risk accepting. Also, each person/business will have different values for the probability of failure. Furthermore, society demands a cap on the probability of failure for some events, so the math model must be used with good judgment.
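The cap can be sketched by taking the smaller of the math-model answer and the societal limit. The 0.0001 figure is the one quoted in the text; the function name, the flag, and the dollar amounts in the note that follows are assumptions for illustration.

```python
SOCIETAL_CAP_HUMAN_LIFE = 0.0001  # limit quoted in the text where human life is involved


def allowed_probability(tolerable_risk_dollars, consequence_dollars, human_life_involved):
    """Probability of failure allowed by the $Risk model, capped where society sets a limit."""
    if consequence_dollars <= 0:
        raise ValueError("consequence must be positive")
    # Answer from the math model alone: tolerable $Risk divided by $Consequence.
    from_model = min(1.0, tolerable_risk_dollars / consequence_dollars)
    if human_life_involved:
        # Society's limit overrides a more permissive math-model answer.
        return min(from_model, SOCIETAL_CAP_HUMAN_LIFE)
    return from_model
```

For a hypothetical risk-accepting owner with a US$5000 tolerance and a US$1,000,000 consequence, the model alone allows 0.005, but the function returns 0.0001 when human life is involved.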
Watch out for the lure of making risk-based decisions a mathematics problem. Start with and use common sense. Begin with a qualitative approach. A top-down fault tree mentality can speed your analysis. You’ll be amazed at what you can learn and how effective simplicity can be in resolving issues. In the second pass, grow the issues to a semi-quantitative analysis. Finally, if required, perform a quantitative analysis.
ASME International resources:
A good first book for risk-based methodology is available from the ASME Bookstore with the title Risk-Based Methods For Equipment Life Management: An Application Handbook. (2003) (price US$175) This book is from the Center for Research and Technology Development, Research Committee on Risk Technology, CRTD-Vol. 41 which includes a CD with spreadsheet templates for risk analysis, fault trees, etc. The Handbook supplements:
1991 Risk-Based Inspection-Development Of Guidelines: General Document: Volume 1 (1991), Volume 2 (1993), Volume 3 (1995), Volume 2-Part 2 (1997) or the set of 4 references as IX97S2 which “…describes and recommends appropriate processes and methods using risk-based information to establish inspection guidelines for facilities or structural systems.”
1995 Risk and Safety Assessment: Where Is the Balance? (1995) (price US$150) “…the 54 papers in this volume discuss the following topics: Risk-Based Decision Making: Risk Assessment Methods and Approaches; Applications of Risk-Based Methods; Risk Reduction Strategies…”
Risk-Based Inservice Testing Volume 1 and Volume 2 (price US$60) which introduces risk-based processes centered on three major areas:
1. “The identification of high-risk and low-risk impact operating equipment using risk-ranking techniques that take into account system functions, component performance and equipment service conditions.”
2. “The identification of equipment or component-specific failure-cause processes, to develop highly effective test strategies, in terms of both safety and economics, for determining precursors to failure that are applicable to the high-risk impact components.”
3. “The identification of test strategies that adequately assess the performance of low-risk impact equipment”
Risk-Based API resources:
Another reference that will tell you voluminous details about risk-based inspection methodology is API Publ 581 Base Resource Document—Risk-Based Inspection. To find the abstract, go to http://www.api.org/publications and search for Publ 581 (price US$644 for hardcopy or PDF). The API abstract for Publ 581 sets three major goals for the RBI program:
1. “Provide the capability to define and quantify the risk of process equipment failure, creating an effective tool for managing many of the important elements of a process plant.”
2. “Allow management to review safety, environmental, and business-interruption risks in an integrated, cost-effective manner.”
3. “Systematically reduce the likelihood and consequence of failure by allocating inspection resources to high risk equipment.”
API’s RBI also allows owner/users to establish inspection plans as an alternative to the traditional:
API 510 Pressure Vessel Inspection Code: Maintenance Inspection, Rating, Repair, and Alteration, 8th edition (33 pages; price US$107 for hardcopy or PDF), or
API 570 Piping Inspection Code: Inspection, Repair, Alteration, and Rerating of In-Service Piping Systems, 2nd edition (38 pages; price US$93 for hardcopy or PDF).
You can order these API documents from the Global Engineering Documents website.
The Risk Matrix:
Both ASME and API risk-based techniques generally use a 5*5 risk matrix as shown in Figure 1.
Why only 5 zones of consequence and only 5 zones of failure probability? Experience has shown this is a practical range for many issues. More zones and you spend much time splitting hairs. Fewer zones and you have many arguments over the appropriate categories. Be pragmatic about the number of zones required as no perfect answer exists.
If qualitative numbers are used in Figure 1, they represent the likelihood of occurrences. The maximum score of 25 is obtained by multiplying the probability score by the severity score; the minimum score is 1. The verbal and numeric descriptions serve end users who think either by the numbers or by the simple language describing the events. The scoring systems are not unlike failure mode and effects analysis (FMEA) rating systems.
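The scoring can be sketched in a few lines. This is only an illustration of the multiplication described above; the function name and the printed layout are assumptions, and Figure 1 may arrange its rows and columns differently.

```python
def build_risk_matrix(size=5):
    """Return a size x size grid of scores: probability rank times severity rank."""
    return [[p * s for s in range(1, size + 1)] for p in range(1, size + 1)]


matrix = build_risk_matrix()
# Print the highest probability rank on top so the high-risk corner is upper right.
for row in reversed(matrix):
    print(" ".join(f"{score:2d}" for score in row))
```

Each row is one probability rank and each column one severity rank, so the lower left cell holds the least score and the upper right cell the maximum.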
Risk increases along the diagonal from the lower left hand corner to the upper right hand corner. Each color generally represents zones of roughly equal amounts of $Risk. Again, use judgment in setting the color bands to conform to your special situations. The colors and hence the $Risks are not always fixed.
The upper right hand corner red zone demands special consideration and special attention. Expect this zone to represent 10% to 20% of the issues with 60% to 80% of the money. Tend to this hot zone very carefully! Address these issues with the Pareto principle: Separate the vital few problems from the trivial many problems.
The very high ratings can also represent severe safety issues, severe environmental issues, and/or major political issues.
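One way to separate the vital few from the trivial many is to rank issues by $Risk and keep adding them until a chosen share of the total money is covered. The sketch below assumes a 70% share, and the issue names and dollar values are made up for illustration.

```python
def vital_few(issues, money_share=0.7):
    """Return the highest-$Risk issues that together cover `money_share` of total $Risk.

    `issues` maps an issue name to its $Risk in dollars.
    """
    total = sum(issues.values())
    if total <= 0:
        return []
    ranked = sorted(issues.items(), key=lambda item: item[1], reverse=True)
    selected, running = [], 0.0
    for name, risk in ranked:
        if running >= money_share * total:
            break  # the chosen share of the money is already covered
        selected.append(name)
        running += risk
    return selected


# Hypothetical issue list (name -> $Risk in dollars).
issues = {
    "pump seal": 450_000, "exchanger tube": 330_000, "valve packing": 50_000,
    "gasket": 45_000, "instrument drift": 35_000, "paint": 30_000,
    "lighting": 25_000, "signage": 15_000, "door latch": 12_000, "hose": 8_000,
}
print(vital_few(issues))
```

With these made-up numbers, two of the ten issues carry 78% of the money, which is the kind of concentration the red zone is meant to expose.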
The lower left hand corner contains the most issues, by nose count, but little money exposure. Work on these low level issues in the white zone only when time permits.
The orange zone gets second level attention, followed by the yellow zone.
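The order of attention (red, then orange, then yellow, then white) can be sketched as a lookup on the 1 to 25 score. The band limits below are hypothetical; as noted above, color bands should be set by judgment to fit the situation.

```python
# Hypothetical lower limits for each color band on the 1-25 score.
BANDS = [(15, "red"), (8, "orange"), (4, "yellow"), (1, "white")]
ATTENTION_ORDER = ["red", "orange", "yellow", "white"]


def zone(probability_rank, severity_rank):
    """Map 1-5 probability and severity ranks to a color zone by their product."""
    for rank in (probability_rank, severity_rank):
        if rank not in range(1, 6):
            raise ValueError("ranks must be integers from 1 to 5")
    score = probability_rank * severity_rank
    for lower_limit, color in BANDS:
        if score >= lower_limit:
            return color


# Hypothetical issues as (name, probability rank, severity rank).
issues = [("A", 1, 2), ("B", 5, 4), ("C", 3, 3)]
ordered = sorted(issues, key=lambda i: ATTENTION_ORDER.index(zone(i[1], i[2])))
print([name for name, _, _ in ordered])
```

Sorting by zone rather than by raw score keeps the work list aligned with the color bands people see on the matrix.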
The matrix starts out with a simple qualitative risk matrix in Figure 1 as the beginning level—this illustrates the concept and is very useful for the initial pre-screening of risk. The qualitative matrix is useful for screening purposes using words most people understand when common sense is applied by knowledgeable individuals.
Moving up the food chain one notch brings the more complicated semi-quantitative risk matrix in Figure 2 as the intermediate level. This level requires more questions and more analysis to avoid the overly conservative risk ranking which may occur with the qualitative risk matrix of Figure 1. Of course, the semi-quantitative risk matrix analysis takes more time and costs more.
What’s the advantage of the semi-quantitative risk matrix? You start to make your numbers speak with an understandable voice. The numbers provide greater depth of understanding. You can apply the semi-quantitative values to either the probability of failure or to the consequence of the failures—it doesn’t matter which you do first.
Are the numbers always the same from one company to another and from one business to another business? Usually they are not the same. Each location has different goals, different performance standards, and different money restrictions which cause the numbers to be different.
Figure 2 helps prioritize effort and provides guidelines for taking action as well as not taking action on some problems. The risk matrix has been applied successfully in design, maintenance, and operational decisions.
Too often problems are described in “near hysterical terms” of the major calamity that could or might occur. The calamity is described without regard to the probability of the occurrence. When the two terms (probability and consequence) are brought together, the facts often reduce the hysteria to a more moderate perspective. Merging semi-quantitative information is an important consideration, particularly for potential “whistle-blowers”, to make sure the suspected situation is worthy of the clamor to be generated when the facts are presented.
The matrix grows to the third level, a quantitative risk matrix with specific failure probability scales, for the vital few items that are worthy of the increased attention and costs. This requires establishing criteria for the probability of failure and for the cost consequences, as shown in Figure 3.
Remember the scales can be changed to fit the risk environment. For example, a risk accepting organization may allow large failure probabilities whereas a risk averse organization may not allow high probabilities of failure. Each organization must select the risk it can tolerate and use that as a guide.
Each organization must know the $Risk level that can be tolerated at each level of the organization. The $Risk must be spelled out in monetary terms so engineers and managers can work to the numbers rather than overreacting or underreacting as they try to guess what will be tolerated. This issue is addressed in the technical paper presented to the API Pipeline Conference: Reliability Issues From A Management Perspective.
Working through the logic of a quantitative risk matrix requires teamwork to assess situations and to use the wisdom of knowledgeable people working for the common good of the organization. Individuals typically express views that are too pessimistic or too optimistic; the team effort avoids these extremes and knocks off the sharp corners as issues are debated and logic prevails over emotion.
Most problems are not appropriate for the quantitative risk matrix! This approach is reserved for the vital few issues because of the cost to perform the analysis and the time consumed in the effort.
Another example of a risk matrix is described in Figure 4 based on details from MIL-STD-882 Rev D.
Finally, keep in mind the need to communicate clearly to the workforce the desirable actions required to prevent failure.
Many different examples of a risk matrix have been presented. Some are simple, some complex, some risk accepting, and some risk averse. No one version fits all cases. Be flexible and keep in mind you must sell a program that is neither too risk averse nor too risk accepting. You must fit the pattern to the specific case.
Refer to the caveats on the Problem Of The Month Page about the limitations of the solution above. Maybe you have a better idea on how to solve the problem. Maybe you find where I've screwed-up the solution and you can point out my errors as you check my calculations. E-mail your comments, criticism, and corrections to Paul Barringer.