Safety Integrity Levels (SIL) Standards Are Appropriate For Many Reliability Issues

John Ruskin (circa 1850) said:
“It's unwise to pay too much, but it's worse to pay too little.  When you pay too much, you lose a little money - that's all.  When you pay too little, you sometimes lose everything, because the thing you bought was incapable of doing the thing it was bought to do.  The common law of business balance prohibits paying a little and getting a lot - it can't be done.  If you deal with the lowest bidder, it is well to add something for the risk you run, and if you do that you will have enough to pay for something better.”   

For control of risk the above concept boils down to the equation:
$$\$\text{Risk} = (\text{probability of failure}) \times (\$\text{Consequence of failure})$$
The $Risk must be within your signature authority (or within your team leader's written signature authority). Given the $price of the failure consequence, you design the system to control the probability of failure so that the resulting $Risk stays within the allowed signature authority. In short, you've got to balance business risks.

The $Consequence represents the financial consequences of the potential disaster for: 
1.  People (Niccolo Machiavelli (1469-1527) wrote in his book The Prince that wounding a person costs more than a death; the never-ending lawsuit announcements in American newspapers confirm this.)
2.  Property (plant sites and surrounding neighborhoods; remember the extent of Bhopal.)
3.  Environment (remember the Macondo (Deepwater Horizon) offshore disaster, with costs now approaching US$40 billion!)
4.  Business production and profits (failures don't deliver on-time products or profits!)
You must consider all four elements of the $Consequence, because when the disaster dominoes start to fall they knock over everything in their way. So think broadly rather than provincially. You will never know these $numbers exactly, so do what you were taught in engineering schools around the world: make an assumption and write it down for all to see. The world loves to criticize anything with $ signs, so you will get immediate feedback, and some of it will be very wise. Work for consensus $numbers, but remember that when the train jumps the track you will own the problem!

Rearranging the equation as ($Risk)/($Consequence) = (probability of failure) gives the maximum pof you can accept. The money terms on the left-hand side drive the dimensionless probability of failure (pof), which in turn sets the safety integrity level.
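The following Python sketch shows this arithmetic. The four consequence figures and the signature authority are hypothetical placeholders, not numbers from this problem. The SIL mapping also rests on an assumption: roughly one demand per year, so the annual pof can be compared directly with the demand-mode PFD bands of Table 2 (lower bound inclusive, upper bound exclusive).

```python
# HYPOTHETICAL consequence estimates (US$) for one disaster; illustrative only.
# Replace them with your own written-down consensus numbers.
consequences = {
    "people": 5_000_000,
    "property": 8_000_000,
    "environment": 6_000_000,
    "business": 1_000_000,
}

# HYPOTHETICAL signature authority: the largest $Risk (expected $ loss per year) you may approve.
signature_authority = 50_000


def max_pof(risk_allowed, consequence):
    """Largest acceptable probability of failure: pof = $Risk / $Consequence."""
    return risk_allowed / consequence


def required_sil(pof_limit):
    """Lowest demand-mode SIL whose whole PFD band lies at or below pof_limit.

    Returns 0 when less than SIL 1 risk reduction is needed (pof_limit > 0.1).
    Returns None when pof_limit < 1e-4, because even the SIL 4 band
    (1e-5 <= PFD < 1e-4) does not guarantee such a low PFD.
    """
    if pof_limit > 0.1:
        return 0
    # Upper (exclusive) PFD bound of each SIL band from Table 2.
    for sil, upper in ((1, 1e-1), (2, 1e-2), (3, 1e-3), (4, 1e-4)):
        if upper <= pof_limit:
            return sil
    return None


total_consequence = sum(consequences.values())
limit = max_pof(signature_authority, total_consequence)
print(f"Total $Consequence = ${total_consequence:,}")
print(f"Maximum acceptable pof = {limit:.2e}")
print(f"Required SIL (assuming about one demand per year) = {required_sil(limit)}")
```

With these placeholder numbers the total $Consequence is $20,000,000 and the maximum pof is 0.0025. That calls for SIL 3, because only the SIL 3 band (PFD < 0.001) stays entirely below the limit.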

Figure 3.1 (page 38) of Marszal and Scharpf's book Safety Integrity Level Selection is a reminder of the risk management responsibilities for:
1.  Moral reasons: make plants as safe as possible, with regard to costs, to meet society's requirements.
2.  Legal reasons: comply with legal regulations as written, regardless of costs or the actual level of risk.
3.  Financial reasons: build plants with the lowest long-term cost of ownership so as to keep operating budgets as small as possible.
The financial reasons are driven by the long-term cost of ownership as determined by net present value (NPV), which makes SIL selection a life cycle cost decision.
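For reference, here is the standard net present value definition in generic notation (the symbols are not taken from the book). CF_t is the after-tax cash flow in year t, i is the discount rate, and n is the project life in years. The second line is the familiar shortcut for a constant annual amount A received at the end of years 1 through n:

$$\text{NPV} = \sum_{t=0}^{n} \frac{CF_t}{(1+i)^t}$$

$$\text{PV}_{\text{annuity}} = A \, \frac{1-(1+i)^{-n}}{i}$$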

The International Electrotechnical Commission (IEC), an international standards organization headquartered in Geneva, Switzerland, has published two standards directed toward risk reduction and control. Both are also available as ANSI standards:
IEC 61508 Edition 2 pertains to functional safety. IEC defines safety as "freedom from unacceptable risk of physical injury or of damage to the health of people, either directly, or indirectly as a result of damage to property or to the environment"; in short, this standard pertains to risk reduction and risk control. The standard consists of eight parts (Parts 0 through 7). It specifically addresses electrical/electronic/programmable electronic safety-related systems; however, the concept is applicable to all risk systems.
               IEC 61508-0 explains functional safety and IEC 61508
               IEC 61508-1 general requirements
               IEC 61508-2 requirements for electrical/electronic/programmable electronic safety-related systems
               IEC 61508-3 software requirements
               IEC 61508-4 definitions and abbreviations
               IEC 61508-5 examples of methods for the determination of safety integrity levels
               IEC 61508-6 guidelines on the application of IEC 61508-2 and IEC 61508-3
               IEC 61508-7 overview of techniques and measures

IEC 61511 covers the functional safety of safety instrumented systems for the process industry sector. It sets requirements for the specification, design, installation, operation, and maintenance of a safety instrumented system, and it is the process sector implementation of IEC 61508 noted above.
               IEC 61511-SER contains all three parts
               IEC 61511-1 Part 1: framework, definitions, system, hardware, and software requirements
                  IEC 61511-1 Corrigendum 1 for Part 1
               IEC 61511-2 Part 2: guidelines for the application of IEC 61511-1
               IEC 61511-3 Part 3: guidance for the determination of the required safety integrity levels
                  IEC 61511-3 Corrigendum 1 for Part 3

Remember, the above standards are directed specifically toward safety issues. However, the same concepts apply to all types of business issues in manufacturing plants. Those business issues require weighing the tradeoffs between $Consequences and probability of failure to find the lowest long-term cost of ownership.

Keep in mind that the first cost you pay for equipment or processes is not the last cost. Supporting costs are typically 2-20 times higher than acquisition costs. They are also more difficult to obtain, whereas acquisition costs usually come from fixed price lists or quotations. You must also consider the time value of money to arrive at the lowest net present value. This means making tradeoff decisions that arrive at money-driven probabilities of failure, considering the $Consequences.
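The following sketch shows why the time value of money matters when supporting costs dominate. It uses a plain (pre-tax) NPV function, and all dollar figures are hypothetical illustrations, not data from the article.

```python
def npv(cash_flows, rate):
    """NPV where cash_flows[t] occurs at the end of year t (t = 0 is today)."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


# HYPOTHETICAL example (illustrative numbers only, taxes ignored):
acquisition = -100_000      # price-list purchase today
support = [-30_000] * 20    # annual supporting costs for 20 years
rate = 0.12                 # discount rate

support_pv = npv([0] + support, rate)             # support costs only, years 1..20
total_pv = npv([acquisition] + support, rate)     # acquisition plus support

print(f"Undiscounted support / acquisition = {sum(support) / acquisition:.1f}x")
print(f"Discounted support / acquisition   = {support_pv / acquisition:.2f}x")
print(f"Total NPV = ${total_pv:,.0f}")
```

Undiscounted, the supporting costs are 6 times the acquisition cost. Discounted at 12%, they are about 2.24 times. The supporting costs still dominate, but by a smaller margin than the raw totals suggest.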

Consider the case in Table 1, where we expect less than one catastrophe a year. If tragedy happens, we estimate the cost of the event at $20,000,000. A single device to prevent the tragedy costs $10,000, and we estimate its reliability at 95% for the one-year mission. How many devices should we install in parallel to minimize our life cycle cost? Assume a 20-year project life, a 12% discount rate, and a 38% tax rate. Download the life cycle cost spreadsheet to aid in your solution. Because this problem has no income, the goal is to find the least negative net present value. Remember that reliability + unreliability = 1 and unreliability = (probability of failure). A parallel reliability model gives
$$R_{\text{system}} = 1 - (1-R_1)(1-R_2)(1-R_3)\cdots$$
When every device has the same reliability R, the system reliability is
$$R_{\text{system}} = 1 - (1-R)^N$$
where N is the number of devices in parallel.
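The following sketch evaluates the parallel reliability model for the 95% device in this example. It assumes the devices fail independently, which is the assumption behind the product formula.

```python
from math import prod


def parallel_reliability(reliabilities):
    """R_system = 1 - (1 - R1)(1 - R2)(1 - R3)... for independent parallel devices."""
    return 1 - prod(1 - r for r in reliabilities)


def identical_parallel_reliability(r, n):
    """R_system = 1 - (1 - R)^N when all N parallel devices have reliability R."""
    return 1 - (1 - r) ** n


for n in range(1, 6):
    r_sys = identical_parallel_reliability(0.95, n)
    print(f"N={n}: R_system={r_sys:.9f}  pof={1 - r_sys:.3e}")
```

Each added device multiplies the system pof by 0.05, giving 0.05, 0.0025, 1.25e-4, 6.25e-6, and so on.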

The answer from Table 1 is 4 devices, for a capital cost of $40,000. Fewer than 4 devices incurs the high cost of a bad risk decision caused by under-design. More than 4 devices is overkill and also results in a bad investment decision caused by over-design. Four devices represent the sweet spot for this case.
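The spreadsheet itself is not reproduced here. The following Python sketch approximates the Table 1 comparison under assumptions that are NOT taken from the spreadsheet:

- one demand per year, so the annual system pof is (1 - R)^N;
- the expected loss is tax-deductible;
- the capital is depreciated straight-line over the 20-year life with no salvage value.

```python
# ASSUMPTIONS (hypothetical, not the article's spreadsheet): one demand per year,
# tax-deductible expected losses, straight-line depreciation, no salvage value.
UNIT_COST = 10_000          # $ per device
DEVICE_R = 0.95             # one-year mission reliability of one device
DISASTER_COST = 20_000_000  # $ per catastrophe
YEARS = 20                  # project life
RATE = 0.12                 # discount rate
TAX = 0.38                  # tax rate


def lcc_npv(n):
    """Net present value (negative = cost) of installing n devices in parallel."""
    capital = UNIT_COST * n
    pof = (1 - DEVICE_R) ** n              # parallel system unreliability per year
    expected_loss = pof * DISASTER_COST    # expected $ loss per year
    depreciation = capital / YEARS         # hypothetical straight-line schedule
    # After-tax annual cash flow: loss reduced by taxes, plus depreciation tax shield.
    annual = -(1 - TAX) * expected_loss + TAX * depreciation
    pv_annual = sum(annual / (1 + RATE) ** t for t in range(1, YEARS + 1))
    return -capital + pv_annual


for n in range(1, 7):
    print(f"N={n}: capital=${UNIT_COST * n:,}  NPV=${lcc_npv(n):,.0f}")

# max() on NPV picks the least negative value, i.e. the lowest life cycle cost.
best_n = max(range(1, 7), key=lcc_npv)
print(f"Least negative NPV at N={best_n}")
```

Under these assumptions N = 4 gives the least negative NPV (about -$34,900), which agrees with the answer above. The exact dollar figures will differ from the spreadsheet's if it uses another depreciation method.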

Table 1: How Many Devices Should We Install For The Lowest Long Term Cost Of Ownership?


Table 2 shows the probability of failure on demand. Of course, this involves two distinct functions: 1) the frequency with which the demand (the hazardous occurrence) arises, and 2) the probability that the safety function fails upon that demand. Table 2 assumes the frequency of occurrences will be small. Practically speaking, it is expected to be less than one occurrence per year, and R = 1 - pof.

SIL   Probability Of Failure On Demand (PFD) Range   Risk Reduction Factor Range   Range Of Reliability Based On Demand
1     0.01 ≤ PFD < 0.1                               10 to 100                     0.9 to 0.99
2     0.001 ≤ PFD < 0.01                             100 to 1,000                  0.99 to 0.999
3     0.0001 ≤ PFD < 0.001                           1,000 to 10,000               0.999 to 0.9999
4     0.00001 ≤ PFD < 0.0001                         10,000 to 100,000             0.9999 to 0.99999

Table 2: Safety Integrity Levels Based On Demand Which Is An Agent For Probability Of Failure
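The following sketch classifies a PFD into the Table 2 bands and computes the risk reduction factor as RRF = 1/PFD. The example PFD values are the 95% device from Table 1 placed in parallel (N = 1 to 3). The band edges follow the lower-inclusive, upper-exclusive convention shown in the table, and the function names are illustrative.

```python
# Demand-mode SIL bands from Table 2: (SIL, lowest PFD, upper PFD bound)
SIL_BANDS = (
    (1, 1e-2, 1e-1),
    (2, 1e-3, 1e-2),
    (3, 1e-4, 1e-3),
    (4, 1e-5, 1e-4),
)


def sil_from_pfd(pfd):
    """Return the SIL whose band satisfies low <= pfd < high, or None outside SIL 1-4."""
    for sil, low, high in SIL_BANDS:
        if low <= pfd < high:
            return sil
    return None


def risk_reduction_factor(pfd):
    """RRF = 1 / PFD."""
    return 1 / pfd


# Example: N identical 95%-reliable devices in parallel, PFD = (1 - R)^N.
for n in (1, 2, 3):
    pfd = (1 - 0.95) ** n
    print(f"N={n}: PFD={pfd:.3e}  R={1 - pfd:.6f}  "
          f"RRF={risk_reduction_factor(pfd):,.0f}  SIL={sil_from_pfd(pfd)}")
```

One, two, and three devices land in SIL 1, SIL 2, and SIL 3 respectively, with risk reduction factors of about 20, 400, and 8,000.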


Table 3 shows failure rates for continuous operation. Practically speaking, Table 3 applies when one or more occurrences per year are expected (continuous use). The values follow British Standard BS EN 61508, which uses rounded numbers.

SIL   Failure Rate Per Hour (λ) Range   MTTF (θ) Range, Years Per Failure   1 Year Mission Reliability Range
1     10⁻⁵ ≥ λ > 10⁻⁶                   11.42 ≤ θ < 114.16                  0.9161 ≤ R < 0.9913
2     10⁻⁶ ≥ λ > 10⁻⁷                   114.16 ≤ θ < 1,141.55               0.9913 ≤ R < 0.9991
3     10⁻⁷ ≥ λ > 10⁻⁸                   1,141.55 ≤ θ < 11,415.53            0.9991 ≤ R < 0.9999
4     10⁻⁸ ≥ λ > 10⁻⁹                   11,415.53 ≤ θ < 114,155.25          0.9999 ≤ R < 0.99999

Table 3: Safety Integrity Levels Based On Continuous Use Which Is An Agent For Failure Rates


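Calculations for Table 3 are straightforward, starting from the failure rate per hour. They use the simple exponential reliability equation for chance failures, which have a constant failure rate:
$$R = e^{-\lambda t} = e^{-t/\theta}, \qquad \theta = \text{MTTF} = 1/\lambda$$
Here t = 8,760 hours for the one-year mission, and θ in hours is divided by 8,760 to express it in years. Note that the reliability requirements for continuous operation are slightly more rigorous than those for demand mode. For example, SIL 1 requires 0.9161 ≤ R < 0.9913 in Table 3, versus 0.9 to 0.99 in Table 2.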
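The following sketch reproduces the Table 3 boundary values from the exponential model. It assumes a 365-day year (8,760 hours), which matches the table's numbers.

```python
import math

HOURS_PER_YEAR = 8760  # 365 days x 24 hours; consistent with Table 3's values


def mttf_years(failure_rate_per_hour):
    """theta = 1 / lambda (hours), converted to years."""
    return 1 / failure_rate_per_hour / HOURS_PER_YEAR


def mission_reliability(failure_rate_per_hour, mission_hours=HOURS_PER_YEAR):
    """R = exp(-lambda * t) for a constant failure rate (chance failures)."""
    return math.exp(-failure_rate_per_hour * mission_hours)


# SIL band edges run from lambda = 1e-5 down to 1e-9 failures per hour.
for exponent in range(5, 10):
    lam = 10.0 ** -exponent
    print(f"lambda = 1e-{exponent}/h  MTTF = {mttf_years(lam):,.2f} years  "
          f"R(1 yr) = {mission_reliability(lam):.5f}")
```

The printed values (11.42 years and R = 0.91614 at 10⁻⁵ per hour, down to 114,155.25 years and R = 0.99999 at 10⁻⁹ per hour) agree with the rounded band edges in Table 3.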

SIL levels can be justified on reliability grounds and on the lowest long-term cost of ownership when the risks of failure are calculated and the technical details are converted to money. In short, SIL becomes a business decision. You neither spend too little money by taking too much risk, nor spend too much money over-designing to take too little risk. To stay in business, you've got to get the money issues right.

Comments:

Refer to the caveats on the Problem Of The Month Page about the limitations of the above solution. Maybe you have a better idea on how to solve the problem. Maybe you will find that I've screwed up the solution, and you can point out my errors as you check my calculations.
E-mail your comments, criticism, and corrections to Paul Barringer.



July 23, 2013
© Barringer & Associates, Inc., 2013