Problem Of The Month
December 1997--Waste Disposal System Failures And Crow/AMSAA (Duane) Plots

In many chemical plants and refineries the failure of a waste disposal system causes a plant-wide failure. The integrity of the waste system is imperative and a key indicator for plant reliability. The concept is simple: no waste system--no plant!

No one likes to spend money on waste systems, because they are perceived as non-productive assets. We immediately lose sight of the simple concept: no waste system--no plant. One way to keep the waste system a front-page item is to predict when failures will occur and to price out the cost of those failures.

Reliability growth plots, known as Crow/AMSAA plots, follow the principle "show me, don't tell me, how the equipment is operating". From a Crow/AMSAA plot you can predict the next failure and see changes in failure trends. The plots are also helpful for getting a sense of the turnaround period (the interval after which the system must be renewed to reverse accumulated wear and tear).

WinSMITH Visual software can produce Crow/AMSAA plots and help forecast when the next failure will occur.

Background
A chemical plant must dispose of an aggressive waste product to sustain its operations. The waste treatment plant must function to keep the production plant in operation. The current history shows numerous failures. Each failure incurs an average cost of US$150,000 in lost gross margin and repairs.

The production management team has a problem. They need to convince top level management that corrective improvements must be made to the waste disposal system to preserve the reliability of the plant production units.

The production management team needs compelling evidence that the waste system requires substantial maintenance work to keep the plant on stream. Unfortunately, the upper management group has the opinion that the production team is simply incurring chance failures and thus funds for major repairs will not be committed.

The Problem
A review of the failure records for the waste disposal system shows a variety of reasons for failure. The time between failures for the most recent data has been recorded in days: 22, 32, 1, 128, 206, 346, 1, 43, 4, 1, 22, 177, 8, 4, 10, 17, 17, 6, 11, 29, 7, 15, 21, 11, 35.  This data has been recorded since the last turnaround of the equipment.

Twenty-five failures have occurred over a period of 1174 days with an arithmetic average of almost 47 days per failure. However, if the four long run events (times between failures exceeding 100 days) are removed from the data, then the data shows an arithmetic average of 15 days per failure!!!
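The arithmetic behind these averages is easy to check. The short Python sketch below recomputes them from the listed times between failures; the variable names are only illustrative, and the 100-day threshold is the "long run" definition given above.

```python
from statistics import mean

# Times between failures in days, copied from the problem statement.
tbf_days = [22, 32, 1, 128, 206, 346, 1, 43, 4, 1, 22, 177, 8,
            4, 10, 17, 17, 6, 11, 29, 7, 15, 21, 11, 35]

LONG_RUN_DAYS = 100  # a "long run" exceeds 100 days between failures

total_days = sum(tbf_days)
short_runs = [t for t in tbf_days if t <= LONG_RUN_DAYS]
long_run_count = len(tbf_days) - len(short_runs)

print(f"failures: {len(tbf_days)}, cumulative days: {total_days}")
print(f"mean of all intervals: {mean(tbf_days):.1f} days")
print(f"mean without the {long_run_count} long runs: {mean(short_runs):.1f} days")
```

Four intervals carry most of the cumulative time, which is why a single overall average hides how the system is behaving now.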

Production management is of the opinion that they had things under control and have now lost control. Top management thinks the four long runs were flukes and that the system is one they must live with; thus no extra expenditures are merited.

Questions:
1) Can we make a Weibull plot of the data?

2) What tool should we use to analyze the data?

3) What trends can we find in the data?

4) What compelling evidence do we present to top management to justify taking the waste treatment plant down for five days resulting in a loss of US$100,000 of gross margin and a turnaround cost of US$1,000,000 for correction of the problem?

5) Based on continuing trends how many additional failures should we expect at the end of a four year cumulative time period (1460 cumulative days)?

Solutions:

Answer to Question 1) Can we make a Weibull plot of the data?

No. A Weibull plot requires the age to failure for a single failure mode. We do not know the time origin for the real age to failure of the components, and the failure modes are mixed.

Answer to Question 2) What tool should we use to analyze the data?

The data set is incomplete. Thus we need a tool that can handle missing data and mixed failure modes. Crow/AMSAA plots tolerate both conditions and make sense of the apparent nonsense in the data.
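For reference, the usual form of the Crow/AMSAA (Duane) model is sketched below. The symbols N(t), lambda, beta and rho(t) are notation introduced here, not taken from the figures; in the trend-line equations quoted in this solution (for example Y=0.32X^0.4646), Y plays the role of N, X the role of t, 0.32 is lambda and 0.4646 is beta.

```latex
% Cumulative failures N(t) versus cumulative time t (power law)
\[
N(t) = \lambda\, t^{\beta}
\quad\Longleftrightarrow\quad
\ln N(t) = \ln\lambda + \beta\,\ln t
\]
% Instantaneous failure rate implied by the power law
\[
\rho(t) = \frac{dN}{dt} = \lambda\,\beta\, t^{\beta-1}
\]
```

On log-log paper the model is a straight line with slope beta. A slope below 1 means the failure rate is decreasing (reliability is improving), a slope above 1 means the failure rate is increasing, and a slope of 1 means a constant failure rate.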

Answer to Question 3) What trends can we find in the data?

Figure 1 shows the failure data in a Crow/AMSAA plot of cumulative failures versus cumulative time. The coefficient of determination, r^2, indicates a poor curve fit, which is confirmed by visually comparing the overall trend line with the data points.

The plot shows some specific trends that are not captured by the overall trend line.

Consider the finer details outlined in Figure 2 that are obtained by pragmatically slicing and dicing the data into these data sets:

       Set 1: 22, 32, 1, 128, 206, 346
              [In cumulative format (cum time, cum failures) the data for plotting becomes (22,1), (54,2), (55,3), (183,4), (389,5), (735,6), etc.]

       Set 2: 1, 43, 4, 1, 22

       Set 3: 177

       Set 4: 8, 4, 10, 17, 17, 6

       Set 5: 11, 29, 7, 15, 21, 11, 35
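The conversion from times between failures to plotting points is a running sum. As a small sketch, assuming the sets are consecutive slices of the original list with the sizes shown above (6, 5, 1, 6 and 7 failures), the cumulative pairs for every set can be generated like this:

```python
from itertools import accumulate

# Times between failures in days, in the order recorded.
tbf_days = [22, 32, 1, 128, 206, 346, 1, 43, 4, 1, 22, 177, 8,
            4, 10, 17, 17, 6, 11, 29, 7, 15, 21, 11, 35]

# (cumulative time, cumulative failures) pairs for the whole record.
cum_days = list(accumulate(tbf_days))
points = list(zip(cum_days, range(1, len(tbf_days) + 1)))

# Number of failures in each set, as sliced in the text.
set_sizes = [6, 5, 1, 6, 7]

sets = []
start = 0
for size in set_sizes:
    sets.append(points[start:start + size])
    start += size

for number, pts in enumerate(sets, start=1):
    print(f"Set {number}: {pts}")
```

The first line printed should match the bracketed pairs listed for set 1; the remaining lines give the "etc." for sets 2 through 5.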

This slicing of the data is arbitrary, but it tells the production group to look for the specific reasons behind the cusps between the data sets in the trends shown in Figure 2.

Data set 1 shows a decreasing failure rate (reliability is improving) up to the 7th failure, which creates a cusp between set 1 and set 2. Data set 2 shows an increasing failure rate (reliability declines) up to the repair made following the 11th failure. Notice that the line slope between failure 11 and failure 12 returned to approximately the same favorable failure rate experienced during the early time period--the repair was a huge improvement, but it lasted for 177 days! Sets 4 and 5 show increasing failure rates, but the rate slows somewhat in set 5.

Extending the trend line of data set 1 from 735 days to 1174 days would have resulted in the following number of failures:
       Y=0.32X^0.4646 = 0.32*(735)^0.4646 = 6.86 forecasted failures versus the 6 actual failures, or
       Y=0.32X^0.4646 = 0.32*(1174)^0.4646 = 8.54 forecasted failures compared to the 26 actual failures
This is 8.54 - 6.86 = 1.68 extra forecasted failures along the data set 1 trend line and 26 - 8.54 = 17.46 extra failures between the actual score card and the trend line of data set 1. The figures shown on the right-hand side of Figure 2 are rounded for simplicity.

In short, failure to restore the system to set 1 condition has resulted in ~17.5 failures more than expected over the same time period.
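The set 1 trend line can be reproduced approximately with a straight-line fit on log-log scales. The sketch below assumes ordinary least squares of ln(cumulative failures) on ln(cumulative time); the fitting method is not stated in this solution, so treat that as an assumption, and the function name fit_power_law is purely illustrative. Under that assumption the coefficients come out close to the 0.32 and 0.4646 quoted above.

```python
import math

# Data set 1 as (cumulative days, cumulative failures).
set1_points = [(22, 1), (54, 2), (55, 3), (183, 4), (389, 5), (735, 6)]

def fit_power_law(points):
    """Fit n = lam * t**beta by least squares on ln(n) versus ln(t).

    Returns (lam, beta, r2), where r2 is the coefficient of
    determination of the straight line in log-log coordinates.
    """
    xs = [math.log(t) for t, _ in points]
    ys = [math.log(n) for _, n in points]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta = sxy / sxx
    lam = math.exp(mean_y - beta * mean_x)
    r2 = sxy ** 2 / (sxx * syy)
    return lam, beta, r2

lam, beta, r2 = fit_power_law(set1_points)

# Forecasts along the fitted set 1 line.
forecast_735 = lam * 735 ** beta
forecast_1174 = lam * 1174 ** beta

ACTUAL_FAILURES = 26  # actual count at 1174 days, as quoted in the text
excess = ACTUAL_FAILURES - forecast_1174

print(f"lam = {lam:.3f}, beta = {beta:.4f}, r^2 = {r2:.3f}")
print(f"forecast at 735 days:  {forecast_735:.2f}")
print(f"forecast at 1174 days: {forecast_1174:.2f}")
print(f"excess failures versus the set 1 trend: {excess:.1f}")
```

Because the fit carries more decimal places than the rounded coefficients in the text, the forecasts may differ from the quoted values in the second decimal.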

Answer to Question 4) What compelling evidence do we present to top management to justify taking the waste treatment plant down for five days resulting in a loss of US$100,000 of gross margin and a turnaround cost of US$1,000,000 for correction of the problem?

The waste treatment system seems to have a natural turnaround period of two years as shown by data set 1 in Figure 2.

Suppose a US$1,000,000 turnaround expenditure had been made at that point, with a US$500,000 loss of gross margin. We could then reasonably have expected ~1.5 more failures during the interval in which we have actually reached a total of 26 failures (18.5 failures in excess of the expected value had we proceeded along data set 1's projection line).

Option 1: Turnaround at year 2 to achieve an additional 2 year turnaround life
       Turnaround cost = US$1,000,000
       Lost gross margin during turnaround = US$500,000
       ~1.7 failures projected to reach 1174 cumulative days = US$150,000*1.7 = US$255,000
       Total Costs = US$1,755,000 for results of today at 1174 cumulative days with additional life remaining in the system.

Option 2: Fix when broken
       Lost gross margin and maintenance cost for 26-7 = 19 failures = US$150,000*19 = US$2,850,000
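The two options can be tallied in a few lines. This sketch uses only the figures quoted above; the constant names are illustrative, and the difference between the options is simple subtraction rather than a number taken from the figures.

```python
COST_PER_FAILURE = 150_000   # US$ lost gross margin and repairs per failure
TURNAROUND_COST = 1_000_000  # US$ turnaround expenditure
LOST_MARGIN = 500_000        # US$ gross margin lost during the turnaround

option1_failures = 1.7       # projected along the set 1 trend to 1174 days
option2_failures = 26 - 7    # failures charged to fix-when-broken in the text

# Option 1: turnaround at year 2, then follow the set 1 trend line.
option1_total = (TURNAROUND_COST + LOST_MARGIN
                 + round(COST_PER_FAILURE * option1_failures))

# Option 2: fix when broken.
option2_total = COST_PER_FAILURE * option2_failures

difference = option2_total - option1_total

print(f"Option 1 total: US${option1_total:,}")
print(f"Option 2 total: US${option2_total:,}")
print(f"Option 2 costs US${difference:,} more")
```

Changing option1_failures or the cost per failure shows how sensitive the comparison is to those estimates.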

Answer to Question 5) Based on continuing trends how many additional failures should we expect at the end of a four year cumulative time period (1460 cumulative days)?

Extending the trend line of data set 5 to 1460 days we can forecast:
       y=(7.58E-08)X^2.78 = 7.58E-08*(1460)^2.78 = 47.49 failures at 1460 days. Subtracting the 26 failures to date gives about 21 additional failures between today and the four-year cumulative time (prior to correction of the set 5 trend line, the value was 19 failures).

Extending the trend line of data set 1 to 1460 days we can forecast:
       y=0.32X^0.4646 = 0.32*(1460)^0.4646 = 9.44 failures at 1460 days. Subtracting the 6.86 failures forecast by the data set 1 trend line at 735 days gives 2.58 additional failures along that trend line between the end of data set 1 and the four-year cumulative time.
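Both forecasts are evaluations of a power-law trend line at 1460 days. As a quick check, the sketch below plugs in the rounded coefficients quoted above; the helper name power_law is illustrative only.

```python
def power_law(lam, beta, t):
    """Cumulative failures forecast by the trend line y = lam * t**beta."""
    return lam * t ** beta

HORIZON_DAYS = 1460        # four-year cumulative time
FAILURES_TO_DATE = 26      # actual count at 1174 days, as quoted in the text

# Trend line of data set 5 (current behavior).
set5_at_horizon = power_law(7.58e-08, 2.78, HORIZON_DAYS)
additional_set5 = set5_at_horizon - FAILURES_TO_DATE

# Trend line of data set 1 (behavior just after the last turnaround).
set1_at_horizon = power_law(0.32, 0.4646, HORIZON_DAYS)
set1_at_735 = power_law(0.32, 0.4646, 735)
additional_set1 = set1_at_horizon - set1_at_735

print(f"set 5 trend at {HORIZON_DAYS} days: {set5_at_horizon:.2f} failures, "
      f"{additional_set5:.1f} more than today")
print(f"set 1 trend at {HORIZON_DAYS} days: {set1_at_horizon:.2f} failures, "
      f"{additional_set1:.2f} more than at 735 days")
```

The set 5 exponent is well above 1, so small changes in the horizon move that forecast a great deal; the set 1 line, with an exponent below 1, barely moves.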

We can't afford Option 2!!!
You can explore this conclusion by graphically building a breakeven chart.

Comments:

Crow/AMSAA (Duane) plots provide a graphical tool to show failure trends and forecast expected events. The data for this waste treatment system indicate that it needs a two-year turnaround. Continuing with a fix-when-broken strategy is very expensive.

Refer to the caveats on the Problem Of The Month Page about the limitations of this solution. Maybe you have a better idea on how to solve the problem. Maybe you will find where I've screwed up the solution and can point out my errors as you check my calculations. E-mail your comments, criticism, and corrections to Paul Barringer.

Technical tools are only interesting toys for engineers until results are converted into a business solution involving money and time. Complete your analysis with a bottom line expressed in dollars and time so you have answers that will interest your management team!

 

Thanks to Budana Prijadi of Freeport McMoran at the world's largest gold mine in Irian Jaya (Indonesia) for finding a small error in the trend line for Set 5: the incorrect trend line equation was previously shown as y=(1.86E-07)X^2.65, and the correct equation is y=(7.58E-08)X^2.78.

Last revised 1/20/03

Return to Barringer & Associates, Inc. homepage