Reliability software solves other types of statistical problems with ease.  Reliability practitioners will find the techniques described below useful as refreshers on:

1.     How to make probability plots to get the basic statistics

2.     Probability density function plots (PDFs)

3.     Reliability plots

4.     Instantaneous failure rate plots (hazard plots)

5.     Probability plotting positions

Six widely different problems concerning corrosion and tree diameters illustrate how WinSMITH Weibull software is a very useful tool for solving other problems using statistical tools from the field of reliability.

1.     Corrosion pit depths

2.     Stress corrosion cracking (SCC) in aluminum

3.     Stress corrosion cracking in stainless steel

4.     Tank bottom pit depths

5.     Tree diameters

6.     Pipe corrosion

Generally, scalar information is provided in the time or size domain, such as 96, 30, 49, 82, 90 minutes until failure.  So how can you make an X-Y plot?  This is always a big mystery for people new to probability plots.

Take the scalar data and put it into rank order; these will be the X-values.  Then borrow a tool from the statisticians called the rank order plotting position; these will be the Y-values.

Consider this dataset from Dr. Robert B. Abernethy, The New Weibull Handbook, 4th edition, self-published, North Palm Beach, FL, 2002, ISBN 0-9653062-1-6, Chapter 2, Table 2-2:

Rank your data, calculate the median rank plotting position, plot the X-Y data

Data From Abernethy, Table 2-2

Rank Order    Age-To-Failure    Exact Median Rank           Benard's Median Rank
i             Time (minutes)    (incomplete beta @50%)      Plotting Position (i-0.3)/(n+0.4)
              X-axis            Y-axis                      Y-axis approximation

1             30                12.945%                     (1-0.3)/(5+0.4) = 12.963%
2             49                31.381%                     31.481%
3             82                50.000%                     50.000%
4             90                68.619%                     68.519%
5 = n         96                87.055%                     87.037%

Calculate the exact median rank using the Excel function =BETAINV(0.5,i,n-i+1).  Benard's median rank is accurate to ~0.5% of the true median rank and is widely used for ease of calculation.
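The two Y-axis columns of the table can be reproduced without a spreadsheet.  The sketch below is only an illustration: the function names are hypothetical, and the exact median rank is found by bisection on the binomial sum, which is mathematically equivalent to inverting the incomplete beta function at 50% (it is not claimed to be how Excel or WinSMITH Weibull do the calculation).

```python
from math import comb

def benard_median_rank(i, n):
    """Benard's approximation to the median rank of the i-th of n ordered points."""
    return (i - 0.3) / (n + 0.4)

def exact_median_rank(i, n, tol=1e-12):
    """Solve P(at least i of n values fall below p) = 0.5 for p by bisection."""
    def prob_at_least_i(p):
        return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(i, n + 1))
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prob_at_least_i(mid) < 0.5:
            lo = mid   # probability still too small, so p must be larger
        else:
            hi = mid
    return (lo + hi) / 2

times = sorted([96, 30, 49, 82, 90])   # minutes, Abernethy Table 2-2
n = len(times)
for i, t in enumerate(times, start=1):
    print(t, f"{exact_median_rank(i, n):.3%}", f"{benard_median_rank(i, n):.3%}")
```

Each printed row gives the X-value followed by the exact and approximate Y-values, matching the table columns.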

This table shows how to turn scalar data into X-Y pairs of plotting positions, which are placed on a Weibull probability plot using Benard's median rank for Figure 1.

Notice in Figure 1 the X-axis is a log scale in minutes, and the Y-axis is the log of another log so as to produce a straight trend line.  The Y-axis is a probability scale describing the % of the population expected to fail by a specific time (i.e., this scale is unreliability, also known as the cumulative distribution function or CDF).

The coefficient of determination (a goodness of fit criterion) is 0.93, which says the straight line explains about 93% of the scatter in the data (the critical value is 80.64%), signifying a good fit of the straight line to the data.  If you prefer, a better goodness of fit criterion is PVE% (P-Value Estimate) = 52.65%, where the critical value is 10%.
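The trend line and the coefficient of determination can be sketched in a few lines.  This is a minimal illustration, assuming the fit is a least-squares regression of ln(time) onto ln(-ln(1-F)) (X onto Y) with Benard's median ranks; the function name is hypothetical and the code is not taken from WinSMITH Weibull.  With the five data points above it gives values close to the beta = 2.165, eta = 79.8 and 0.93 quoted for Figure 1.

```python
from math import log, exp

def weibull_rank_regression(times):
    """Fit a 2-parameter Weibull by regressing ln(t) on ln(-ln(1-F)) (X onto Y)."""
    t = sorted(times)
    n = len(t)
    x = [log(v) for v in t]                                # X-axis: log of time
    F = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]   # Benard's median rank
    y = [log(-log(1 - f)) for f in F]                      # Y-axis: log of a log
    xm, ym = sum(x) / n, sum(y) / n
    sxx = sum((a - xm) ** 2 for a in x)
    syy = sum((b - ym) ** 2 for b in y)
    sxy = sum((a - xm) * (b - ym) for a, b in zip(x, y))
    slope = sxy / syy              # line is x = intercept + slope * y
    intercept = xm - slope * ym
    beta = 1 / slope               # Weibull shape parameter
    eta = exp(intercept)           # characteristic life, where y = 0 (F = 63.2%)
    r2 = sxy ** 2 / (sxx * syy)    # coefficient of determination
    return beta, eta, r2

beta, eta, r2 = weibull_rank_regression([96, 30, 49, 82, 90])
print(f"beta = {beta:.3f}, eta = {eta:.1f}, R^2 = {r2:.2f}")
```

The critical value and PVE% come from the software's own tables and are not reproduced by this sketch.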


Figure 1’s statistics of eta = 79.8 and beta = 2.165 drive the probability density function (PDF) shown in Figure 2.

The area under the PDF curve is unity, and the curve shows the relative likelihood of failure at each specific time.  The curve shape is important; in this case, it has a long tail to the right.  Tailed data is typical in reliability analysis.

Reliability engineers are interested in the reliability (survivability) curve in Figure 3.  It tells the percent of the population surviving until a specific time.

Another important plot for reliability engineers is the hazard curve.  The hazard curve shows the instantaneous failure rate in Figure 4.
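The PDF, reliability, and hazard curves all follow from the two Weibull statistics.  The sketch below uses the standard 2-parameter Weibull formulas with the eta and beta quoted for Figure 1; the function names are hypothetical and the printed times are arbitrary illustration points.

```python
from math import exp

ETA, BETA = 79.8, 2.165   # statistics quoted for Figure 1

def weibull_pdf(t, eta=ETA, beta=BETA):
    """Probability density f(t)."""
    return (beta / eta) * (t / eta) ** (beta - 1) * exp(-((t / eta) ** beta))

def weibull_reliability(t, eta=ETA, beta=BETA):
    """Fraction of the population surviving to time t, R(t) = 1 - CDF."""
    return exp(-((t / eta) ** beta))

def weibull_hazard(t, eta=ETA, beta=BETA):
    """Instantaneous failure rate h(t) = f(t) / R(t)."""
    return (beta / eta) * (t / eta) ** (beta - 1)

for t in (30, 60, 90, 120):
    print(t, round(weibull_pdf(t), 5), round(weibull_reliability(t), 4),
          round(weibull_hazard(t), 5))
```

Because beta is greater than 1, the hazard rises with time, which is the wear-out behavior a hazard plot makes visible.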

Data and probability plotting positions drive probability plots which develop the statistics which in turn produce engineering graphics.  It starts with data married to a plotting position.

Plotting Positions

Plotting positions are described in The New Weibull Handbook, chapter 2, along with information about the historical progress from early use of the mean plotting position towards today’s median plotting position, often using Benard’s median rank equation for a good approximation.  Dr. Abernethy points out: “Median rank applies to all distributions, whereas the Gaussian plotting position, better known as Blom’s plotting position, only applies to the normal distribution”.  Plotting positions use “i”, the ordered failure number ranked first to last, and “n”, the number of items in the sample.  Some typical plotting positions are:

 

Name                Plotting position               Remarks
Simple              i/n                             Simple but not practical
Simple              (i-1)/n                         Simple but not practical
Midpoint (Hazen)    (i-0.5)/n
Mean                i/(n+1)
Mode                (i-1)/(n-1)
Median (Exact)      Excel: =BETAINV(0.5,i,n-i+1)
Median (Benard)     (i-0.3)/(n+0.4)                 Approximation good to ~0.5% of the exact median rank value

Generally the simple plot position, i/n, is used for simple explanations but is seldom used in practice: the 100% position cannot be plotted on probability paper because the probability scale is endless.  Likewise the simple plot position, (i-1)/n, produces a zero, which also cannot be plotted on typical probability paper scales.

Hazen’s midpoint plot position is again used for simple explanations but not frequently used in practice.  It was originally claimed as minimizing errors but the conclusions were based on hand plots which were not very accurate.

Weibull and Gumbel used the mean plot position extensively when calculations were made by hand.  With computers (and Leonard Johnson’s advice), Weibull shifted to the median rank plotting position.  Today the median ranks plotting position is generally accepted as best practice for reducing errors and bias with tailed distributions.  Read Benard’s median rank paper translated from Dutch into English. 

The New Weibull Handbook, chapter 5, uses the Monte Carlo simulation results in Table 5-6, for both bias and mean square error, to show why the median rank plotting position, (i-0.3)/(n+0.4), is best practice.  Table 5-6 shows median ranks are superior to mean ranks, i/(n+1), and also superior to Hazen’s midpoint ranks, (i-0.5)/n.  Additionally, Table 5-4 shows why life data (and other data with uncertainty in the time/size) should be regressed X-onto-Y rather than by the conventional method of regressing Y-onto-X for improved accuracy.

Other plotting details are given in C. R. Mischke, A Distribution-Independent Plotting Rule for Ordered Failures, ASME document 79-DET-112 which describes:

 

 

Gaussian (pg 3) = (i-0.375)/(n+0.25)
Blom (pg 3)     = (i-a)/(n-a-b+1)

With the general form (i-a)/(n-a-b+1):
            a=b=0.3 gives Benard’s median rank plotting position.
            a=b=0 gives the mean rank plotting position.
            a=0.375 and b=0.375 gives the Gaussian plotting position.
            a=0.5 and b=0.5 gives Hazen’s plotting position.
            a=0 and b=1 gives the simple plotting position in which the largest
                        point, unity, can’t be plotted on probability paper.
            a=1 and b=0 gives another simple plotting position in which the
                        smallest point, zero, can’t be plotted on probability paper.
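The general rule is easy to check numerically.  The sketch below is illustrative only: the function name and the dictionary of rule names are hypothetical, and the (a, b) pairs are the ones listed above.

```python
def plotting_position(i, n, a, b):
    """General plotting rule (i - a) / (n - a - b + 1) for the i-th of n ordered points."""
    return (i - a) / (n - a - b + 1)

RULES = {
    "Benard median rank": (0.3, 0.3),
    "Mean rank":          (0.0, 0.0),
    "Gaussian":           (0.375, 0.375),
    "Hazen midpoint":     (0.5, 0.5),
    "Simple i/n":         (0.0, 1.0),
    "Simple (i-1)/n":     (1.0, 0.0),
}

n = 5
for name, (a, b) in RULES.items():
    print(f"{name:20s}", [round(plotting_position(i, n, a, b), 4) for i in range(1, n + 1)])
```

The last two rows show the unplottable end points: the first simple rule reaches 1.0 at i = n and the second starts at 0.0 for i = 1.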


Now some examples of data from other fields of study

Three different sources of information will be used for illustrating the use of reliability software to solve data sets in the literature.
     1.  Masamichi Kowaka, Introduction to Life Prediction of Industrial Plant Materials: Application of the Extreme Value Statistical Method for Corrosion Analysis, Allerton Press, NY, 1994, ISBN 0-89864-073-3.  This is a USA English publication of a 1984 Japanese book originally published by The Japan Society of Corrosion Engineers.  Kowaka generally used the mean plotting position although for some data sets he used the Rankit method (which gives plotting positions very close to the median rank plotting position).
     2.  A summary of UK Health & Safety Executive report is available with corrosion details in research report 016 – Guidelines for use of statistics for analysis of sample inspection of corrosion.  This report uses the earlier mean plotting position, i/(n+1) as described in
     3. U.S. Department of Agriculture Forest Service, Research Paper SO-164: A Test of the Exponential Distribution for Stand Structure Definition in Uneven-aged Loblolly-shortleaf Pine Stands, 1981.  This report uses the exponential and double exponential (Gumbel distribution).

These three documents were selected because of their availability and interesting examples.  The analysis shown below is not intended to disparage any of the results in the literature, only to show how modern software can help enlighten the results.

Problem #1: Corrosion Pit Depth
Tables 6.1 and 6.2 in Kowaka show corrosion pit depth data listed in mm of depth:
            0.42, 0.52, 0.69, 0.60, 0.99, 0.34, 0.76, 0.73, 0.43, and 0.57. 
Kowaka analyzed the data using normal probability plots and a plotting position of i/(n+1).  He concluded the data had a mean = 0.60 with standard deviation = 0.18. 

Using WinSMITH Weibull, Benard’s median rank plotting position, with rank regression curve fit techniques we get Figure 5 for the normal distribution.  The mean = 0.605 and the standard deviation = 0.2079.
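The rank regression statistics can be approximated outside the software.  The sketch below assumes the normal fit regresses the pit depths onto the standard normal scores of Benard's median ranks (X onto Y); the function name is hypothetical and this is not WinSMITH Weibull's code.  Under those assumptions the intercept is the mean and the slope is the standard deviation, and the results land close to the 0.605 and 0.2079 quoted above.

```python
from statistics import NormalDist

pit_depths_mm = [0.42, 0.52, 0.69, 0.60, 0.99, 0.34, 0.76, 0.73, 0.43, 0.57]

def normal_rank_regression(data):
    """Fit a normal distribution by regressing the data on standard normal scores (X onto Y)."""
    x = sorted(data)
    n = len(x)
    # Benard's median rank converted to a standard normal score (Y-axis of a normal plot)
    z = [NormalDist().inv_cdf((i - 0.3) / (n + 0.4)) for i in range(1, n + 1)]
    xm, zm = sum(x) / n, sum(z) / n
    sxz = sum((a - xm) * (b - zm) for a, b in zip(x, z))
    szz = sum((b - zm) ** 2 for b in z)
    sigma = sxz / szz          # slope of the line x = mu + sigma * z
    mu = xm - sigma * zm       # intercept
    return mu, sigma

mu, sigma = normal_rank_regression(pit_depths_mm)
print(f"mean = {mu:.3f} mm, standard deviation = {sigma:.4f} mm")
```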

The curve fit in Figure 5 can use many different statistical distributions, and a simple criterion, (R² – CCC²), picks the distribution with the largest positive value as the winner.  These options were not available to Kowaka.

I use the criterion involving R² because every engineer understands the correlation coefficient R.  R² is called the coefficient of determination, which must be compared to the critical value CCC².  The value (R² – CCC²) gives an absolute value for goodness of fit comparison, although the newer value PVE is considered a better criterion for goodness of fit but is not so understandable for most engineers.
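The R² half of the criterion can be sketched by linearizing each candidate distribution and measuring how straight the probability plot is.  The code below is an illustration under stated assumptions: Benard's median ranks are used for every distribution, the axis transformations are the textbook ones, and the names are hypothetical.  The critical values CCC² and the PVE% come from the software and are not reproduced here, so the sketch reports R² only.

```python
from math import log
from statistics import NormalDist

pit_depths_mm = [0.42, 0.52, 0.69, 0.60, 0.99, 0.34, 0.76, 0.73, 0.43, 0.57]

def r_squared(x, y):
    """Coefficient of determination of a straight-line fit between x and y."""
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    sxy = sum((a - xm) * (b - ym) for a, b in zip(x, y))
    sxx = sum((a - xm) ** 2 for a in x)
    syy = sum((b - ym) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)

def r2_by_distribution(data):
    """R^2 of the probability plot for each candidate distribution."""
    x = sorted(data)
    n = len(x)
    F = [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]   # Benard's median rank
    z = [NormalDist().inv_cdf(f) for f in F]               # normal probability scale
    w = [log(-log(1 - f)) for f in F]                      # Weibull / Gumbel lower scale
    g = [-log(-log(f)) for f in F]                         # Gumbel upper scale
    lnx = [log(v) for v in x]
    return {
        "Gumbel+":   r_squared(x, g),
        "Lognormal": r_squared(lnx, z),
        "Weibull":   r_squared(lnx, w),
        "Normal":    r_squared(x, z),
        "Gumbel-":   r_squared(x, w),
    }

for name, r2 in r2_by_distribution(pit_depths_mm).items():
    print(f"{name:10s} R^2 = {r2:.4f}")
```

Subtracting the appropriate critical value from each R² would give the freeboard-style margin used for ranking.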

Here is a comparison for this data set from Figure 5 using different distributions for the curve fit.  (R² – CCC²) is similar to freeboard on a ship (it must be positive or you take on water!).  For PVE% the critical value is 10% (bigger is better).

Distribution      (R² – CCC²)    PVE%      Notes
Gumbel+            0.1292        99.9      The best fit distribution for the data
Lognormal          0.1135        99.9
Weibull            0.1092        73.88
Normal             0.0895        66.22     Kowaka chose this distribution
Gumbel-            0.0372        20.04

The winning result for the Gumbel upper distribution is shown in Figure 6:

Notice how the Gumbel upper distribution magnifies the data in the upper right hand corner of the probability plot as compared with the lower left hand corner.  The upper right hand corner which is magnified is the area of concern as big pit depths are worrisome. 

The plot in Figure 6 says 99% of the pits will be 1.308 mm or less in depth.  Of course 1% of the pits will also be deeper than 1.308 mm.  Small pit depths are not usually worrisome, and 5% of all pits will have a depth less than 0.325 mm!  Figure 6 also shows 98% (99%-1%) of all pits are expected to lie between 0.251 mm and 1.308 mm.

The PVE% goodness of fit shows a tie.  How should we decide which distribution is best?  The lognormal distribution predicts 99% of the pit depths will be less than 1.302 mm which is almost a wash compared to the Gumbel upper distribution value of 1.308 mm so we can’t make too much of the more pessimistic pit depth for the Gumbel upper distribution.  Thus the tie breaker, in my judgment, would be the magnification available with the Gumbel upper distribution (the log normal distribution lacks magnification for larger values).

The PDF from Figure 6 is shown in Figure 7 with its long tail to the right toward deeper pits.  This plot was made in WinSMITH Visual software directly from WinSMITH Weibull.


Problem #2: Stress Corrosion Cracking (SCC) Age-To-Failure For Aluminum Samples
Table 6.5 in Kowaka shows hours to failure for stress corrosion cracking of 27 samples of aluminum alloy in a 3% NaCl solution:
66, 70, 72, 73, 75, 75, 76, 77, 80, 80, 82, 82, 82, 88, 89, 90, 91, 91, 92, 92, 93, 93, 94, 94, 94, 95, 96. 
Since the time keeping is coarse, notice the stacks of data at 75, 80, 82, 91, 92, 93, and 94 hours.  Kowaka displayed the data on a lognormal plot.

In fact the best distribution is the Weibull, as shown in Figure 8, using a selection strategy based on (R² – CCC²) as described above.  Large positive values of (R² – CCC²) are desired, and for PVE% the critical value is 10% (bigger is better).

Distribution      (R² – CCC²)    PVE%      Notes
Weibull            0.0301        30.16     The best fit distribution for the data
Gumbel-            0.0191        20.19
Normal            -0.0022         8.98
Lognormal         -0.0071         7.05     Kowaka chose this distribution
Gumbel+           -0.0819         0.451

Of course the advantage of the Weibull plot is the magnification of short ages to failure on the probability plot, which says to expect 1% of the ages to failure to occur at less than 58.31 hours.  The Gumbel lower distribution has the same Y-axis as a Weibull plot, and the Gumbel X-axis is uniformly divided, with the expectation that 1% of the ages to failure will occur at less than 54.69 hours.  Therefore the Weibull plot has the better curve fit statistic, although its B1 value is the less pessimistic of the two.  The Weibull PDF curve of the winning result is shown above in Figure 9 with the long tail toward shorter times.


Problem #3: Stress Corrosion Cracking (SCC) Age-To-Failure For SU 304 Stainless Steel In High-Temperature Water
Kowaka shows in Table 6.6 and Table 6.7 age-to-failure data for 24 samples in hours as:
            28, 29, 32, 37, 39, 40, 40, 40, 41, 49, 53, 53, 54, 63, 64, 70, 73, 75, 82, 91, 97, 105, 122, 143 hours.
The distribution curve fits are shown below with the Gumbel upper distribution resulting in the best fit:

Here is a comparison for this data set using different distributions for the curve fit:

Distribution      (R² – CCC²)    PVE%      Notes
Gumbel+            0.064         89.43     The best fit and Kowaka’s choice
2-P Lognormal      0.045         72.66
3-P Weibull        0.0263        75.02
2-P Weibull       -0.145          6.05
3-P Lognormal      0.0000         0.001
Gumbel-           -0.150          0.044

The winning Gumbel upper distribution plot is shown in Figure 10.

Figure 10 shows that 1% of the age-to-failure will occur in less than 10.4 hours.  The Gumbel upper distribution provides a less than desirable fit in the lower time values which are the most troublesome data—we’ve got a good curve fit in the upper reaches of the data but danger does not lie in this zone.  The Weibull plot magnifies very clearly the lower reaches of the age-to-failure. 

Therefore consider the 2-parameter Weibull plot in Figure 11 which shows a concave downward shape to the data. 

Chapter 3 of The New Weibull Handbook describes four criteria that should always be met before using a 3-parameter Weibull:
            1.  Data on a 2-parameter plot should show concave curvature.
                 (Visually we see concavity.)
            2.  We need a physical explanation of why failures could not occur before
                 time t0. (Can the metallurgist explain physical reasons for this?)
            3.  Need at least 21 failures in the data set. (We have 24 failures.)
            4.  The R² or PVE should improve significantly with the 3-parameter
                 distribution. (We achieve an improved R² in Figure 12.)
Assuming the 4 criteria are met, we find a failure free zone.  Figure 12 suggests not to expect failures in less than 25.88 hours IF a physical reason exists for the failure free interval.  (If this criterion is not met, then the Gumbel upper distribution prevails.)

Figure 12, with the 3-parameter Weibull distribution (IF a physical reason for the phenomenon exists!), says to expect no failures up to time 25.88 hours and then 1% failures between 25.88 hours and 26.64 hours.  The 3-parameter lognormal distribution in Figure 13 (again, IF a physical reason for the phenomenon exists!) gives a more conservative estimate with a shorter t0 interval and says to expect 1% failures between 16.03 and 24.01 hours.  The lognormal curve has an adequate but not remarkable curve fit.

Kowaka’s Table 6.7 shows reliability values for various failure times.  Similar information is shown in Figure 14 for the Gumbel upper distribution.


Problem #4: Pit Depth Measurements In Tank Bottom

Pit depth measurements were taken for 20 sampling areas.  Each area was 300 mm x 300 mm.  Samples were taken after 7 years of service near the periphery of a circular tank bottom which had been used to store heavy petroleum.  The maximum pit depth was recorded for each sample; however, if the pit depth was less than 0.5 mm, the record showed only <0.5 mm rather than the true value.

Kowaka’s Table 6.8 shows pit depth data for a total of 20 samples:
            11 samples < 0.5, 0.65, 0.71, 0.75, 0.84, 0.90, 1.07, 1.18, 1.25, and 1.82. 
The data will be input into WinSMITH Weibull as a frequency table using the attribute value 0.5*11 along with the remaining variables data. 

Convert the data set to a Probit 2 data entry using the “Methods” icon, which will fix the percentage values for the Y-axis.  Finally the values of 0.5 will be deleted (as the other values on the Y-axis and X-axis will not change on the Probit probability plot).  The value of this method is that the deletion leaves the remaining variables data at the correct Y-axis value on the probability plot, as shown in Figure 15.

We have the inspection results for a sample of 20 pieces.  The maximum number of pieces that could exist in the sample zone is 473.  This means we have taken a 20/473 = 4.23% sample.  Given this is a representative sample, what is the maximum depth we should expect to see had we inspected all 473 possible pieces? 

The question of maximum pit depth based on a limited sample is answered by the “Return Period” as defined by Gumbel (see E. J. Gumbel, Statistics of Extremes, Columbia University Press, NY, 1958, ISBN 0-231-02190-9, page 215).

The traditional return period, as a CDF value, is (N-1)/N.  When N = 473 the CDF is (473-1)/473 = 99.7886%.  At this CDF, Figure 15 gives a maximum expected pit depth of 2.92 mm had all 473 samples been inspected.  Kowaka’s value was 2.66 mm.  He got a smaller value by using a minimum variance linear estimator method to fit the trend line rather than today’s best practice for small sample sizes, rank regression line fitting.  [Note: a better position on the CDF would come from using Benard’s median rank for n=473 and i=473 to generate a CDF=99.85213%, which would produce a pit depth of 3.63 mm.]
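The arithmetic behind the two candidate CDF positions is shown in the short sketch below.  The function names are hypothetical; the formulas are the (N-1)/N return period and Benard's median rank with i = n = N.

```python
def return_period_cdf(N):
    """Gumbel's return-period position for the largest of N items: (N - 1) / N."""
    return (N - 1) / N

def benard_largest_cdf(N):
    """Benard's median rank for the largest item, i = n = N."""
    return (N - 0.3) / (N + 0.4)

sampled, possible = 20, 473
print(f"sample fraction    = {sampled / possible:.2%}")
print(f"return-period CDF  = {return_period_cdf(possible):.4%}")
print(f"Benard median rank = {benard_largest_cdf(possible):.5%}")
```

Reading the fitted trend line at either CDF value gives the corresponding maximum expected pit depth.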

Is the Gumbel upper distribution the best distribution fit for the pit sample data?

Distribution      (R² – CCC²)    PVE%      Notes
Weibull            0.029         55.76     The best fit distribution for (R² – CCC²)
Gumbel+            0.025         44.06     Kowaka chose this distribution
Lognormal          0.021         64.90     The best fit distribution for PVE%
Normal            -0.009          4.26
Gumbel-           -0.026          2.09

The maximum pit depth (mm) for the top three methods at CDF = 99.7886%:

Distribution      Depth (mm)     Notes
Lognormal          3.39          The worst case scenario
Gumbel+            2.92          Kowaka calculated 2.66 mm for this distribution
Weibull            2.67          The winner based on (R² – CCC²)

The worst case scenario for the lognormal pit depth is shown in Figure 16:

Which distribution is correct?  A man with one watch always knows the time; a man with two…!  Intuitively I would believe the Gumbel upper distribution.  However, prudence suggests that the lognormal, with a good fit (particularly for the PVE%) and the worst case pit depth, is the distribution I would bet on for a conservative approach!

Problem #5: Tree Diameters
U.S. Department of Agriculture Forest Service Research Paper SO-164 provides the raw data of tree diameters versus observed counts for uneven-aged forest trees.

 The question is: What distribution best fits the data?  The report shows the data best fitted a doubly-truncated exponential diameter distribution (Gumbel upper distribution).

A simple Weibull probability plot in Figure 17 shows the type of condition found in the field of reliability: coarse data with big stacks of information at discrete intervals, where the trend line attempts to fit the mass of the data.

The trend line is a very poor fit because of the huge mass of data (1840 data points) for small tree diameters (4-inch).  We will get a better regression line if we only consider the topmost data point in each stack of data for the regression.  This is a common problem with coarse inspection data for reliability.  Therefore use the inspection option found under the methods icon of WinSMITH Weibull software.  Converting to the inspection option produces the results in Figure 18.
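The idea of regressing only on the top of each stack can be sketched as follows.  This is an assumption about the mechanics rather than WinSMITH Weibull's actual implementation: every observation still receives a median rank, but only the highest-ranked point of each tied value is kept for the regression.  The function name is hypothetical, and because the tree diameter counts are not listed here, the stacked SCC hours from Problem #2 are borrowed for the demonstration.

```python
def inspection_points(data):
    """Return (value, median rank) pairs keeping only the top point of each stack of ties."""
    x = sorted(data)
    n = len(x)
    top_of_stack = {}
    for i, value in enumerate(x, start=1):
        # Later (higher) ranks overwrite earlier ones, so each value keeps its largest rank
        top_of_stack[value] = (i - 0.3) / (n + 0.4)   # Benard's median rank
    return sorted(top_of_stack.items())

scc_hours = [66, 70, 72, 73, 75, 75, 76, 77, 80, 80, 82, 82, 82, 88, 89, 90,
             91, 91, 92, 92, 93, 93, 94, 94, 94, 95, 96]
points = inspection_points(scc_hours)
print(len(scc_hours), "observations reduce to", len(points), "plotted points")
```

The reduced list of points would then be passed to the rank regression in place of the full data set.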

Using the inspection option, which distribution is best for the data in Figure 18?  The summary shows:

Distribution      (R² – CCC²)    PVE%      Notes
Weibull            0.059         75.66     The best fit distribution for (R² – CCC²)
Normal             0.057         99.9      The best fit distribution by PVE%
Gumbel-            0.050         55.94
Gumbel+            0.046         48.89
Lognormal         -0.006          7.65

The statistics from Figure 18 are summarized in the forecast shown in the table below along with a comparison of the two curve fits from the report.  In general, the Weibull forecast shows less error than the fits in report SO-164.



Problem #6: Pipe Corrosion Data
Research Report 016, Guidelines for use of statistics for analysis of sample inspection of corrosion, provides pipe corrosion data for 16 samples out of a potential 1250 samples (based on area) taken from a failed pipe 25 meters long.

Minimum wall thicknesses from ultrasonic internal diameter measurements were recorded for each sample in mm:
     3.33, 3.63, 3.8, 3.82, 3.82, 3.92, 3.98, 3.98, 4.04, 4.06, 4.1, 4.13, 4.19, 5.26, 4.34, 4.77
Research Report 016 used Hazen ranks (see remarks above) and maximum likelihood techniques (useful for large data sets but the method is notorious for sporadic results with small data sets). 

Median rank plotting positions and rank regression techniques are used in Figure 19, which shows the Weibull plot is the best distribution to use.  The smallest wall thickness for the 1250 potential samples would occur on the trend line at i=1 and n=1250 which, by Benard’s median rank, would be at 0.05598%.  At ~0.06% the trend line shows a wall thickness of 2.514 mm.  With a 95% confidence limit applied to the left hand side, the wall thickness would be 2.178 mm.  The Gumbel upper distribution has the same curve fit but it is rejected because it magnifies the upper section of the curve, which is not the area of concern.

Analysis of distribution results shows:

Distribution      (R² – CCC²)    PVE%      Notes
Weibull            0.034         25.39     The best fit distribution for (R² – CCC²)
Gumbel+            0.034         25.39
Lognormal          0.028         25.71
Normal             0.026         24.07
Gumbel-            0.011         13.62     The fit selected by the report using MLE

Using the maximum likelihood estimate method produces almost the same statistical numbers as obtained in Research Report 016 as shown in Figure 20.  While the method is rigorous, it is inappropriate for small data samples.  The trend line continues to fail the test of good engineering judgment for good fit of the trend line to the data points. For Figure 20 at 0.05598%, the trend line wall thickness is 1.648 mm and the left hand 95% confidence limit wall thickness is 0.874 mm.

A better method using MLE with reduced bias adjustment (RBA) for small data sets is available in WinSMITH Weibull as shown in Figure 21.  The trend line continues to fail the test of good engineering judgment for good fit of the trend line to the data points. For Figure 21 at 0.05598%, the trend line wall thickness is 1.383 mm and the left hand 95% confidence limit wall thickness is 0.490 mm.

The bottom line for this pipe corrosion problem is to stick with the Weibull distribution results, shown in Figure 19, as the best conformance of the data to a practical model.


Summary
Some typical problems from the literature have been solved with reliability software to show the ease of solution and the accurate results which can be achieved with little effort.

The exercise is not intended to wave a flag that the older solutions are worthless or that the previous authors have done an inferior analysis.  To the contrary, this analysis only offers a different view of using commonly available reliability tools to easily solve typical problems.  The examples show some features of typical reliability software that are handy for solving routine problems in other fields of endeavor.


Refer to the caveats on the Problem Of The Month Page about the limitations of the solution above.  Maybe you have a better idea on how to solve the problem.  Maybe you find where I've screwed up the solution and you can point out my errors as you check my calculations.  E-mail your comments, criticism, and corrections to Paul Barringer.

Technical tools are only interesting toys for engineers until results are converted into a business solution involving money and time. Complete your analysis with a bottom line which converts $'s and time so you have answers that will interest your management team!


Last revised 6/2/2004
© Barringer & Associates, Inc. 2004