Reliability software solves other types of statistical problems with ease. Reliability practitioners will find the techniques described below useful as refreshers for:
1. How to make probability plots to get the basic statistics
2. Probability density function plots (PDFs)
3. Reliability plots
4. Instantaneous failure rate plots (hazard plots)
5. Probability plotting positions
Six widely different problems concerning corrosion and tree diameters illustrate how WinSMITH Weibull software is a very useful tool for solving other problems using statistical tools from the field of reliability.
2. Stress corrosion cracking (SCC) in aluminum
3. Stress corrosion cracking in stainless steel
Generally, scalar information is provided in the time or size domain, such as 96, 30, 49, 82, and 90 minutes until failure. So how can you make an X-Y plot from a single column of numbers? This is always a big mystery for people new to probability plots.
Take the scalar data and put it into rank order; these are the X-values. Then borrow a tool from the statisticians called the rank order plotting position; these are the Y-values.
Consider this dataset from the book by Dr. Robert B. Abernethy, The New Weibull Handbook, 4th edition, self-published by Dr. Robert B. Abernethy:
Rank your data, calculate the median rank plotting position, plot the X-Y data
Data from Abernethy, Table 2-2

Rank Order (i) | Age-To-Failure Time (minutes) | Incomplete beta probability @50% | Benard’s Median Rank Plotting Position (i-0.3)/(n+0.4)
1 | 30 | 12.945% | (1-0.3)/(5+0.4) = 12.963%

Calculate the exact median rank using the Excel function =BETAINV(0.5,i,n-i+1). Benard’s median rank is accurate to ~0.5% of the true median rank and is widely used for ease of calculation.
This table shows how to get X-Y pairs of plotting positions using scalar data and put it into a Weibull probability plot using Benard’s Median Rank for Figure 1.
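The ranking step can be sketched in a few lines of Python. This is only an illustration: the function names are my own, and the exact median rank is found by bisection on the binomial tail, which is the same quantity the BETAINV formula returns, rather than by calling a beta-inverse routine.

```python
from math import comb


def benard_median_rank(i, n):
    """Benard's approximation to the median rank of the i-th of n ordered points."""
    return (i - 0.3) / (n + 0.4)


def exact_median_rank(i, n, tol=1e-12):
    """Exact median rank, found by bisection.

    Solves P(at least i of n items have failed | p) = 0.5 for p, which is the
    value Excel returns from BETAINV(0.5, i, n-i+1).
    """
    def prob_at_least_i(p):
        return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(i, n + 1))

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if prob_at_least_i(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


times_minutes = [96, 30, 49, 82, 90]
ranked = sorted(times_minutes)  # X-values, smallest first
n = len(ranked)

# One row per failure: rank, time, exact median rank, Benard's median rank
rows = [(i, t, exact_median_rank(i, n), benard_median_rank(i, n))
        for i, t in enumerate(ranked, start=1)]

for i, t, exact, benard in rows:
    print(f"{i}  {t:>3}  {exact:8.3%}  {benard:8.3%}")
```

The first row should agree with the table entry for rank 1 (30 minutes), and the remaining rows give the Y-values for the other four failures.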
Notice in Figure 1 that the X-axis is a log scale in minutes, and the Y-axis is the log of another log so as to produce a straight trend line. The Y-axis is a probability scale describing the % of the population expected to fail by a specific time (i.e., this scale is unreliability, also known as the cumulative distribution function or CDF).
The coefficient of determination (a goodness-of-fit criterion) is 0.93, which says the straight line explains about 93% of the scatter in the data. Because this exceeds the critical value of 80.64%, the straight line is a good fit to the data. If you prefer, a better goodness-of-fit criterion is PVE% (P-Value Estimate) = 52.65%, where the critical value is 10%.
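As a rough sketch of what the software does behind the plot, the trend line can be fitted by median rank regression. The code below is my own minimal version, not WinSMITH's implementation: it assumes Benard's median ranks and regresses X (log time) onto Y (the Weibull probability scale).

```python
from math import exp, log


def weibull_rank_regression(times):
    """Fit a 2-parameter Weibull by regressing X (ln time) onto Y (plot position)."""
    ranked = sorted(times)
    n = len(ranked)
    x = [log(t) for t in ranked]
    # Weibull probability scale: Y = ln(-ln(1 - F)), with F from Benard's median rank
    y = [log(-log(1 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]

    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    slope = sxy / syy                  # model: x = intercept + slope * y
    intercept = x_bar - slope * y_bar
    beta = 1 / slope                   # Weibull shape (slope of the plotted line)
    eta = exp(intercept)               # characteristic life, where Y = 0 (63.2%)
    r_squared = sxy**2 / (sxx * syy)   # coefficient of determination
    return eta, beta, r_squared


eta, beta, r_squared = weibull_rank_regression([96, 30, 49, 82, 90])
print(f"eta = {eta:.1f} minutes, beta = {beta:.3f}, r^2 = {r_squared:.2f}")
```

With the five-point data set, this should land close to the statistics reported for Figure 1 (eta = 79.8, beta = 2.165) and the 0.93 coefficient of determination.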
Figure 1’s statistics of eta = 79.8 and beta = 2.165 drive the probability density function (PDF) shown in Figure 2.
The area under the PDF curve is unity, and the height of the curve shows the relative likelihood of failure at a specific time. The curve shape is important; in this case, it has a long tail to the right. Tailed data is typical in reliability analysis.
Reliability engineers are interested in the reliability (survivability) curve in Figure 3. It tells the percent of the population surviving until a specific time.
Another important plot for reliability engineers is the hazard curve. The hazard curve in Figure 4 shows the instantaneous failure rate.
Data and probability plotting positions drive probability plots, which develop the statistics, which in turn produce engineering graphics. It starts with data married to a plotting position.
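Once eta and beta are known, the PDF, reliability, and hazard curves are simple functions of time. Here is a minimal sketch using the standard 2-parameter Weibull formulas with the Figure 1 statistics (eta = 79.8, beta = 2.165); the function names are mine.

```python
from math import exp

ETA, BETA = 79.8, 2.165  # statistics from Figure 1


def weibull_cdf(t, eta=ETA, beta=BETA):
    """Unreliability: fraction of the population failed by time t."""
    return 1 - exp(-((t / eta) ** beta))


def weibull_reliability(t, eta=ETA, beta=BETA):
    """Fraction of the population surviving until time t."""
    return exp(-((t / eta) ** beta))


def weibull_pdf(t, eta=ETA, beta=BETA):
    """Probability density of failure at time t."""
    return (beta / eta) * (t / eta) ** (beta - 1) * exp(-((t / eta) ** beta))


def weibull_hazard(t, eta=ETA, beta=BETA):
    """Instantaneous failure rate: PDF divided by reliability."""
    return (beta / eta) * (t / eta) ** (beta - 1)


# Crude numerical check that the area under the PDF is close to unity
step = 0.1
area = sum(weibull_pdf(k * step) * step for k in range(1, 5001))

for t in (30, 60, 90, 120):
    print(f"t={t:>3}  pdf={weibull_pdf(t):.5f}  R={weibull_reliability(t):.3f}  "
          f"h={weibull_hazard(t):.5f}")
print(f"area under PDF = {area:.4f}")
```

Because beta is greater than 1, the hazard rate rises with time, which is the wear-out pattern.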
Plotting Positions
Plotting positions are described in The New Weibull Handbook, chapter 2, along with information about the historical progress from early use of the mean plotting position towards today’s median plotting position, often using Benard’s median rank equation as a good approximation. Dr. Abernethy points out: “Median rank applies to all distributions, whereas the Gaussian plotting position, better known as Blom’s plotting position, only applies to the normal distribution”. Plotting positions use “i”, the ordered failure number ranked first to last, and “n”, the number of items in the sample. Some typical plotting positions are:
Simple | (i/n) | ← Simple but not practical
Generally the simple plot position, i/n, is used for simple explanations but is seldom used in practice because the 100% position cannot be plotted on probability paper, where the probability scale is endless. Likewise the simple plot position, (i-1)/n, produces a zero, which also cannot be plotted on typical probability paper scales.
Hazen’s midpoint plot position is again used for simple explanations but is not frequently used in practice. It was originally claimed to minimize errors, but the conclusions were based on hand plots which were not very accurate.
Weibull and Gumbel used the mean plot position extensively when calculations were made by hand. With computers (and Leonard Johnson’s advice), Weibull shifted to the median rank plotting position. Today the median rank plotting position is generally accepted as best practice for reducing errors and bias with tailed distributions. Read Benard’s median rank paper translated from Dutch into English.
The New Weibull Handbook, chapter 5, shows results of Monte Carlo simulations in Table 5-6, for both bias and mean square error, that explain why the median rank plotting position, (i-0.3)/(n+0.4), is best practice. Table 5-6 shows median ranks are superior to mean ranks, i/(n+1), and also superior to Hazen’s midpoint ranks, (i-0.5)/n. Additionally, Table 5-4 shows why life data (and other data with uncertainty in the time/size) should be regressed X-onto-Y rather than the conventional Y-onto-X for improved accuracy.
Other plotting details are given in C. R. Mischke, A Distribution-Independent Plotting Rule for Ordered Failures, ASME document 79-DET-112, which describes:
Gaussian (pg 3) = (i-0.375)/(n+0.25)
The special cases listed below are all consistent with the general form (i-a)/(n+1-a-b):
a=b=0.3 gives Benard’s median rank plotting position.
a=b=0 gives the mean rank plotting position.
a=b=0.375 gives the Gaussian plotting position.
a=b=0.5 gives Hazen’s plotting position.
a=0 and b=1 gives the simple plotting position in which the largest point, unity, can’t be plotted on probability paper.
a=1 and b=0 gives another simple plotting position in which the smallest point, zero, can’t be plotted on probability paper.
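The whole family can be sketched with one small function. The general form (i-a)/(n+1-a-b) used here is my inference from the six special cases listed above; the surviving text does not quote Mischke's formula, so treat the function and the dictionary of names as illustrative.

```python
def plotting_position(i, n, a, b):
    """General two-constant plotting position, assumed form (i - a) / (n + 1 - a - b)."""
    return (i - a) / (n + 1 - a - b)


# (a, b) pairs for the named rules discussed in the text
NAMED_RULES = {
    "Benard median rank": (0.3, 0.3),
    "Mean rank": (0.0, 0.0),
    "Gaussian (Blom)": (0.375, 0.375),
    "Hazen midpoint": (0.5, 0.5),
    "Simple i/n": (0.0, 1.0),
    "Simple (i-1)/n": (1.0, 0.0),
}

n = 5
for name, (a, b) in NAMED_RULES.items():
    positions = [plotting_position(i, n, a, b) for i in range(1, n + 1)]
    print(f"{name:<20}", "  ".join(f"{p:6.1%}" for p in positions))
```

The printout makes the practical objection to the two simple rules visible: one of them ends at 100% and the other starts at 0%, and neither value exists on probability paper.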
Now some examples of data from other fields of study
Three different sources of information will be used for illustrating the use
of reliability software to solve data sets in the literature.
1. Masamichi Kowaka, Introduction to Life Prediction of Industrial Plant Materials: Application of the Extreme Value Statistical Method for Corrosion Analysis,
2. A summary of
3. U.S. Department of Agriculture Forest Service, Research Paper SO-164: A Test of the Exponential Distribution for Stand Structure Definition in Uneven-aged Loblolly-shortleaf Pine Stands, 1981. This report uses the exponential and double exponential (Gumbel distribution).
These three documents were selected because of their availability and interesting examples. The analysis shown below is not intended to disparage any of the results in the literature, only to show how modern software can help enlighten the results.
Problem #1: Corrosion Pit Depth
Table 6.1 and 6.2 in Kowaka show corrosion pit depth data listed in mm of depth:
0.42, 0.52, 0.69, 0.60, 0.99, 0.34, 0.76, 0.73, 0.43, and 0.57.
Kowaka analyzed the data using normal probability plots and a plotting position of i/(n+1). He concluded the data had a mean = 0.60 with standard deviation = 0.18.
Using WinSMITH Weibull, Benard’s median rank plotting position, with rank regression curve fit techniques we get Figure 5 for the normal distribution. The mean = 0.605 and the standard deviation = 0.2079.
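For readers without the software, here is a hedged sketch of the same idea for the normal distribution: Benard's median ranks converted to standard normal quantiles, with the pit depths regressed onto those quantiles. The function name is mine, and this is only my reading of "rank regression", not WinSMITH's code.

```python
from statistics import NormalDist

pit_depths_mm = [0.42, 0.52, 0.69, 0.60, 0.99, 0.34, 0.76, 0.73, 0.43, 0.57]


def normal_rank_regression(data):
    """Estimate mean and standard deviation by regressing X onto normal quantiles."""
    x = sorted(data)
    n = len(x)
    # Y-values: standard normal quantile of Benard's median rank for each point
    z = [NormalDist().inv_cdf((i - 0.3) / (n + 0.4)) for i in range(1, n + 1)]

    x_bar, z_bar = sum(x) / n, sum(z) / n
    szz = sum((zi - z_bar) ** 2 for zi in z)
    sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))

    sigma = sxz / szz              # slope of x against z is the standard deviation
    mu = x_bar - sigma * z_bar     # value of x where z = 0 (the 50% point)
    return mu, sigma


mu, sigma = normal_rank_regression(pit_depths_mm)
print(f"mean = {mu:.3f} mm, standard deviation = {sigma:.4f} mm")
```

Run on the ten pit depths, this should come out close to the mean of 0.605 and standard deviation of 0.2079 quoted for Figure 5.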
The curve fit in Figure 5 can use many different statistical distributions and a simple criterion of (R² – CCC²) to find the distribution with the largest positive value as the winner; these options were not available to Kowaka.
I use the criterion involving R² because every engineer understands the regression coefficient R. R² is called the coefficient of determination, which must be compared to the critical value CCC². The value (R² – CCC²) gives an absolute value for goodness-of-fit comparison, although the newer PVE value is considered a better criterion for goodness of fit but is not so understandable for most engineers.
Here is a comparison for the data set in Figure 5 using different distributions for the curve fit:
Distribution   (R² – CCC²)   PVE%
Gumbel+         0.1292       99.9    ← The best fit distribution for the data
Lognormal       0.1135       99.9
Weibull         0.1092       73.88
Gumbel-         0.0372       20.04
(R² – CCC²) is similar to freeboard on a ship: it must be positive or you take on water!
For PVE%, the critical value is 10% (bigger is better).
The winning result for the Gumbel upper distribution is shown in Figure 6:
Notice how the Gumbel upper distribution magnifies the data in the upper right hand corner of the probability plot as compared with the lower left hand corner. The upper right hand corner which is magnified is the area of concern as big pit depths are worrisome.
The plot in Figure 6 says 99% of the pits will be 1.308 mm or less in depth. Of course, 1% of the pits will be deeper than 1.308 mm. Small pit depths are not usually worrisome, and 5% of all pits will have a depth less than 0.325 mm! Figure 6 also shows 98% (99% - 1%) of all pits are expected to lie between 0.251 mm and 1.308 mm.
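The percentile readings can be approximated without the plot. The sketch below assumes a Gumbel upper (largest extreme value) distribution fitted by median rank regression with Benard's ranks, regressing depth onto the Gumbel scale; the function names are mine, and it is a sketch of the approach rather than the software's exact routine.

```python
from math import log

pit_depths_mm = [0.42, 0.52, 0.69, 0.60, 0.99, 0.34, 0.76, 0.73, 0.43, 0.57]


def gumbel_upper_rank_regression(data):
    """Fit location and scale of a Gumbel upper distribution by rank regression."""
    x = sorted(data)
    n = len(x)
    # Gumbel upper probability scale: Y = -ln(-ln(F)), F from Benard's median rank
    y = [-log(-log((i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]

    x_bar, y_bar = sum(x) / n, sum(y) / n
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    scale = sxy / syy                   # X regressed onto Y
    location = x_bar - scale * y_bar
    return location, scale


def gumbel_upper_quantile(p, location, scale):
    """Depth below which a fraction p of the pits is expected to fall."""
    return location - scale * log(-log(p))


location, scale = gumbel_upper_rank_regression(pit_depths_mm)
for p in (0.01, 0.05, 0.99):
    print(f"{p:.0%} of pits are shallower than "
          f"{gumbel_upper_quantile(p, location, scale):.3f} mm")
```

The three printed depths should fall close to the 0.251 mm, 0.325 mm, and 1.308 mm values read from Figure 6.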
The PVE% goodness of fit shows a tie. How should we decide which distribution is best? The lognormal distribution predicts 99% of the pit depths will be less than 1.302 mm which is almost a wash compared to the Gumbel upper distribution value of 1.308 mm so we can’t make too much of the more pessimistic pit depth for the Gumbel upper distribution. Thus the tie breaker, in my judgment, would be the magnification available with the Gumbel upper distribution (the log normal distribution lacks magnification for larger values).
The PDF from Figure 6 is shown in Figure 7 with its long tail to the right toward deeper pits. This plot was made in WinSMITH Visual software directly from WinSMITH Weibull:
Problem #2: Stress Corrosion Cracking (SCC) Age-To-Failure For Aluminum Samples
Table 6.5 in Kowaka shows hours to failure for stress corrosion cracking of 27 samples of aluminum alloy in a 3% NaCl solution:
66, 70, 72, 73, 75, 75, 76, 77, 80, 80, 82, 82, 82, 88, 89, 90, 91, 91, 92, 92, 93, 93, 94, 94, 94, 95, 96.
Since the timekeeping is coarse, notice the stacks of data at 75, 80, 82, 91, 92, 93, and 94 hours. Kowaka displayed the data on a lognormal plot.
In fact the best distribution is a Weibull plot as shown in Figure 8 using a selection strategy based on (R2 – CCC2) as described above:
Distribution   (R² – CCC²)   PVE%
Weibull         0.0301       30.16   ← The best fit distribution for the data
Gumbel-         0.0191       20.19
Normal         -0.0022        8.98
Lognormal      -0.0071        7.05   ← Kowaka chose this distribution
Gumbel+        -0.0819        0.451
Large positive values of (R² – CCC²) are desired. For PVE%, the critical value is 10% (bigger is better).
Of course, the advantage of the Weibull plot is the magnification of short ages to failure on the probability plot, which says to expect 1% of the failures to occur at less than 58.31 hours. The Gumbel lower distribution has the same Y-axis as a Weibull plot, but the Gumbel X-axis is uniformly divided, and it says to expect 1% of the failures to occur at less than 54.69 hours. Therefore the Weibull plot has the better curve fit statistic, although on these numbers the Gumbel lower distribution gives the more pessimistic B1 value. The Weibull PDF curve of the winning result is shown in Figure 9 with the long tail toward shorter times.
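The B1 value for the Weibull fit can be approximated with a short script. This is a sketch under my own assumptions: every one of the 27 points, ties included, gets its own Benard median rank, and log time is regressed onto the Weibull scale. The helper names are hypothetical.

```python
from math import exp, log

scc_hours = [66, 70, 72, 73, 75, 75, 76, 77, 80, 80, 82, 82, 82, 88, 89, 90,
             91, 91, 92, 92, 93, 93, 94, 94, 94, 95, 96]


def weibull_rank_regression(times):
    """Return (eta, beta) from median rank regression, X onto Y."""
    x = [log(t) for t in sorted(times)]
    n = len(x)
    y = [log(-log(1 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
    x_bar, y_bar = sum(x) / n, sum(y) / n
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / syy
    return exp(x_bar - slope * y_bar), 1 / slope


def weibull_b_life(fraction, eta, beta):
    """Age by which the given fraction of the population is expected to fail."""
    return eta * (-log(1 - fraction)) ** (1 / beta)


eta, beta = weibull_rank_regression(scc_hours)
b1 = weibull_b_life(0.01, eta, beta)
print(f"eta = {eta:.2f} h, beta = {beta:.2f}, B1 = {b1:.2f} h")
```

Under these assumptions the B1 life should come out within a fraction of an hour of the 58.31 hours read from the Weibull plot; small differences are expected because the software's handling of tied data may differ from this sketch.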
Problem #3: Stress Corrosion Cracking (SCC) Age-To-Failure For SU 304 Stainless Steel In High-Temperature Water
Kowaka shows in Table 6.6 and Table 6.7 age-to-failure data for 24 samples in hours as:
28, 29, 32, 37, 39, 40, 40, 40, 41, 49, 53, 53, 54, 63, 64, 70, 73, 75, 82, 91, 97, 105, 122, 143 hours.
Here is a comparison for this data set using different distributions for the curve fit, with the Gumbel upper distribution resulting in the best fit:
Distribution     (R² – CCC²)   PVE%
Gumbel+           0.064        89.43   ← The best fit and Kowaka’s choice
2-P Lognormal     0.045        72.66
3-P Weibull       0.0263       75.02
2-P Weibull      -0.145         6.05
3-P Lognormal     0.0000        0.001
Gumbel-          -0.150         0.044
The winning Gumbel upper distribution plot is shown in Figure 10.
Figure 10 shows that 1% of the failures will occur in less than 10.4 hours. The Gumbel upper distribution provides a less than desirable fit in the lower time values, which are the most troublesome data: we’ve got a good curve fit in the upper reaches of the data, but danger does not lie in this zone. The Weibull plot magnifies very clearly the lower reaches of the age-to-failure. Therefore consider the 2-parameter Weibull plot in Figure 11, which shows a concave downward shape to the data.
Chapter 3 of The New Weibull Handbook describes four criteria that should always be met before using a 3-parameter Weibull:
1. Data on a 2-parameter plot should show concave curvature. (Visually we see concavity.)
2. We need a physical explanation of why failures could not occur before time t0. (Can the metallurgist explain physical reasons for this?)
3. We need at least 21 failures in the data set. (We have 24 failures.)
4. The R² or PVE should improve significantly with the 3-parameter distribution. (We achieve an improved R² in Figure 12.)
Assuming the four criteria are met, we find a failure-free zone. Figure 12 suggests not to expect failures in less than 25.88 hours IF a physical reason exists for the failure-free interval (if this criterion is not met, then the Gumbel upper distribution prevails).
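One simple way to look for a failure-free time t0 is to subtract candidate values from every age and keep the one that straightens the Weibull plot the most. The grid search below is my own illustration of that idea, not the software's algorithm, so its t0 need not match the 25.88 hours quoted above, and it says nothing about whether a physical reason for t0 exists.

```python
from math import log


def weibull_r2(times, t0=0.0):
    """Coefficient of determination of the Weibull plot after shifting ages by t0."""
    x = [log(t - t0) for t in sorted(times)]
    n = len(x)
    y = [log(-log(1 - (i - 0.3) / (n + 0.4))) for i in range(1, n + 1)]
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy**2 / (sxx * syy)


def best_t0(times, steps=1000):
    """Grid search for the shift (0 <= t0 < first failure) that maximizes r^2."""
    t_min = min(times)
    candidates = [t_min * k / steps for k in range(steps)]
    return max(candidates, key=lambda t0: weibull_r2(times, t0))


scc_hours = [28, 29, 32, 37, 39, 40, 40, 40, 41, 49, 53, 53, 54, 63, 64, 70,
             73, 75, 82, 91, 97, 105, 122, 143]

t0 = best_t0(scc_hours)
print(f"2-parameter r^2        = {weibull_r2(scc_hours):.4f}")
print(f"grid-search t0         = {t0:.2f} h")
print(f"3-parameter r^2 at t0  = {weibull_r2(scc_hours, t0):.4f}")
```

A clear jump in r² after the shift is the numerical counterpart of criterion 4; criterion 2 still has to be answered by the metallurgist.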
Figure 12 with the 3-parameter Weibull distribution (IF a physical reason for the phenomenon exists!) says to expect no failures up to time 25.88 hours and then 1% failures between 25.88 hours and 26.64 hours. The 3-parameter lognormal distribution in Figure 13 (IF a physical reason for the phenomenon exists!) gives a more conservative estimate with a shorter t0 interval and says to expect 1% failures between 16.03 and 24.01 hours; the lognormal curve has an adequate but not remarkable curve fit.
Kowaka’s Table 6.7 shows reliability values for various failure times. Similar information is shown in Figure 14 for the Gumbel upper distribution.
Problem #4: Pit Depth Measurements In Tank Bottom
Pit depth measurements were taken for 20 sampling areas. Each area was 300 mm × 300 mm. Samples were taken after 7 years of service near the periphery of a circular tank bottom which had been used to store heavy petroleum. The maximum pit depth was recorded for each sample; however, if the pit depth was less than 0.5 mm, the record showed only <0.5 mm rather than the true value.
Kowaka’s Table 6.8 shows pit depth data for a total of 20 samples:
11 samples < 0.5, 0.65, 0.71, 0.75, 0.84, 0.90, 1.07, 1.18, 1.25, and 1.82.
The data will be input into WinSMITH Weibull as a frequency table using the attribute value 0.5*11 along with the remaining variables data.
Convert the data set to a Probit 2 data entry using the “Methods” icon, which will fix the percentage values for the Y-axis. Finally, the values of 0.5 will be deleted (the other values on the Y-axis and X-axis will not change on the Probit probability plot). The value of this method is that the deletion leaves the remaining variables data at the correct Y-axis values on the probability plot, as shown in Figure 15.
We have the inspection results for a sample
of 20 pieces. The maximum number of
pieces that could exist in the sample zone is 473. This means we have taken a 20/473 = 4.23% sample. Given this is a representative sample, what
is the maximum depth we should expect to see had we inspected all 473 possible
pieces?
The question of maximum pit depth based on a limited sample is answered by the “Return Period” as defined by Gumbel (see E. J. Gumbel, Statistics of Extremes, Columbia University Press, NY, 1958, ISBN 0-231-02190-9, page 215).
The traditional return period, as a CDF value, is (N-1)/N. When N = 473 the CDF is (473-1)/473 = 99.7886%. From Figure 15 we get a maximum expected pit depth of 2.92 mm if all 473 samples were inspected. Kowaka’s value was 2.66 mm. He got a smaller value by using a minimum variance linear estimator method to fit the trend line rather than today’s best practice for small sample sizes of rank regression line fitting. [Note: a better position on the CDF would come from using Benard’s median rank for n=473 and i=473 to generate a CDF = 99.85213%, which would produce a pit depth of 3.63 mm.]
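The return-period arithmetic and the extrapolation can be sketched as follows. Several choices here are my assumptions rather than anything stated above: the nine measured depths keep ranks 12 through 20 out of 20, Benard's median ranks set the Y-values, and Y is regressed onto X (the text does not say which direction the Probit option uses). Extrapolated depths are sensitive to these choices, so treat the output as illustrative.

```python
from math import log

measured_mm = [0.65, 0.71, 0.75, 0.84, 0.90, 1.07, 1.18, 1.25, 1.82]
n_total = 20
n_below_limit = n_total - len(measured_mm)   # the 11 readings recorded as "<0.5"

# The measured depths occupy the top ranks; the censored readings hold ranks 1..11
x = sorted(measured_mm)
ranks = range(n_below_limit + 1, n_total + 1)
y = [-log(-log((i - 0.3) / (n_total + 0.4))) for i in ranks]   # Gumbel upper scale

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
slope = sxy / sxx                    # assumption: Y regressed onto X
intercept = y_bar - slope * x_bar


def depth_at(cdf):
    """Pit depth on the fitted Gumbel upper trend line at a given CDF value."""
    return (-log(-log(cdf)) - intercept) / slope


n_possible = 473
return_period_cdf = (n_possible - 1) / n_possible        # (N-1)/N
benard_cdf = (n_possible - 0.3) / (n_possible + 0.4)     # Benard, i = n = 473
max_depth_mm = depth_at(return_period_cdf)

print(f"(N-1)/N CDF  = {return_period_cdf:.4%}")
print(f"Benard CDF   = {benard_cdf:.5%}")
print(f"depth at (N-1)/N = {max_depth_mm:.2f} mm")
```

Under these assumptions the (N-1)/N position extrapolates to roughly the 2.92 mm read from Figure 15.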
Is the Gumbel upper distribution the best distribution fit for the pit sample data?
Distribution   (R² – CCC²)   PVE%
Weibull         0.029        55.76   ← The best fit distribution for (R² – CCC²)
Gumbel+         0.025        44.06   ← Kowaka chose this distribution
Lognormal       0.021        64.90   ← The best fit distribution for PVE%
Gumbel-        -0.026         2.09
The maximum pit depth (mm) for the top three methods at CDF = 99.7886%:
Lognormal   3.39   ← The worst case scenario
Gumbel+     2.92   ← Kowaka calculated 2.66 mm for this distribution
Weibull     2.67   ← The winner based on (R² – CCC²)
The worst case scenario for the lognormal pit depth is shown in Figure 16:
Which distribution is correct? A man with one watch always knows the time; a man with two…! Intuitively I would believe the Gumbel upper distribution. However, prudence suggests that the lognormal, with a good fit (particularly for the PVE%) and the worst case pit depth, is the distribution I would bet on for a conservative approach!
Problem #5: Tree Diameters
From the US Department of Agriculture, Forest Service Research Paper SO-164, the raw data are tree diameters versus observed counts of uneven-aged forest trees:
|
|
The question is: What distribution best fits the data? The report shows the data best fitted a doubly-truncated exponential diameter distribution (Gumbel upper distribution).
A simple Weibull probability plot in Figure 17 shows the type of condition found in the field of reliability: coarse data with big stacks of information at discrete intervals, and the trend line attempts to fit the mass of the data.
The trend line is a very poor fit because of the huge mass of data (1840 data points) for small tree diameters (4-inch). We will get a better regression line if we only consider the topmost data point in each stack of data for the regression. This is a common problem with coarse inspection data for reliability. Therefore use the inspection option found under the methods icon of WinSMITH Weibull software. Converting to the inspection option produces the results in Figure 18.
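The "topmost point in each stack" idea can be sketched as follows. This is my interpretation of the sentence above, not a description of the software's internals: every observation still receives a Benard median rank, but only the highest rank at each distinct value is kept for the regression. The function name and the small example data are hypothetical.

```python
def top_of_stack_points(data):
    """Return (value, plotting position) pairs, keeping only the top of each stack."""
    ranked = sorted(data)
    n = len(ranked)
    top = {}
    for i, value in enumerate(ranked, start=1):
        # Later (higher) ranks overwrite earlier ones for the same value
        top[value] = (i - 0.3) / (n + 0.4)
    return sorted(top.items())


# Coarse readings with stacks at 4 and 6
readings = [4, 4, 4, 6, 6, 8]
points = top_of_stack_points(readings)
for value, position in points:
    print(f"{value:>3}  {position:.3%}")
```

Six readings collapse to three plotted points, each sitting at the cumulative fraction reached by the end of its stack, so a large stack no longer drags the regression line toward itself.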
Using the inspection option, which distribution is best for the data in Figure 18? The summary shows:
Distribution   (R² – CCC²)   PVE%
Weibull         0.059        75.66   ← The best fit distribution for (R² – CCC²)
Normal          0.057        99.9    ← The best fit distribution by PVE%
Gumbel-         0.050        55.94
Gumbel+         0.046        48.89
Lognormal      -0.006         7.65
The statistics from Figure 18 are summarized in the forecast shown in the table below along with a comparison of the two curve fits from the report. In general, the Weibull forecast shows less error than in report SO-164.
|
|
Problem #6: Pipe Corrosion Data
Guidelines for use of statistics for analysis of sample inspection of corrosion, Research Report 016, provides pipe corrosion data for 16 samples out of a potential of 1250 samples (based on area) in a failed pipe that was 25 meters long and was used as a sample. Minimum wall thicknesses from ultrasonic internal diameter measurements were recorded for each sample in mm:
3.33, 3.63, 3.8, 3.82, 3.82, 3.92,
3.98, 3.98, 4.04, 4.06, 4.1, 4.13, 4.19, 5.26, 4.34, 4.77
Research Report 016 used Hazen ranks (see remarks above) and maximum likelihood
techniques (useful for large data sets but the method is notorious for sporadic
results with small data sets).
Median rank plotting positions and rank regression techniques are used in Figure 19, which shows the Weibull plot is the best distribution to use. The smallest wall thickness for the 1250 potential samples would occur on the trend line at i=1 and n=1250 which, by Benard’s median rank, would be at 0.05598%. At ~0.06% the trend line shows a wall thickness of 2.514 mm. With a 95% confidence limit applied to the left hand side, the wall thickness would be 2.178 mm. The Gumbel upper distribution has the same curve fit, but it is rejected because it magnifies the upper section of the curve, which is not the area of concern.
Analysis of distribution results shows:
Distribution   (R² – CCC²)   PVE%
Weibull         0.034        25.39   ← The best fit distribution for (R² – CCC²)
Gumbel+         0.034        25.39
Lognormal       0.028        25.71
Gumbel-         0.011        13.62   ← The fit selected by the report using MLE
Using the maximum likelihood estimate method produces almost the same statistical numbers as obtained in Research Report 016 as shown in Figure 20. While the method is rigorous, it is inappropriate for small data samples. The trend line continues to fail the test of good engineering judgment for good fit of the trend line to the data points. For Figure 20 at 0.05598%, the trend line wall thickness is 1.648 mm and the left hand 95% confidence limit wall thickness is 0.874 mm.
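For comparison with rank regression, here is a hedged sketch of a plain 2-parameter Weibull maximum likelihood fit, solved by bisection on the standard shape equation, followed by the trend-line value at Benard's position for i=1, n=1250. It uses the sixteen readings exactly as listed and a Weibull model, which is my choice for illustration; it is not meant to reproduce Figure 20 or the report's numbers, and it includes no reduced bias adjustment.

```python
from math import log

wall_mm = [3.33, 3.63, 3.8, 3.82, 3.82, 3.92, 3.98, 3.98,
           4.04, 4.06, 4.1, 4.13, 4.19, 5.26, 4.34, 4.77]


def weibull_mle(data, lo=0.01, hi=200.0, tol=1e-10):
    """Maximum likelihood estimates (eta, beta) for complete (uncensored) data."""
    scale = max(data)
    z = [v / scale for v in data]       # rescale so powers cannot overflow
    mean_log = sum(log(v) for v in z) / len(z)

    def score(beta):
        # Shape equation: sum(z^b ln z)/sum(z^b) - 1/b - mean(ln z) = 0
        w = [v**beta for v in z]
        return sum(wi * log(v) for wi, v in zip(w, z)) / sum(w) - 1 / beta - mean_log

    while hi - lo > tol:                # score() increases with beta
        mid = (lo + hi) / 2
        if score(mid) < 0:
            lo = mid
        else:
            hi = mid
    beta = (lo + hi) / 2
    eta = scale * (sum(v**beta for v in z) / len(z)) ** (1 / beta)
    return eta, beta


eta_hat, beta_hat = weibull_mle(wall_mm)

# Benard's median rank for the thinnest of 1250 potential samples
position = (1 - 0.3) / (1250 + 0.4)
thinnest_mm = eta_hat * (-log(1 - position)) ** (1 / beta_hat)

print(f"MLE eta = {eta_hat:.3f} mm, beta = {beta_hat:.2f}")
print(f"plotting position = {position:.5%}, trend line value = {thinnest_mm:.3f} mm")
```

Comparing this output with a rank regression fit of the same data shows how strongly the extrapolated minimum wall thickness depends on the fitting method when only sixteen points are available.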
A better method using MLE with reduced bias adjustment (RBA) for small data sets is available in WinSMITH Weibull as shown in Figure 21. The trend line continues to fail the test of good engineering judgment for good fit of the trend line to the data points. For Figure 21 at 0.05598%, the trend line wall thickness is 1.383 mm and the left hand 95% confidence limit wall thickness is 0.490 mm.
The bottom line for this pipe corrosion problem is to stick with the Weibull distribution results, shown in Figure 19, as the best conformance of the data to a practical model.
Summary
Some typical problems from the literature have been solved with reliability software to show the ease of solution and the accurate results which can be achieved with little effort.
The exercise does not have the intent of waving a flag that the older solutions are worthless and the previous authors have done an inferior analysis. To the contrary, this analysis only offers a different view of using commonly available reliability tools to easily solve typical problems. The examples show some features of the typical reliability software that are handy for solving routine problems in other fields of endeavor.
Refer to the caveats on the Problem Of The Month Page about the limitations of the solution above. Maybe you have a better idea on how to solve the problem. Maybe you find where I've screwed up the solution and you can point out my errors as you check my calculations. E-mail your comments, criticism, and corrections to Paul Barringer.
Technical tools are only interesting toys for engineers until results are converted into a business solution involving money and time. Complete your analysis with a bottom line which converts $'s and time so you have answers that will interest your management team!
Return to Barringer
& Associates, Inc. homepage