Problem Of The Month
February 1998-Coefficient Of Variation

For normal (Gaussian) distributions, the coefficient of variation measures the relative scatter in data with respect to the mean.  The Cv is a term often used for six-sigma activities to express relative variations.  It is also useful for calculating losses in process reliability problems when used with the Weibull distribution.

  • What is the coefficient of variation (Cv) for the Weibull distribution?        
  • How does the Cv relate to the Weibull shape factors identified as b?        
  • How does the non-symmetrical Weibull Cv relate to the symmetrical 6-sigma concepts used in normal distributions? 
  • What are some typical values for the coefficient of variation for process reliability?


Background
The coefficient of variation provides a relative measure of data dispersion compared to the mean: Cv=s/X-bar for the normal (bell shaped) distribution. The coefficient of variation has no units. It may be reported as a simple decimal value or it may be reported as a percentage as mentioned in the January '98 Problem Of The Month.
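As a small illustration of Cv = s/X-bar, here is a minimal Python sketch. The function name and the pocket-change readings are made up for this example; they are not data from this problem.

```python
import statistics


def coefficient_of_variation(data):
    """Return Cv = s / X-bar, using the sample standard deviation for s."""
    mean = statistics.mean(data)
    if mean == 0:
        raise ValueError("Cv is undefined when the mean is zero")
    return statistics.stdev(data) / mean


# Hypothetical pocket-change readings in dollars (illustration only).
pocket_change = [8.0, 10.0, 12.0]
cv = coefficient_of_variation(pocket_change)

# The same number can be reported as a decimal or as a percentage.
print(f"Cv = {cv:.3f} or {cv:.1%}")
```

Because the units of s and X-bar cancel, the result is the same whether the readings are in dollars or cents.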

When the Cv is small, the data scatter compared to the mean is small. When the Cv is large, the amount of variation compared to the mean is large. For example, the variation in the amount of pocket change compared to the average amount of money you have in your wallet may be small to medium when measured by the coefficient of variation. However, the variation in the amount of your pocket change compared to the average US national debt is insignificant and thus the coefficient of variation would be minute.

Scatter in the bell-shaped normal curve will include six sigma = ±3s = 6s = 99.73% of the area under the curve.  A 6-sigma representation will cover 99.73% of the expected data occurrences considering the range of -3s at CDF = 0.135% and + 3s at CDF = 99.865%. Small standard deviations usually indicate data are closely clustered about the arithmetic mean. Large standard deviations usually indicate data are spread-out and widely dispersed.
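The 99.73% figure and the two tail values can be checked with a few lines of Python. This is only a sketch using the standard error function; the helper name normal_cdf is my own choice for the illustration.

```python
import math


def normal_cdf(z):
    """Standard normal CDF written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


lower = normal_cdf(-3.0)   # CDF at -3s, about 0.135%
upper = normal_cdf(3.0)    # CDF at +3s, about 99.865%
coverage = upper - lower   # area between -3s and +3s, about 99.73%

print(f"CDF(-3s) = {lower:.5%}")
print(f"CDF(+3s) = {upper:.5%}")
print(f"coverage = {coverage:.4%}")
```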

We will use some general ideas about the traditional figure of 99.73% of the data to look for "most of the data under the curve" even with the non-symmetrical Weibull distribution. Remember the skewed shape of most Weibull curves represents reality, but the lack of symmetry causes problems with the explanations.  For our Weibull six sigma range I’ll adopt the convention of the area between 0.1% and 99.9%, which equals 99.8% of the data.

The coefficient of variation for the Weibull distribution depends upon the shape factor, beta. This is a very handy feature for use with straight line Weibull probability plots. Thus the slope of the line determined by beta tells you the coefficient of variation—large beta values give small variations whereas small betas give large variations.



The Problem
What is the coefficient of variation for the Weibull distribution?

The New Weibull Handbook in Appendix G-2 defines the Weibull standard deviation as

   s = h*{G(1+2/b) - [G(1+1/b)]^2}^0.5   <-- this says s = h*(a relationship with b)

and the Weibull mean as

   m = h*{G(1+1/b)}   <-- this says m = h*(another relationship with b)

where G is the gamma function, b is the shape factor beta, and h is the characteristic value eta.

When the coefficient of variation (Cv) terms for s/m are collected, the characteristic value h drops out. Then

   Cv = s/m = {G(1+2/b) - [G(1+1/b)]^2}^0.5 / G(1+1/b)   <-- this says Cv = a function of b only

For Cv, the complicated gamma functions in numerator and denominator are all written in terms of b.

If you evaluate Cv = s/m using Excel, you would have this expression (assuming b is located in cell A1):

=((EXP(GAMMALN((2+A1)/A1))-(EXP(GAMMALN((1+A1)/A1)))^2)^0.5/EXP(GAMMALN((1+A1)/A1)))

{Test case: if b = 4.9, Cv = 0.233317}
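For readers without Excel, the same expression can be sketched in Python. The function name weibull_cv is my own, and math.gamma is called directly where the Excel version uses EXP(GAMMALN(...)); treat this as an illustration rather than a replacement for the Excel equation.

```python
import math


def weibull_cv(beta):
    """Weibull coefficient of variation, Cv = s/m, as a function of beta only."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    g1 = math.gamma(1.0 + 1.0 / beta)   # G(1+1/b)
    g2 = math.gamma(1.0 + 2.0 / beta)   # G(1+2/b)
    return math.sqrt(g2 - g1 * g1) / g1


# Same test case as the Excel equation: b = 4.9.
print(f"Cv for b = 4.9: {weibull_cv(4.9):.6f}")
```

Note that math.gamma overflows for arguments above roughly 171, so this sketch is only suitable for betas larger than about 0.012.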

The complicated Excel equation can be simplified for a fairly good fit with the line segments shown in the graph below:

Hence a specified coefficient of variation will have a fixed slope on the Weibull plot. So, when a line is drawn by WinSMITH Weibull, the amount of variation is decided and identified. Steep betas (i.e., large values for beta) have small variations in the data (and thus the Cv is small).  Shallow betas (i.e., small values for beta) have large variations in the data (and thus the Cv is large).  For most issues, you want steep betas with small Cv.

A few other tips to think about: 
            1) Knowing the Gaussian normal Cv does not imply that you can find the Weibull Cv or the Weibull b values. 
            2) A Weibull probability plot of normal data produces a reasonably good straight line plot.
            3) A normal probability plot of Weibull data often does not produce a very good straight line.
            4) Weibull probability plots tell you details about the distribution rather than you specifying only a
                single (normal) distribution.
            5) Using normal distributions for manufacturing data is often wrong as the data contains drift effects and
                other biases which may be larger than the normal errors.
            6) Perhaps Weibull control charts are more robust than standard SPC charts for the reason given in 5) above.
            7) The Cv is a more interesting concept for statisticians, but it is not a particularly helpful value for use
                with engineers.

How do the coefficients of variation relate to the Weibull shape factors identified as b?

Mathcad was used to evaluate the Weibull coefficient of variation at specific values by solving for beta. The results are shown in the following table.

Weibull Beta Values For Weibull Coefficient Of Variation = s/m

   Cv      b          Cv      b          Cv      b          Cv      b
   2.0     0.5427     0.5     2.1014     0.05    24.951     0.01    127.54
   1.5     0.6848     0.2     5.7974     0.04    31.357     0.009   141.79
   1.0     1.0000     0.15    7.9069     0.03    42.040     0.008   159.61
   0.9     1.1128     0.10    12.154     0.02    63.413     0.007   182.53
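The table was produced with Mathcad. As a rough alternative, and not the method used for the table, a simple bisection search in Python can solve Cv(b) = target for beta, since Cv falls steadily as beta rises. The function names and the search limits of 0.1 and 1000 are assumptions made for this sketch.

```python
import math


def weibull_cv(beta):
    """Weibull Cv = s/m as a function of the shape factor beta."""
    g1 = math.gamma(1.0 + 1.0 / beta)
    g2 = math.gamma(1.0 + 2.0 / beta)
    return math.sqrt(g2 - g1 * g1) / g1


def beta_for_cv(target_cv, lo=0.1, hi=1000.0):
    """Find beta with weibull_cv(beta) == target_cv by bisection.

    Assumes the answer lies between lo and hi (assumed limits for this sketch).
    """
    if not (weibull_cv(hi) <= target_cv <= weibull_cv(lo)):
        raise ValueError("target Cv is outside the range covered by lo..hi")
    for _ in range(200):              # far more halvings than a double needs
        mid = 0.5 * (lo + hi)
        if weibull_cv(mid) > target_cv:
            lo = mid                  # Cv too large, so beta must be larger
        else:
            hi = mid
    return 0.5 * (lo + hi)


for cv in (1.0, 0.5, 0.10):
    print(f"Cv = {cv:<5} -> beta = {beta_for_cv(cv):.4f}")
```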

Suppose you made a Weibull plot. The plot showed beta (the shape factor) = 10 and eta (the scale factor which in reliability terms would be the characteristic life) = 1000. You would find the coefficient of variation = 0.1203 or 12%. You would also find the Weibull standard deviation = 114.457 and the Weibull mean = 951.3508.
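These three numbers can be reproduced with a short Python sketch of the Weibull mean and standard deviation formulas. The function names are my own choices for the illustration.

```python
import math


def weibull_mean(beta, eta):
    """Weibull mean, m = h*G(1+1/b)."""
    return eta * math.gamma(1.0 + 1.0 / beta)


def weibull_std(beta, eta):
    """Weibull standard deviation, s = h*{G(1+2/b) - [G(1+1/b)]^2}^0.5."""
    g1 = math.gamma(1.0 + 1.0 / beta)
    g2 = math.gamma(1.0 + 2.0 / beta)
    return eta * math.sqrt(g2 - g1 * g1)


beta, eta = 10.0, 1000.0
m = weibull_mean(beta, eta)
s = weibull_std(beta, eta)

print(f"mean = {m:.4f}, std dev = {s:.3f}, Cv = {s / m:.4f}")
```

Changing eta rescales the mean and the standard deviation by the same factor, so the Cv stays the same.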

This is mathematically correct even though few people use the Weibull mean.

How does the non-symmetrical Weibull coefficient of variation relate to the symmetrical normal distribution six sigma concepts?

The normal distribution has a bell shaped curve. Weibull curves are not usually bell shaped. Thus Weibull six sigma concepts will not be as neat and tidy as for the normal distribution.

Using the example above, three Weibull standard deviations either side of the Weibull mean is 951.3508 ± 3*114.457, which results in 607.98 and 1294.7.  Using Excel to evaluate the Weibull F(t) = 1 - e^(-(t/h)^b), F(t) for t = 1294.7 is 99.99982149% and F(t) for t = 607.98 is 0.68768286984%, so this range covers 99.31213862% of the data for this non-symmetrical distribution.  Of course this is not quite the same coverage as you would expect for the normal distribution, where ±3*s = 99.73% and the range is symmetrical.

Note the right hand tail of the Weibull probability density function contains 0.000178507% of the data while the left hand tail of the Weibull distribution contains 0.68768286984% of the data (or roughly 3850 times more data in the left hand tail than in the right hand tail) for this particular set of b and h.  Right away, you see the Weibull curve is going to complicate the concept of explaining and computing what's a six sigma value for the Weibull distribution.  So consider the following rules of thumb to avoid mind-numbing complex issues.
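The lopsided tails for b = 10 and h = 1000 can be checked with a short Python sketch. The function names are my own, and the right hand tail is computed directly as e^(-(t/h)^b) instead of 1 - F(t) so that the tiny value does not lose precision. The limits come from unrounded mean and standard deviation values, so the last digits may differ slightly from the figures quoted above.

```python
import math


def weibull_cdf(t, beta, eta):
    """Weibull CDF, F(t) = 1 - e^(-(t/h)^b)."""
    return 1.0 - math.exp(-((t / eta) ** beta))


def weibull_survival(t, beta, eta):
    """Fraction of the data above t, 1 - F(t) = e^(-(t/h)^b)."""
    return math.exp(-((t / eta) ** beta))


beta, eta = 10.0, 1000.0
g1 = math.gamma(1.0 + 1.0 / beta)
g2 = math.gamma(1.0 + 2.0 / beta)
mean = eta * g1
std = eta * math.sqrt(g2 - g1 * g1)

low_t = mean - 3.0 * std     # about 607.98
high_t = mean + 3.0 * std    # about 1294.7

left_tail = weibull_cdf(low_t, beta, eta)
right_tail = weibull_survival(high_t, beta, eta)
coverage = 1.0 - left_tail - right_tail

print(f"left tail  = {left_tail:.6%}")
print(f"right tail = {right_tail:.6%}")
print(f"coverage   = {coverage:.4%}")
```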

Consider this simple rule of thumb approximation for ±3*s = 99.73% = 6*s.  Take the value at 99.9% occurrence (1213.2) and the complement which occurs at 0.1% (501.2). Thus 99.9% - 0.1% = 99.8% of the data (which is close to 99.73%) is expected to lie between 501.2 and 1213.2, which corresponds closely to the 6s concept that 99.73% of the data lies under the normal curve.
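The two values in this rule of thumb come from inverting the Weibull CDF, t = h*{ln[1/(1-CDF)]}^(1/b). A minimal Python sketch, with a function name of my own choosing, for b = 10 and h = 1000:

```python
import math


def weibull_quantile(cdf, beta, eta):
    """Value t at which the Weibull CDF equals cdf (0 < cdf < 1)."""
    if not 0.0 < cdf < 1.0:
        raise ValueError("cdf must be strictly between 0 and 1")
    return eta * math.log(1.0 / (1.0 - cdf)) ** (1.0 / beta)


beta, eta = 10.0, 1000.0
low = weibull_quantile(0.001, beta, eta)    # 0.1% occurrence
high = weibull_quantile(0.999, beta, eta)   # 99.9% occurrence

print(f"0.1%  point = {low:.1f}")
print(f"99.9% point = {high:.1f}")
print(f"range       = {high - low:.1f}")
```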

Earlier I had proposed another rule of thumb simplification.  Seldom is the mean, m, used in the Weibull distribution as the “best central tendency”; instead the characteristic value h is used.  The h simplification was OK for large betas (what you desire for process reliability issues) but not OK for small betas (less than 1).  I’m withdrawing this h simplification as it causes too many useless arguments!  I will stick with the traditional definition Cv = s/m = (Weibull standard deviation) / (Weibull mean) as used in many 6-sigma studies.  You can see the variability in the tables shown below.

Weibull Beta Values For Weibull Coefficient Of Variation = s/h

   Cv      b          Cv      b          Cv      b          Cv      b
   2.0     0.6752     0.5     1.8426     0.05    24.385     0.01    126.966
   1.5     0.7788     0.2     5.3020     0.04    30.789     0.009   141.212
   1.0     1.0000     0.15    7.3825     0.03    41.469     0.008   159.033
   0.9     1.0798     0.10    11.606     0.02    62.847     0.007   181.956

Weibull Beta Values For Weibull Coefficient Of Variation = s/m

   Cv      b          Cv      b          Cv      b          Cv      b
   2.0     0.5427     0.5     2.1014     0.05    24.951     0.01    127.54
   1.5     0.6848     0.2     5.7974     0.04    31.357     0.009   141.79
   1.0     1.0000     0.15    7.9069     0.03    42.040     0.008   159.61
   0.9     1.1128     0.10    12.154     0.02    63.413     0.007   182.53

For process reliability problems the value of b = 100 seems to be a practical value for world class performance and this produces Cv = 0.0127 or 1.27% of variability around the Weibull mean.

An Example:
What are some typical values for the coefficient of variation based on
Cv = s/m = (Weibull standard deviation) / (Weibull mean)?

Since the coefficient of variation is a relative measure, the absolute values depend upon the situation. Consider the example of money in your pocket.  Assume the mean value is m = US$100, and notice how the coefficient of variation shows the scatter in your pocket money between CDF = 0.1% and CDF = 99.9% for the ~6*s range.  The Low/High values are calculated from {m/G(1+1/b)}*{ln[1/(1-CDF)]}^(1/b): for the low end use 0.001 for the CDF, and for the upper end use 0.999 for the CDF.  Note that m = h*{G(1+1/b)}, so you will recognize the first term in braces as h.

Typical Values For The Coefficient Of Variation For Process Variation
(99.8% ~6*s range of money in your pocket if m = US$100)

   Plain English        Cv = s/m    b          h              Low       High       Range
   Statement
   Poor control         20%         5.797      $107.9979      $32.81    $150.73    $117.92 (considerable variability)
   Fair control         10%         12.153     $104.3039      $59.08    $122.28    $63.20
   Tight control        5%          24.949     $102.20798     $77.49    $110.44    $32.95
   Excellent control    2.5%        50.586     $101.115397    $88.21    $105.05    $16.84
   World class          1.25%       101.880    $100.56024     $93.97    $102.49    $8.52
   Seldom achieved      0.625%      204.480    $100.2807      $96.95    $101.23    $4.28 (not much variability)
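The Low/High values for any row can be recomputed with a short Python sketch of the formula {m/G(1+1/b)}*{ln[1/(1-CDF)]}^(1/b). The function name pocket_money_range is my own, and results may differ from the table in the last cent because of rounding in the beta values.

```python
import math


def pocket_money_range(mean, beta, low_cdf=0.001, high_cdf=0.999):
    """Return (h, low, high) for a Weibull with the given mean and beta."""
    eta = mean / math.gamma(1.0 + 1.0 / beta)   # h = m / G(1+1/b)
    low = eta * math.log(1.0 / (1.0 - low_cdf)) ** (1.0 / beta)
    high = eta * math.log(1.0 / (1.0 - high_cdf)) ** (1.0 / beta)
    return eta, low, high


# Beta values taken from the Cv = s/m rows for 20%, 10%, 5%, 2.5%, 1.25%, 0.625%.
rows = [
    ("Poor control", 5.797),
    ("Fair control", 12.153),
    ("Tight control", 24.949),
    ("Excellent control", 50.586),
    ("World class", 101.880),
    ("Seldom achieved", 204.480),
]

for label, beta in rows:
    eta, low, high = pocket_money_range(100.0, beta)
    print(f"{label:<18} h = ${eta:.4f}  Low = ${low:.2f}  "
          f"High = ${high:.2f}  Range = ${high - low:.2f}")
```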

Large coefficients of variation say you’ll have big variations in the amount of money in your pockets.  Small coefficients of variation say you will know within a very small range how much money will be in your pocket.  Of course having lots of money in your pocket may be desirable, but practical limits exist as to the maximum amount of money you will have in your own pockets; the same case exists for production output.

The coefficient of variation will be used to set the nameplate rating for production processes in the Problem Of The Month for March '98.

Other pages you may want to visit concerning similar issues are:

  • Production Output/Problems
  • Six Sigma
  • Nameplate Capacity
  • Production Reliability Example With Nameplate Ratings
  • Key Performance Indicators From Weibull Production Plots
  • Process Reliability Plots With Flat Line Slopes
  • Process Reliability Line Segments
  • Papers On Process Reliability As PDF Files For No-charge Downloads


Comments:

Refer to the caveats on the Problem Of The Month Page about the limitations of the following solution. Maybe you have a better idea on how to solve the problem. Maybe you find where I've screwed-up the solution and you can point out my errors as you check my calculations. E-mail your comments, criticism, and corrections to Paul Barringer.

Technical tools are only interesting toys for engineers until results are converted into a business solution involving money and time. Complete your analysis with a bottom line which converts $'s and time so you have answers that will interest your management team!

Thanks to John Hawkins of PPG Industries for catching a typo (the incorrect phrase “…non-symmetrical Weibull Cv to symmetrical 6-sigma…” has been changed to the correct phrase “…non-symmetrical Weibull Cv relate to the symmetrical 6-sigma…”) and for pointing out the need to add a closing “)” to the Cv equations copied to Excel. I’ve also added a test case for each equation so you can validate your results. HPB 8/13/99.  The approximation for Cv = s/h was withdrawn on 11/1/02.

Last revised 4/20/2004
© Barringer & Associates, Inc. 1999
