Problem Of The Month
December 2003: Make Repairs On Overtime?
For redundant equipment, should I make repairs on straight time to defer costs? Or should I shorten the repair time by using overtime at extra cost? This is often an emotional question, and the facts below will help you make the decision based on the merits of the case rather than on emotion.
The Problem:
We have two devices working in a parallel configuration on a continuously operating system. Both devices have accumulated the same number of operating hours, and we need only one to survive.
We have just recorded a failure on one device, so we no longer have a redundant system.
Should we repair the dead device on straight time or overtime? How do we make a rational decision based on the facts? What facts do we need to make the computation?
Background Information:
We will keep the problem simple by using the exponential failure distribution, for which the failure rate is constant.
Here are the facts for the simple case:
1. Assume we have perfect switching between devices.
2. Each device has a mean time to failure of 3 years, which is 3 × 8,760 = 26,280 hours/failure, so the failure rate is λ = 1/26,280 = 0.000038052 failures/hour.
3. The failure has occurred at 20,000 hours.
4. The time to make repairs is 20 hours. The delay time for corrective maintenance is 10 hours on overtime and 400 hours without overtime. This means the unit will be restored to the up condition in 30 hours with overtime and in 420 hours without overtime.
5. The repair cost is $1,000 for non-overtime and $2,000 for overtime.
6. If the surviving unit fails during the repair/delay period, the cost of lost gross margin is $30,000.
7. The risk equation is $Risk = POF*$Consequence, where POF = probability of failure and $Consequence = $30,000.
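The facts above can be collected as inputs for the calculations that follow. This is a minimal Python sketch, not the author's method. The names (`MTTF_HOURS`, `elapsed_hours`, `dollar_risk`) are hypothetical. The 8,760 hours/year figure assumes continuous operation.

```python
# Inputs from the list of facts; variable names are illustrative only.
HOURS_PER_YEAR = 8760                 # continuous operation assumed
MTTF_HOURS = 3 * HOURS_PER_YEAR       # 3 years = 26,280 hours/failure
FAILURE_RATE = 1 / MTTF_HOURS         # constant exponential rate, failures/hour
AGE_AT_FAILURE = 20_000               # operating hours on both devices
REPAIR_HOURS = 20                     # hands-on repair time
DELAY_HOURS = {"overtime": 10, "normal": 400}
REPAIR_COST = {"overtime": 2000, "normal": 1000}  # dollars
CONSEQUENCE = 30_000                  # lost gross margin if the survivor also fails


def elapsed_hours(mode: str) -> int:
    """Hours until the failed unit is back up: repair time plus delay time."""
    return REPAIR_HOURS + DELAY_HOURS[mode]


def dollar_risk(pof: float, consequence: float = CONSEQUENCE) -> float:
    """$Risk = POF * $Consequence."""
    return pof * consequence


print(f"failure rate = {FAILURE_RATE:.9f} failures/hour")
for mode in ("overtime", "normal"):
    print(f"{mode}: restored in {elapsed_hours(mode)} h at ${REPAIR_COST[mode]:,}")
```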
Calculations For The Simple Case:
Step 1: The probability for surviving to 20,000 hours, with $\lambda = 0.000038052$ failures/hour, is:
$R(t) = e^{-\lambda t} = e^{-0.000038052 \times 20000} = e^{-0.76104} = 0.46718$
This says the probability the device will survive 20,000 hours is 46.718%.
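Step 1 can be reproduced with the exponential reliability function. This is a Python sketch that assumes the unrounded rate 1/26,280. `reliability` is a hypothetical helper name.

```python
import math

FAILURE_RATE = 1 / 26280  # failures/hour for a 3-year MTTF


def reliability(t_hours: float, rate: float = FAILURE_RATE) -> float:
    """Exponential survival probability R(t) = exp(-rate * t)."""
    return math.exp(-rate * t_hours)


r_20000 = reliability(20_000)
print(f"R(20,000) = {r_20000:.5f} ({r_20000:.3%} chance of surviving)")
```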
Step 2: The probability for surviving to 20,030 hours under conditions of overtime is:
$R(t) = e^{-\lambda t} = e^{-0.000038052 \times 20030} = e^{-0.76218156} = 0.46665$
This says the probability the device will survive 20,030 hours is 46.665%.
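Step 2 uses the same formula at 20,030 hours. Because the exponential model has a constant failure rate, R(20,030) also factors into R(20,000) times the chance of surviving 30 more hours. The sketch below checks that identity; its names are illustrative.

```python
import math

FAILURE_RATE = 1 / 26280  # failures/hour


def reliability(t_hours: float) -> float:
    """R(t) = exp(-lambda * t) for the exponential distribution."""
    return math.exp(-FAILURE_RATE * t_hours)


r_20030 = reliability(20_030)
# exp(-l*(a + b)) == exp(-l*a) * exp(-l*b): survive 20,000 h, then 30 more.
factored = reliability(20_000) * reliability(30)
print(f"R(20,030) = {r_20030:.5f}; factored form agrees: {math.isclose(r_20030, factored)}")
```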
Step 3: The probability of failure between 20,000 hours and 20,030 hours under overtime repairs is:
$POF_{20{,}000 \to 20{,}030} = R(20{,}000) - R(20{,}030) = 0.46718 - 0.46665 = 0.00053$
This says the probability of failure of the surviving device during the overtime repair period is 0.053%. This is an unconditional probability, measured from when the device was new.
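The window probability in Step 3 is the drop in R(t) across the repair interval. This Python sketch uses a hypothetical `window_pof` helper. It works with unrounded values, so it agrees with the hand result only to about four decimal places.

```python
import math

FAILURE_RATE = 1 / 26280  # failures/hour


def reliability(t_hours: float) -> float:
    return math.exp(-FAILURE_RATE * t_hours)


def window_pof(t_start: float, t_end: float) -> float:
    """Unconditional probability (from new) of failing between t_start and t_end."""
    return reliability(t_start) - reliability(t_end)


pof_overtime = window_pof(20_000, 20_030)
print(f"POF 20,000 to 20,030 h = {pof_overtime:.5f} ({pof_overtime:.3%})")
```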
Step 4: Given the surviving device has reached 20,000 hours, what is the conditional probability of failure during the next 30 hours?
$POF(\text{next 30 h} \mid \text{survived 20,000 h}) = \frac{R(20{,}000) - R(20{,}030)}{R(20{,}000)} = \frac{0.46718 - 0.46665}{0.46718} = \frac{0.00053}{0.46718} = 0.00113$
This says that, given the device has survived to age 20,000 hours, the conditional probability of failure of the surviving device during the next 30 hours of overtime repair is 0.113%.
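Step 4 divides by R(20,000) to condition on the device having survived that long. Under the exponential model the result also equals 1 - e^(-λ·30), whatever the device's age (the memoryless property). The sketch below confirms this using a hypothetical `conditional_pof` helper. At full precision it gives about 0.00114 rather than the hand-rounded 0.00113.

```python
import math

FAILURE_RATE = 1 / 26280  # failures/hour


def reliability(t_hours: float) -> float:
    return math.exp(-FAILURE_RATE * t_hours)


def conditional_pof(age: float, window: float) -> float:
    """P(failure in (age, age + window] | survived to age)."""
    return (reliability(age) - reliability(age + window)) / reliability(age)


cond_overtime = conditional_pof(20_000, 30)
memoryless = 1 - math.exp(-FAILURE_RATE * 30)  # no dependence on the 20,000 h age
print(f"{cond_overtime:.5f}", math.isclose(cond_overtime, memoryless))
```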
Step 5: How much money is at risk for failure during the 30-hour overtime period?
$Risk = 0.00113*$30,000 = $33.90
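Step 5 applies the risk equation from fact 7. This is a small Python sketch. `dollar_risk` is a hypothetical name, and the POF is the hand-rounded value from Step 4.

```python
def dollar_risk(pof: float, consequence: float = 30_000) -> float:
    """$Risk = POF * $Consequence, with the $30,000 lost gross margin as default."""
    return pof * consequence


risk_overtime = dollar_risk(0.00113)  # hand-rounded conditional POF from Step 4
print(f"$Risk during the 30-hour overtime window = ${risk_overtime:,.2f}")
```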
Step 6: The probability for surviving to 20,420 hours under the normal repair interval is:
$R(t) = e^{-\lambda t} = e^{-0.000038052 \times 20420} = e^{-0.77702184} = 0.45977$
This says the probability the device will survive 20,420 hours is 45.977%.
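Steps 2 and 6 are the same calculation at the end of each restoration window. The sketch below computes both exponents and survival probabilities side by side. It assumes the unrounded rate 1/26,280, so its exponents differ slightly from the hand values.

```python
import math

FAILURE_RATE = 1 / 26280  # failures/hour
AGE = 20_000              # hours on the surviving device when the other failed

survival = {}
for mode, elapsed in (("overtime", 30), ("normal", 420)):
    t = AGE + elapsed
    exponent = FAILURE_RATE * t
    survival[mode] = math.exp(-exponent)
    print(f"{mode:8s} R({t:,}) = exp(-{exponent:.6f}) = {survival[mode]:.5f}")
```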
Step 7: The probability of failure between 20,000 hours and 20,420 hours under the normal repair interval is:
$POF_{20{,}000 \to 20{,}420} = R(20{,}000) - R(20{,}420) = 0.46718 - 0.45977 = 0.00741$
This says the probability of failure of the surviving device during the normal repair period is 0.741%. That is roughly 14 times the 0.053% chance of failure when the repair is completed quickly with overtime.
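The comparison in Step 7 can be checked numerically. This sketch computes both unconditional window probabilities and their ratio with unrounded inputs; its helper names are hypothetical.

```python
import math

FAILURE_RATE = 1 / 26280  # failures/hour


def reliability(t_hours: float) -> float:
    return math.exp(-FAILURE_RATE * t_hours)


pof_normal = reliability(20_000) - reliability(20_420)
pof_overtime = reliability(20_000) - reliability(20_030)
ratio = pof_normal / pof_overtime
print(f"normal {pof_normal:.5f}, overtime {pof_overtime:.5f}, ratio {ratio:.1f}x")
```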
Step 8: Given the surviving device has reached 20,000 hours, the conditional probability of failure during the next 420 hours for normal repairs is:
$POF(\text{next 420 h} \mid \text{survived 20,000 h}) = \frac{R(20{,}000) - R(20{,}420)}{R(20{,}000)} = \frac{0.46718 - 0.45977}{0.46718} = \frac{0.00741}{0.46718} = 0.01586$
This says that, given the device has survived to age 20,000 hours, the conditional probability of failure of the surviving device during the next 420 hours of normal repair is 1.586%.
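Step 8 repeats the conditional calculation for the 420-hour window. Because the exponential failure rate is constant, the answer does not depend on the device's age. The sketch below evaluates the probability at several ages to show this; it is a hypothetical check, not part of the original method.

```python
import math

FAILURE_RATE = 1 / 26280  # failures/hour


def reliability(t_hours: float) -> float:
    return math.exp(-FAILURE_RATE * t_hours)


def conditional_pof(age: float, window: float) -> float:
    """[R(age) - R(age + window)] / R(age)."""
    return (reliability(age) - reliability(age + window)) / reliability(age)


results = {age: conditional_pof(age, 420) for age in (0, 10_000, 20_000, 30_000)}
for age, pof in results.items():
    print(f"age {age:>6,} h: P(fail in next 420 h) = {pof:.5f}")
```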
Step 9: How much money is at risk for failure during the 30-hour overtime period compared with the 420-hour normal repair period? It’s important to convert the issues to money using $Risk = POF*$Consequence.
The $Risk exposure during the 30-hour overtime repair (see Step 4) is 0.00113*$30,000 = $33.90.
The $Risk for routine repairs during the 420-hour interval (see Step 8) is 0.01586*$30,000 = $475.80.
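The two exposures in Step 9, and the difference between them, can be tabulated as below. This sketch deliberately uses the hand-rounded POFs so that it reproduces the dollar figures in the text.

```python
CONSEQUENCE = 30_000  # $ lost gross margin if both units are down

# Hand-rounded conditional POFs from Steps 4 and 8.
pof = {"overtime (30 h)": 0.00113, "normal (420 h)": 0.01586}
risk = {mode: p * CONSEQUENCE for mode, p in pof.items()}

for mode, dollars in risk.items():
    print(f"{mode:16s} ${dollars:8.2f}")
reduction = risk["normal (420 h)"] - risk["overtime (30 h)"]
print(f"risk reduction from overtime = ${reduction:.2f}")
```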
Step 10: Would you spend $1,000 extra for overtime repair ($2,000 versus $1,000) to reduce your risk by $475.80 - $33.90 = $441.90? The answer is no: the extra cost is more than twice the risk it removes. The numbers can turn out much differently if the normal repair times are too long. For example, if the elapsed repair times (repair + delay times) for this example exceed 1020 hours, then overtime repairs are beneficial.
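The Step 10 decision rule compares the risk avoided with the $1,000 overtime premium. The sketch below illustrates it under the same simple-case assumptions: exponential model, unrounded rate, and hypothetical function names. With these inputs it puts the break-even elapsed repair time near 922 hours, so the 1020-hour case in the text is comfortably on the overtime side.

```python
import math

FAILURE_RATE = 1 / 26280         # failures/hour
CONSEQUENCE = 30_000             # $ lost gross margin
OVERTIME_PREMIUM = 2000 - 1000   # extra $ spent to repair on overtime
OVERTIME_HOURS = 30              # elapsed restoration time with overtime


def cond_pof(hours: float) -> float:
    """Conditional POF over a window; age-independent for the exponential model."""
    return 1 - math.exp(-FAILURE_RATE * hours)


def risk_reduction(normal_hours: float) -> float:
    """Dollars of risk avoided by restoring in OVERTIME_HOURS instead of normal_hours."""
    return CONSEQUENCE * (cond_pof(normal_hours) - cond_pof(OVERTIME_HOURS))


def breakeven_hours() -> float:
    """Normal elapsed time at which the risk avoided equals the overtime premium."""
    target_pof = OVERTIME_PREMIUM / CONSEQUENCE + cond_pof(OVERTIME_HOURS)
    return -math.log(1 - target_pof) / FAILURE_RATE


for hours in (420, 1020):
    worth_it = risk_reduction(hours) > OVERTIME_PREMIUM
    print(f"{hours:5d} h: avoids ${risk_reduction(hours):,.2f} -> overtime worth it: {worth_it}")
print(f"break-even elapsed time = {breakeven_hours():.0f} h")
```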
By the way, if you repeat these calculations in Excel, you will get slightly different numbers than these hand calculations. Excel carries more significant digits than the rounded values shown above, which were worked out with a hand calculator to show the steps for finding the values.
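To see the size of the rounding effect, the sketch below compares the hand-rounded dollar risks with full-precision values, much as a spreadsheet using EXP would compute them. It is an illustration only. With these inputs the differences are well under a dollar.

```python
import math

FAILURE_RATE = 1 / 26280  # unrounded, as a spreadsheet would carry it
CONSEQUENCE = 30_000
# mode: (elapsed hours, hand-rounded conditional POF from Steps 4 and 8)
HAND = {"overtime": (30, 0.00113), "normal": (420, 0.01586)}

exact_risk = {}
for mode, (hours, hand_pof) in HAND.items():
    exact_risk[mode] = (1 - math.exp(-FAILURE_RATE * hours)) * CONSEQUENCE
    hand_risk = hand_pof * CONSEQUENCE
    print(f"{mode:8s} hand ${hand_risk:7.2f}   full precision ${exact_risk[mode]:7.2f}")
```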
The bottom line:
You’ve got to know how things live and die, along with the costs for repairs and the consequences if both units fail at the same time. You can extend this step-by-step example to make it more accurate (but also more complicated) by introducing Weibull analysis and more complicated cost profiles. Make your decisions rationally rather than emotionally; this requires the facts.
Comments:
Refer to the caveats on the Problem Of The Month page about the limitations of this solution. Maybe you have a better idea on how to solve the problem. Maybe you'll find where I've screwed up the solution and can point out my errors as you check my calculations. E-mail your comments, criticism, and corrections to Paul
Barringer.
Thanks to El Hadi, Development Manager,